Hi Andrew and others,

> There are plenty of ways I now know to create CDK descriptors and build
> models but I'm looking for the best way to distribute them as a desktop
> application.
>
First of all, are you looking for an ODOSOS-spirited solution for distributing models, or for a ready-to-use desktop application? If you're more into the former (and don't mind some Java programming), then may I suggest that you take a look at the QsarDB project (http://qsardb.googlecode.com), which is a proposal for the electronic organization and archiving of QSAR/QSPR model information. Basically, QsarDB enables you to encapsulate a QSAR model (and all of its supporting information) into a single so-called QDB file. QDB files are easy to distribute and archive. When handled in a proper run-time environment, they readily lend themselves to programmatic execution, such as making a prediction.

> My dream program would be one where you would input a SMILES (GUI), it would
> generate the CDK descriptors and then report back (user selected) predicted
> properties based upon Open models (linear, random forest, etc) with the
> ability to download and add new models (like you can add new functionality
> with R). A command line option that does batches would be important too.
>
> Does this program exist?
>

Recently I did some research into QSAR model data formats and couldn't find anything major except Bioclipse's QSAR-ML data format (http://pele.farmbio.uu.se/qsar-ml/). Unfortunately, QSAR-ML appears to be limited to the representation of raw datasets (i.e. chemical structures, property and descriptor values) and doesn't cover the rest of a typical QSAR modelling workflow.

QsarDB handles statistical models in the PMML data format. While Rajarshi suggested using the Weka toolkit for loading and storing PMML models, our group decided to develop a new lightweight Java PMML library called JPMML (http://jpmml.googlecode.com) for this purpose. At the moment JPMML can handle linear regression, decision tree and neural network models.
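
To make the idea concrete, here is a minimal Java sketch of the "descriptors in, predictions out" step, assuming that descriptors arrive as a name-to-value map and that the model is a plain linear regression. The names DescriptorSource, PropertyModel, LinearModel and PredictCli are hypothetical and are not part of the QsarDB, JPMML or CDK APIs. The descriptor and the coefficients used in main are made-up placeholders, not real chemistry.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a descriptor calculator (local CDK or a remote REST service). */
interface DescriptorSource {
    Map<String, Double> calculate(String smiles);
}

/** Hypothetical stand-in for a model that would be loaded from a QDB file. */
interface PropertyModel {
    String propertyName();
    double predict(Map<String, Double> descriptors);
}

/** Linear regression: intercept + sum(coefficient * descriptor value). */
final class LinearModel implements PropertyModel {
    private final String property;
    private final double intercept;
    private final Map<String, Double> coefficients;

    LinearModel(String property, double intercept, Map<String, Double> coefficients) {
        this.property = property;
        this.intercept = intercept;
        this.coefficients = new LinkedHashMap<>(coefficients);
    }

    @Override
    public String propertyName() {
        return property;
    }

    @Override
    public double predict(Map<String, Double> descriptors) {
        double result = intercept;
        for (Map.Entry<String, Double> term : coefficients.entrySet()) {
            Double value = descriptors.get(term.getKey());
            if (value == null) {
                throw new IllegalArgumentException("Missing descriptor: " + term.getKey());
            }
            result += term.getValue() * value;
        }
        return result;
    }
}

final class PredictCli {
    /** Calculates descriptors once, then applies every model to them. */
    static Map<String, Double> predictAll(String smiles, DescriptorSource source,
                                          List<PropertyModel> models) {
        Map<String, Double> descriptors = source.calculate(smiles);
        Map<String, Double> predictions = new LinkedHashMap<>();
        for (PropertyModel model : models) {
            predictions.put(model.propertyName(), model.predict(descriptors));
        }
        return predictions;
    }

    public static void main(String[] args) {
        String smiles = args.length > 0 ? args[0] : "CCO";

        // Toy placeholder: the SMILES string length is NOT a real CDK descriptor.
        DescriptorSource toySource =
                s -> Collections.singletonMap("smilesLength", (double) s.length());

        // Made-up coefficients; a real tool would read the model from a QDB file.
        Map<String, Double> coefficients = new LinkedHashMap<>();
        coefficients.put("smilesLength", 0.5);
        List<PropertyModel> models = new ArrayList<>();
        models.add(new LinearModel("toyProperty", 1.0, coefficients));

        for (Map.Entry<String, Double> e : predictAll(smiles, toySource, models).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Keeping the descriptor source behind an interface means the same prediction loop would work whether the values come from a local calculation or from a remote service, and further model types could be added as additional PropertyModel implementations.
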
Given the QsarDB, JPMML and CDK libraries, it should be pretty straightforward to write a command-line application that does exactly what you describe. The application would take the input SMILES and the list of executable QDB files as its arguments. The CDK descriptors can be calculated locally or fetched from a remote REST service. As a bonus, it will be possible to quantify the goodness of every prediction.

Please let me know if you're interested in exploring the possibilities of QsarDB in more detail.

Best regards,
VR

_______________________________________________
Blueobelisk-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
