Hi Andrew and others,

> There are plenty of ways I now know to create CDK descriptors and build
> models but I'm looking for the best way to distribute them as a desktop
> application.
>

First of all, are you looking for an ODOSOS-spirited solution for
distributing models, or for a ready-to-use desktop application? If
it's the former (and you don't mind some Java programming), I suggest
taking a look at the QsarDB project (http://qsardb.googlecode.com),
which is a proposal for the electronic organization and archiving of
QSAR/QSPR model information.

Basically, QsarDB enables you to encapsulate a QSAR model (and all of
its supporting information) in a single so-called QDB file. QDB files
are easy to distribute and archive. Given a suitable run-time
environment, they can also be executed programmatically, for example
to make a prediction.

> My dream program would be one where you would input a SMILES (GUI), it would
> generate the CDK descriptors and then report back (user selected) predicted
> properties based upon Open models (linear, random forest, etc) with the
> ability to download and add new models (like you can add new functionality
> with R). A command line option that does batches would be important too.
>
> Does this program exist?
>

Recently I did some research on QSAR model data formats and couldn't
find anything major apart from Bioclipse's QSAR-ML data format
(http://pele.farmbio.uu.se/qsar-ml/). Unfortunately, QSAR-ML appears
to be limited to the representation of raw datasets (i.e., chemical
structures, property values and descriptor values) and doesn't cover
the rest of a typical QSAR modelling workflow.

QsarDB handles statistical models in the PMML data format. While
Rajarshi suggested using the Weka toolkit for loading and storing
PMML models, our group decided to develop a new lightweight Java PMML
library called JPMML (http://jpmml.googlecode.com) for this purpose.
At the moment, JPMML supports linear regression, decision tree and
neural network models.
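
To give a feel for what "executing" a model means once the descriptor
values are known, here is a rough sketch of the simplest case. It is
not JPMML code (JPMML's API is not reproduced here); it just hand-codes
the formula a PMML RegressionModel with numeric predictors describes,
y = intercept + sum(coefficient_i * x_i). The descriptor names and all
coefficients are made-up placeholders, not a real QSAR model.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative sketch (not JPMML): scores a linear regression model the way
 * a PMML RegressionModel with numeric predictors defines it:
 *   y = intercept + sum(coefficient_i * x_i)
 */
final class LinearQsarModel {
    private final double intercept;
    private final Map<String, Double> coefficients;

    LinearQsarModel(double intercept, Map<String, Double> coefficients) {
        this.intercept = intercept;
        this.coefficients = new LinkedHashMap<>(coefficients);
    }

    /** Scores one compound, given its descriptor values keyed by descriptor name. */
    double predict(Map<String, Double> descriptors) {
        double y = intercept;
        for (Map.Entry<String, Double> term : coefficients.entrySet()) {
            Double x = descriptors.get(term.getKey());
            if (x == null) {
                // PMML has its own missing-value rules; this sketch simply refuses.
                throw new IllegalArgumentException("Missing descriptor: " + term.getKey());
            }
            y += term.getValue() * x;
        }
        return y;
    }

    public static void main(String[] args) {
        // Hypothetical model: names and coefficients are placeholders only.
        Map<String, Double> coef = new LinkedHashMap<>();
        coef.put("XLogP", 0.5);
        coef.put("TPSA", -0.25);
        LinearQsarModel model = new LinearQsarModel(1.0, coef);

        // Hypothetical descriptor values for one compound.
        Map<String, Double> x = new LinkedHashMap<>();
        x.put("XLogP", 2.0);
        x.put("TPSA", 4.0);
        System.out.println(model.predict(x)); // 1.0 + 0.5*2.0 - 0.25*4.0
    }
}
```

Decision trees and neural networks need more machinery, which is
exactly what a library like JPMML takes care of.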

Given the QsarDB, JPMML and CDK libraries, it should be fairly
straightforward to write a command-line application that does exactly
what you describe. The application would take the input SMILES and a
list of executable QDB files as its arguments. CDK descriptors can
either be calculated locally or fetched from a remote REST service.
As a bonus, it would also be possible to quantify the quality of
every individual prediction.
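
The overall shape of such a tool could look roughly like the skeleton
below. Everything in it is hypothetical: DescriptorCalculator,
QsarModelFunction, ModelLoader and PredictCli are illustrative names,
not QsarDB, JPMML or CDK classes. A real implementation would back the
calculator with CDK (or a REST service) and the loader with QsarDB/JPMML.
The placeholder wiring in main() only exists so the skeleton runs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical: turns a SMILES string into named descriptor values. */
interface DescriptorCalculator {
    Map<String, Double> calculate(String smiles);
}

/** Hypothetical: an executable model, e.g. one loaded from a QDB file. */
interface QsarModelFunction {
    double predict(Map<String, Double> descriptors);
}

/** Hypothetical: resolves a QDB file path to an executable model. */
interface ModelLoader {
    QsarModelFunction load(String qdbPath);
}

final class PredictCli {
    /** Computes descriptors once, then applies every model to them (keyed by QDB path). */
    static Map<String, Double> run(String smiles, List<String> qdbPaths,
                                   DescriptorCalculator calculator, ModelLoader loader) {
        Map<String, Double> descriptors = calculator.calculate(smiles);
        Map<String, Double> predictions = new LinkedHashMap<>();
        for (String path : qdbPaths) {
            predictions.put(path, loader.load(path).predict(descriptors));
        }
        return predictions;
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: PredictCli <SMILES> <model.qdb> [<model.qdb> ...]");
            System.exit(1);
        }
        // Placeholder wiring only: a toy "descriptor" (SMILES string length)
        // and a toy model that echoes it. NOT real chemistry or real QDB loading.
        DescriptorCalculator calculator = smiles -> {
            Map<String, Double> values = new LinkedHashMap<>();
            values.put("smilesLength", (double) smiles.length());
            return values;
        };
        ModelLoader loader = qdbPath -> descriptors -> descriptors.get("smilesLength");
        List<String> paths = Arrays.asList(args).subList(1, args.length);
        run(args[0], paths, calculator, loader)
            .forEach((path, value) -> System.out.println(path + "\t" + value));
    }
}
```

Computing the descriptors once and reusing them for every model keeps
the per-model cost down; a batch mode would simply loop run() over many
SMILES.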

Please let me know if you're interested in exploring the possibilities
of QsarDB in more detail.


Best regards,
VR

_______________________________________________
Blueobelisk-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
