The problem has two steps:

1. Serialization of the model
2. Deserialization and application of the model to new data
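Here is a minimal sketch of the two steps in Java. It assumes plain Java object serialization and a hypothetical LinearModel class; neither is prescribed above. The descriptor values are hard-coded where the CDK would normally supply them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ModelRoundTrip {

    // Hypothetical model type: a linear regression over descriptor values.
    static final class LinearModel implements Serializable {
        private static final long serialVersionUID = 1L;
        private final double intercept;
        private final double[] coefficients;

        LinearModel(double intercept, double[] coefficients) {
            this.intercept = intercept;
            this.coefficients = coefficients.clone();
        }

        // Apply the model to one molecule's descriptor vector.
        double predict(double[] descriptors) {
            if (descriptors.length != coefficients.length) {
                throw new IllegalArgumentException("expected " + coefficients.length + " descriptors");
            }
            double y = intercept;
            for (int i = 0; i < coefficients.length; i++) {
                y += coefficients[i] * descriptors[i];
            }
            return y;
        }
    }

    // Step 1: serialize the trained model (here to bytes; a file shipped with the app works the same way).
    static byte[] serialize(LinearModel model) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(model);
        }
        return bytes.toByteArray();
    }

    // Step 2a: deserialize the model inside the desktop application.
    static LinearModel deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (LinearModel) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        LinearModel trained = new LinearModel(0.5, new double[] {1.2, -0.3});
        byte[] stored = serialize(trained);

        // Step 2b: apply the model to descriptors for a new molecule
        // (hypothetical values; in practice the CDK would compute them).
        LinearModel loaded = deserialize(stored);
        System.out.println(loaded.predict(new double[] {2.0, 1.0}));
    }
}
```

Java serialization ties the stored model to the exact class definition that wrote it. That coupling is one reason a tool-neutral format is worth considering.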

Step 2 involves calculating descriptors for the new molecules; you already
have the CDK for that.

So it really boils down to serialization and application. Assuming you
don't want to work over the internet (which would let you store models
remotely, say on an R server, and obtain predictions via a service), you
could consider PMML serialization. PMML effectively stores the whole
model (for a set of supported model types: CNN, RF, etc.), but you'd have
to write Java code to read in the PMML and run the model yourself.
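To give a feel for what "run the model via your own code" involves, here is a minimal sketch that scores a PMML RegressionModel using only the JDK's DOM parser. It makes several assumptions. It handles only the first RegressionTable with NumericPredictor entries, and it ignores CategoricalPredictor and normalization. The PMML fragment is trimmed: a real file also carries Header, DataDictionary and MiningSchema. The descriptor names and values are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PmmlLinearScorer {

    // Trimmed, illustrative PMML; a complete file has more required elements.
    static final String PMML =
          "<PMML version=\"4.1\" xmlns=\"http://www.dmg.org/PMML-4_1\">"
        + "<RegressionModel functionName=\"regression\">"
        + "<RegressionTable intercept=\"0.5\">"
        + "<NumericPredictor name=\"XLogP\" coefficient=\"1.2\"/>"
        + "<NumericPredictor name=\"TPSA\" exponent=\"1\" coefficient=\"-0.3\"/>"
        + "</RegressionTable></RegressionModel></PMML>";

    // Evaluate the first RegressionTable: intercept + sum(coefficient * x^exponent).
    static double score(String pmml, Map<String, Double> descriptors) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(pmml.getBytes(StandardCharsets.UTF_8)));

        Element table = (Element) doc.getElementsByTagName("RegressionTable").item(0);
        if (table == null) {
            throw new IllegalArgumentException("no RegressionTable found");
        }
        // getAttribute returns "" when absent; PMML's default intercept is 0.
        String ic = table.getAttribute("intercept");
        double y = ic.isEmpty() ? 0.0 : Double.parseDouble(ic);

        NodeList predictors = table.getElementsByTagName("NumericPredictor");
        for (int i = 0; i < predictors.getLength(); i++) {
            Element p = (Element) predictors.item(i);
            String name = p.getAttribute("name");
            Double x = descriptors.get(name);
            if (x == null) {
                throw new IllegalArgumentException("missing descriptor: " + name);
            }
            // PMML's default exponent is 1 when the attribute is omitted.
            String exp = p.getAttribute("exponent");
            int exponent = exp.isEmpty() ? 1 : Integer.parseInt(exp);
            y += Double.parseDouble(p.getAttribute("coefficient")) * Math.pow(x, exponent);
        }
        return y;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical descriptor values standing in for CDK output.
        Map<String, Double> descriptors = new HashMap<>();
        descriptors.put("XLogP", 2.0);
        descriptors.put("TPSA", 1.0);
        System.out.println(score(PMML, descriptors));
    }
}
```

Each additional model type (trees, neural networks) would need its own evaluator of this kind, which is the main cost of the PMML route.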

The other option is to develop models with Weka and bundle Weka
into your distribution.

(This all assumes you're working on a Java application.)

On Fri, Dec 2, 2011 at 9:15 AM, Andrew Lang wrote:
>
> There are plenty of ways I now know to create CDK descriptors and build
> models but I'm looking for the best way to distribute them as a desktop
> application.
>
> My dream program would be one where you would input a SMILES (GUI), it would
> generate the CDK descriptors and then report back (user selected) predicted
> properties based upon Open models (linear, random forest, etc) with the
> ability to download and add new models (like you can add new functionality
> with R). A command line option that does batches would be important too.
>
> Does this program exist?
>
> Thanks,
>
> Andy
>
>
> ------------------------------------------------------------------------------
> All the data continuously generated in your IT infrastructure
> contains a definitive record of customers, application performance,
> security threats, fraudulent activity, and more. Splunk takes this
> data and makes sense of it. IT sense. And common sense.
> http://p.sf.net/sfu/splunk-novd2d
> _______________________________________________
> Blueobelisk-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
>



-- 
Rajarshi Guha
NIH Chemical Genomics Center
