The problem has two steps:

1. Serialization of the model
2. Deserialization and application of the model to new data
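As a rough illustration of those two steps, here is a minimal sketch using plain Java object serialization. The `LinearModel` class, its coefficients, and the descriptor values are hypothetical stand-ins invented for this example; it uses neither CDK, PMML, nor Weka, and a real model would come from whatever training tool you choose.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ModelRoundTrip {

    /** Hypothetical stand-in for a trained model: y = intercept + sum(w[i] * x[i]). */
    public static final class LinearModel implements Serializable {
        private static final long serialVersionUID = 1L;
        private final double intercept;
        private final double[] weights;

        public LinearModel(double intercept, double[] weights) {
            this.intercept = intercept;
            this.weights = weights.clone();
        }

        /** Applies the model to one molecule's descriptor vector. */
        public double predict(double[] descriptors) {
            if (descriptors.length != weights.length) {
                throw new IllegalArgumentException(
                        "expected " + weights.length + " descriptors");
            }
            double y = intercept;
            for (int i = 0; i < weights.length; i++) {
                y += weights[i] * descriptors[i];
            }
            return y;
        }
    }

    /** Step 1: serialize the model (to bytes here; a file would work the same way). */
    public static byte[] serialize(LinearModel model) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Step 2a: deserialize the stored model. */
    public static LinearModel deserialize(byte[] bytes) {
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (LinearModel) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up coefficients standing in for a model trained elsewhere.
        LinearModel trained = new LinearModel(0.5, new double[] {2.0, -1.0});
        byte[] stored = serialize(trained);

        // Step 2b: apply the reloaded model to new (made-up) descriptor values.
        LinearModel loaded = deserialize(stored);
        double[] descriptors = {1.0, 3.0};
        System.out.println(loaded.predict(descriptors));
    }
}
```

Note that this kind of serialization only works when the reading application ships the same model class, which is the same trade-off as bundling a modelling library with the distribution.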
Step 2 involves descriptor calculation for new molecules, and you already have the CDK for that. So it really boils down to serialization and application.

Assuming you don't want to work over the internet (which would let you store models remotely, say on an R server, and obtain predictions via a service), you could consider PMML serialization. Since PMML effectively stores the whole model (for a set of model types: CNN, RF, etc.), you'd have to write Java code to read in the PMML and run the model yourself.

The other option is to develop models in Weka, say, and bundle Weka into your distribution.

(This is all assuming you're working on a Java application.)

On Fri, Dec 2, 2011 at 9:15 AM, Andrew Lang <[email protected]> wrote:
>
> There are plenty of ways I now know to create CDK descriptors and build
> models but I'm looking for the best way to distribute them as a desktop
> application.
>
> My dream program would be one where you would input a SMILES (GUI), it would
> generate the CDK descriptors and then report back (user selected) predicted
> properties based upon Open models (linear, random forest, etc) with the
> ability to download and add new models (like you can add new functionality
> with R). A command line option that does batches would be important too.
>
> Does this program exist?
>
> Thanks,
>
> Andy

--
Rajarshi Guha
NIH Chemical Genomics Center

_______________________________________________
Blueobelisk-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
