Hi Christoph,

Thank you for the input - very useful!  The overhead is certainly a concern, 
but I probably won’t have a handle on how much that will affect our workflow 
until I’ve prototyped it.  The explicit hydrogen problem might be troublesome 
though - I’m envisioning that the “start” structures I feed to RandomGenerator 
will have (or can have) their hydrogen atoms specified explicitly.  Are you 
saying that in such a case, the output is as expected?  Is the problem with 
molecules having implied hydrogen positions?

I first looked at using OMG (actually the parallel version here: 
https://sourceforge.net/projects/pmgcoordination/), but I need the program to 
be functional for formulae having halogens.  Unfortunately, the PMG2 program 
doesn’t seem to handle elements with two-letter symbols (e.g. Cl, Br).  I’m not 
a good enough Java programmer to fix that, and regardless, I 
agree that the only non-biased way to use the output from such a structure 
generator would be to predict the entire constitutional space first and then 
randomly select 100K from that set.  Not very efficient for what I’m trying to 
do, probably.
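
A note on the "enumerate everything, then pick 100K" idea: if a deterministic 
generator can stream its output, a uniform fixed-size sample can be drawn in a 
single pass without storing the whole constitutional space, using reservoir 
sampling. The following is only a minimal Python sketch under that assumption. 
It is not part of OMG, PMG, or CDK; `reservoir_sample` and the stand-in 
`fake_structure_stream` are hypothetical names used for illustration. The 
generator still has to enumerate every structure, so the runtime concern is 
unchanged.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return k items chosen uniformly at random from a stream of unknown length.

    If the stream holds fewer than k items, all of them are returned.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def fake_structure_stream(n: int) -> Iterator[str]:
    """Hypothetical stand-in for a generator that emits one structure at a time."""
    for i in range(n):
        yield f"structure_{i}"


if __name__ == "__main__":
    picked = reservoir_sample(fake_structure_stream(1_000_000), 10)
    print(picked)
```

In a real pipeline the stand-in stream would be replaced by whatever the 
structure generator writes out (for example one SMILES string per line).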

For now I’ll try the rJava approach and see how far I get.  Thanks again!

Lee

--
P. Lee Ferguson, Ph.D.
Associate Professor
Department of Civil & Environmental Engineering
Pratt School of Engineering &
Nicholas School of the Environment
Duke University
121 Hudson Hall, Box 90287
Durham, NC 27708-0287
http://ferguson.cee.duke.edu

Phone: 919-660-5460
Fax: 919-660-5454




On Mar 28, 2018, at 11:37 PM, Christoph Steinbeck 
<[email protected]<mailto:[email protected]>> wrote:

Hi Lee,

I cannot help you with the R issue but wanted to add that I recently ran a few 
tests to check whether RandomGenerator covers all of constitutional space, which 
it did in my few test cases. The overhead for a small set of atoms (like 
C10H16) was 200-fold, i.e. RG had to generate about 200 times as many structures 
as there are constitutional isomers before it had visited all of them.
There is the added complexity that it seems to have problems (I don’t know why) 
with explicit hydrogens. I only use it in cases where the hydrogen distribution 
is known.
If that is the case, my gut feeling is that the RandomGenerator is a good 
choice for your problem, rather than a deterministic generator like OMG, which 
might have a sampling bias if you only take the first 100k. Furthermore, OMG 
will take very long if you move to slightly larger molecules.
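
For intuition about why covering a space by random sampling costs many more 
visits than the space has members, here is a minimal Python sketch of the 
idealized case (the coupon collector problem), where items are drawn uniformly 
and independently. That is an assumption made purely for illustration: 
RandomGenerator proposes new structures by randomly mutating the current one, 
so its draws are neither uniform nor independent, and this sketch does not 
reproduce the 200-fold figure quoted above. All function names are hypothetical.

```python
import random


def draws_to_cover(n_items: int, rng: random.Random) -> int:
    """Count uniform random draws until every one of n_items has been seen."""
    seen = set()
    draws = 0
    while len(seen) < n_items:
        seen.add(rng.randrange(n_items))
        draws += 1
    return draws


def expected_overhead(n_items: int) -> float:
    """Expected draws divided by n_items: the harmonic number H_n."""
    return sum(1.0 / k for k in range(1, n_items + 1))


def mean_overhead(n_items: int, trials: int, seed: int = 0) -> float:
    """Average simulated overhead (draws per item) over several trials."""
    rng = random.Random(seed)
    total = sum(draws_to_cover(n_items, rng) for _ in range(trials))
    return total / (trials * n_items)


if __name__ == "__main__":
    n = 1000
    print("simulated overhead:", mean_overhead(n, trials=50))
    print("theoretical overhead:", expected_overhead(n))
```

Even this ideal sampler needs roughly ln(n) + 0.58 visits per item, so an 
overhead well above 1 is expected for any random approach; a random walk with 
correlated steps can be expected to need more.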

Will be interesting to learn about your progress here :)

Kind regards,

Chris


—
Prof. Dr. Christoph Steinbeck
Analytical Chemistry - Cheminformatics and Chemometrics
Friedrich-Schiller-University Jena, Germany
Phone Secretariat: +49-3641-948171
http://cheminf.uni-jena.de
http://orcid.org/0000-0001-6966-0814

What is man but that lofty spirit - that sense of enterprise.
... Kirk, "I, Mudd," stardate 4513.3..

On 29 Mar 02018, at 05:03, Lee Ferguson 
<[email protected]<mailto:[email protected]>> wrote:

Hello all,

I’m trying to work out a good way to perform structure generation from 
molecular formula for integration into an analytical pipeline in our lab.  
Specifically, what I want to do is generate (up to) ~100,000 structures for any 
given molecular formula and subsequently down-select from that set using some 
structural similarity filtering.  It looks like I could potentially bend the 
RandomGenerator function within the structgen package to my structure 
generation needs, but I was hoping to do this within R, and from what Zach 
tells me, there is no wrapper for the structgen package in rCDK as of yet.   
Zach suggested I could use rJava, which is a great idea, but he also pointed 
out that I might consider posting here in case someone else had tried to do 
something similar already and thus I might avoid reinventing the wheel.

I’d appreciate any ideas or guidance you all might have.

Best regards,
Lee

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Cdk-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/cdk-user

