[Rdkit-discuss] molecule not draw well
Just an FYI: the following molecule, Cc1ccc(C[NH+]2C32CC(NC(=S)Nc2c2C)C3)cc1, looks broken when drawn with RDKit 2014.09.1 (attached).

Thanks,
- Jean-Paul Ebejer
Early Stage Researcher

-- 
Dive into the World of Parallel Programming. The Go Parallel Website, sponsored by Intel and developed in partnership with Slashdot Media, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials and more. Take a look and join the conversation now. http://goparallel.sourceforge.net/
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Modified Mol objects with concurrent.futures
Hi Michael,

The problem occurs because concurrent.futures uses pickle to send arguments to the child processes and to return their results, and an ordinary RDKit molecule loses its properties when it is pickled. The solution I use is to convert the molecule objects to PropertyMol objects, which retain their properties through pickling.

Best,
Christos

Christos Kannas
Researcher, Ph.D. Student
http://cy.linkedin.com/in/christoskannas

On 2 February 2015 at 09:03, Reutlinger, Michael michael.reutlin...@roche.com wrote:
[quoted original message omitted]
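A minimal sketch of Christos's suggestion, assuming a recent RDKit installation. A plain pickle round trip stands in for what concurrent.futures does when it sends a molecule to, or returns it from, a worker process. Whether the plain Mol keeps its property depends on the installed RDKit version's default pickling options, so the sketch only prints that result; the PropertyMol copy keeps it. Property values come back as strings.

```python
import pickle

from rdkit import Chem
from rdkit.Chem.PropertyMol import PropertyMol

mol = Chem.MolFromSmiles('N[C@@H](C)C(=O)O')
mol.SetProp('Name', 'Alanine')

# Plain Mol: the outcome depends on the default pickling options of the
# installed RDKit version (in the release discussed here, properties were lost).
plain_copy = pickle.loads(pickle.dumps(mol))
print('plain Mol keeps Name:', bool(plain_copy.HasProp('Name')))

# PropertyMol writes its properties into the pickle explicitly,
# so they are restored when it is unpickled.
prop_copy = pickle.loads(pickle.dumps(PropertyMol(mol)))
print('PropertyMol keeps Name:', prop_copy.GetProp('Name'))
```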
[Rdkit-discuss] Modified Mol objects with concurrent.futures
Hi all,

I am currently trying to parallelize part of a script using RDKit and concurrent.futures. The function that is executed in parallel returns processed molecules as RDKit Mol objects. Without parallelization everything is fine, and the Mol objects keep all the properties they had before processing. When using concurrent.futures, however, the returned molecules lose all their properties and seem to have been re-created from scratch, possibly with unknown side effects. Has anyone experienced the same issue, and does anyone know how to circumvent it? I attached an IPython notebook with a small script demonstrating the issue.

Best,
Michael

Example code:

    from concurrent import futures

    from rdkit import Chem

    def process(mol):
        if 'Name' not in mol.GetPropNames():
            print('Processing: Name missing')
        mol.SetProp('Processed', 'True')
        return mol

    # The guard is required where worker processes are spawned (e.g. Windows).
    if __name__ == '__main__':
        mol = Chem.MolFromSmiles('N[C@@H](C)C(=O)O')
        mol.SetProp('Name', 'Alanine')
        with futures.ProcessPoolExecutor(max_workers=1) as pool:
            future = pool.submit(process, mol)
            molOut = future.result()
        if 'Name' not in molOut.GetPropNames():
            print('Result: Name missing')
        if 'Processed' not in molOut.GetPropNames():
            print('Result: Processed missing')

Attachment: RDKIT_ParallelProblem (1).ipynb
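Applying the PropertyMol suggestion from this thread to Michael's example might look like the sketch below. It assumes a recent RDKit and Python 3; `run_parallel` is a hypothetical helper, not an RDKit function. Both the inputs and the returned molecules are wrapped, because pickling happens in both directions, and the `__main__` guard matters on platforms where worker processes are spawned rather than forked.

```python
from concurrent import futures

from rdkit import Chem
from rdkit.Chem.PropertyMol import PropertyMol


def process(mol):
    # Runs in a worker process; the molecule arrived there via pickle.
    if not mol.HasProp('Name'):
        print('Processing: Name missing')
    mol.SetProp('Processed', 'True')
    # Wrap the result so its properties survive the trip back to the parent.
    return PropertyMol(mol)


def run_parallel(mols, max_workers=1):
    # Hypothetical helper: wrap inputs so their properties reach the workers.
    wrapped = [PropertyMol(m) for m in mols]
    with futures.ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process, wrapped))


if __name__ == '__main__':
    mol = Chem.MolFromSmiles('N[C@@H](C)C(=O)O')
    mol.SetProp('Name', 'Alanine')
    (molOut,) = run_parallel([mol])
    print(molOut.GetProp('Name'), molOut.GetProp('Processed'))
```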
Re: [Rdkit-discuss] Modified Mol objects with concurrent.futures
Hi Christos,

Thanks for pointing out the pickle issue and the solution using PropertyMol. After reading the documentation, I am confident this will solve the problem.

Best,
Michael

Michael Reutlinger, PhD
Scientist, Molecular Design and Chemical Biology
Roche Pharma Research and Early Development
Roche Innovation Center Basel
F. Hoffmann-La Roche Ltd, Basel, Switzerland

On Mon, Feb 2, 2015 at 11:17 AM, Christos Kannas chriskan...@gmail.com wrote:
[quoted messages omitted]
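Later RDKit releases (the option postdates this 2015 thread) also offer a process-wide switch that makes ordinary Mol pickles carry their properties, so no wrapper class is needed. A minimal sketch, assuming such a release is installed:

```python
import pickle

from rdkit import Chem

# Ask the pickler to store all property types for every Mol pickled
# from now on in this process.
Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.AllProps)

mol = Chem.MolFromSmiles('N[C@@H](C)C(=O)O')
mol.SetProp('Name', 'Alanine')
restored = pickle.loads(pickle.dumps(mol))
print(restored.GetProp('Name'))
```

The switch is per process. Workers created by forking inherit it; with spawned workers it would have to be set again in each worker (for example through the `initializer` argument of `ProcessPoolExecutor`) for the returned molecules to keep their properties.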
Re: [Rdkit-discuss] Replacing H's with F's
Thanks Peter and Greg! I had a three-atom query to restrict where I was putting F's; otherwise I would have done as Peter had suggested. Granted, my path to flush out the duplicates by pushing this out into Java (using the RDKit SWIG bindings) was way more involved than this! Thanks for the walkthrough, Greg! It was very helpful!

Thanks again!
Matthew

On Sat, Jan 31, 2015 at 1:58 AM, Greg Landrum greg.land...@gmail.com wrote:

For anyone interested in this topic, I just did an RDKit blog post that has a somewhat expanded version of this answer: http://rdkit.blogspot.com/2015/01/chemical-reaction-notes-i.html

Best,
-greg

On Sat, Jan 31, 2015 at 7:59 AM, Greg Landrum greg.land...@gmail.com wrote:

Hi Matthew,

On Fri, Jan 30, 2015 at 11:06 PM, Matthew Lardy mla...@gmail.com wrote:

I am having an issue using the SMARTS-based reaction transformations in RDKit. This is a weird transformation, but I wanted to replace any or all of the protons on an aromatic ring with an F. The original transformation that I tried was: c(F)c

But that didn't work. So then I tried a couple of other transformations:

    [c:1][c:2][c:3]>>[c:1][c:2]([F])[c:3]

That failed (as these things generally were failing):

    ps = rxn.RunReactants(mol1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    Boost.Python.ArgumentError: Python argument types in
        ChemicalReaction.RunReactants(ChemicalReaction, Mol)
    did not match C++ signature:
        RunReactants(class RDKit::ChemicalReaction *, class boost::python::list)
        RunReactants(class RDKit::ChemicalReaction *, class boost::python::tuple)

The hint to what is going on is in the error message: you called the RunReactants method with a Mol (the ChemicalReaction in the argument list is the self argument), and it was expecting either a list or a tuple.
Here's a version that works:

    In [8]: rxn = AllChem.ReactionFromSmarts('[c:1][c:2][c:3]>>[c:1][c:2]([F])[c:3]')
    In [9]: m = Chem.MolFromSmiles('c1ccccc1')
    In [10]: ps = rxn.RunReactants((m,))
    In [11]: len(ps)
    Out[11]: 12
    In [12]: Chem.MolToSmiles(ps[0][0])
    Out[12]: 'Fc1ccccc1'

Note that this still doesn't really do what you want, because it's encoded to add an F to any aromatic carbon. Here's an example that shows that:

    In [15]: m = Chem.MolFromSmiles('c1ccc(C)cc1')
    In [16]: ps = rxn.RunReactants((m,))
    In [17]: len(ps)
    Out[17]: 12
    In [18]: set([Chem.MolToSmiles(x[0],True) for x in ps])
    Out[18]: {'Cc1(F)ccccc1', 'Cc1ccc(F)cc1', 'Cc1cccc(F)c1', 'Cc1ccccc1F'}

Note the first product: the F was also added to the carbon bearing the methyl group. We can fix that by specifying that the reacting carbon must have an H attached:

    In [22]: rxn = AllChem.ReactionFromSmarts('[c:1][cH:2][c:3]>>[c:1][c:2]([F])[c:3]')
    In [23]: ps = rxn.RunReactants((m,))
    In [24]: len(ps)
    Out[24]: 10
    In [25]: set([Chem.MolToSmiles(x[0],True) for x in ps])
    Out[25]: {'Cc1ccc(F)cc1', 'Cc1cccc(F)c1', 'Cc1ccccc1F'}

There's still the question of why so many products are being produced. Look at Out[24]: why do we get 10 products? The answer is the symmetry in the query describing the reactant. Everywhere this query can match, it matches twice: forwards and backwards. So instead of five products, three of which are unique, we get ten. This can be handled by recognizing that [c:1] and [c:3] are not actually involved in the reaction; they are just there to define the environment of [c:2].
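The symmetry effect Greg describes can also be seen by collapsing the products to canonical SMILES. A minimal sketch, assuming a recent RDKit; `unique_products` is a hypothetical helper, and unlike the session above it sanitizes each product before writing SMILES. For toluene, the [cH:2] reaction gives ten raw products but only three distinct molecules (ortho, meta and para).

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def unique_products(rxn_smarts, smiles):
    """Hypothetical helper: return (raw product count, set of unique canonical SMILES)."""
    rxn = AllChem.ReactionFromSmarts(rxn_smarts)
    mol = Chem.MolFromSmiles(smiles)
    # RunReactants expects a sequence of reactants, hence the 1-tuple.
    product_sets = rxn.RunReactants((mol,))
    unique = set()
    for products in product_sets:
        product = products[0]
        # Reaction products come back unsanitized; sanitize before writing SMILES.
        Chem.SanitizeMol(product)
        unique.add(Chem.MolToSmiles(product))
    return len(product_sets), unique


count, smiles_set = unique_products('[c:1][cH:2][c:3]>>[c:1][c:2]([F])[c:3]', 'c1ccc(C)cc1')
print(count, sorted(smiles_set))
```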
We can do the same thing with a recursive SMARTS:

    In [30]: rxn = AllChem.ReactionFromSmarts('[cH$(c(c)c):2]>>[c:2][F]')
    In [31]: ps = rxn.RunReactants((m,))
    In [32]: len(ps)
    Out[32]: 5
    In [33]: set([Chem.MolToSmiles(x[0],True) for x in ps])
    Out[33]: {'Cc1ccc(F)cc1', 'Cc1cccc(F)c1', 'Cc1ccccc1F'}

Hope this helps,
-greg

Then I got desperate:

    [#6:1][#6:2]([#1])[#6:3].[H][#9:4]>>[#6:1][#6:2]([#9:4])[#6:3]

Any mention of an explicit H caused issues, so then I dropped it and re-ran things again. No luck. I should mention that I am using the pre-built Python RDKit wrappers for Windows, and if I use the Java wrappers on Linux I get different errors but the same outcome. I should add that the molecule I read (and the molecule for HF) were both loaded without issue. Has anyone else tried to do something like this?

Matthew
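The recursive-SMARTS reaction from Greg's reply can be wrapped in a small function and checked against the number of aromatic CH atoms: because the reactant query is now a single atom, each site should match exactly once. A sketch under the same assumptions (a recent RDKit; `fluorinate_once` is a hypothetical helper name):

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Single-atom reactant query: an aromatic CH whose environment (two aromatic
# neighbours) is expressed with recursive SMARTS, so each site matches once.
RXN = AllChem.ReactionFromSmarts('[cH$(c(c)c):2]>>[c:2][F]')
AROMATIC_CH = Chem.MolFromSmarts('[cH]')


def fluorinate_once(smiles):
    """Hypothetical helper: canonical SMILES of every mono-fluorinated product."""
    mol = Chem.MolFromSmiles(smiles)
    products = []
    for (product,) in RXN.RunReactants((mol,)):
        # Reaction products come back unsanitized; sanitize before writing SMILES.
        Chem.SanitizeMol(product)
        products.append(Chem.MolToSmiles(product))
    return products


for smi in ('c1ccccc1', 'c1ccc(C)cc1'):
    prods = fluorinate_once(smi)
    n_sites = len(Chem.MolFromSmiles(smi).GetSubstructMatches(AROMATIC_CH))
    print(smi, len(prods), n_sites, sorted(set(prods)))
```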