Re: [Rdkit-discuss] Molecule reading issues
If you just want to ignore the error, add a try...catch block around the offending line.

Yours,
Toby Wright

On 31 May 2014 00:03, Matthew Lardy mla...@gmail.com wrote:
> Hi all,
>
> I am having this issue with the Java wrapper while trying to create a smiles string from a RWMol class object. I don't care about trying to figure out what is going wrong, I just want to bypass this record without my application closing. Any ideas?
>
> Here is the offending line:
>
>     rdmol.MolToSmiles();
>
> The error:
>
>     Exception in thread "main" org.RDKit.MolSanitizeException
>         at org.RDKit.RDKFuncsJNI.RWMol_MolFromSmiles__SWIG_3(Native Method)
>         at org.RDKit.RWMol.MolFromSmiles(RWMol.java:422)
>
> Thanks in advance!
> Matt

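The try...catch advice in this message can be sketched in outline as follows. The thread concerns the Java wrapper; this sketch uses Python for brevity, and `SanitizeError` and `to_smiles` are hypothetical stand-ins for the wrapper's exception and the failing call, not RDKit names.

```python
class SanitizeError(Exception):
    """Hypothetical stand-in for the wrapper's sanitization exception."""


def to_smiles(record):
    """Hypothetical stand-in for the call that fails on some records."""
    if record is None or not record.strip():
        raise SanitizeError("cannot convert record: %r" % (record,))
    return record.strip()


def convert_all(records):
    """Convert every record, skipping the ones that raise."""
    converted, skipped = [], []
    for index, record in enumerate(records):
        try:
            converted.append(to_smiles(record))
        except SanitizeError:
            # Note the bad record and carry on instead of letting the
            # exception end the whole run.
            skipped.append(index)
    return converted, skipped


good, bad = convert_all(["CCO", "", "c1ccccc1", None])
print(good)  # the records that converted
print(bad)   # indices of the records that were skipped
```

Catching only the specific exception type, rather than everything, keeps unrelated failures visible while still letting the run continue past bad records.
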
Re: [Rdkit-discuss] MMFF94 atom typing OHs connected to aromatic heterocycles.
Thanks very much, Paolo.

Yours,
Toby
--
InhibOx Ltd

On 15 April 2014 00:25, Paolo Tosco paolo.to...@unito.it wrote:
> Dear Toby,
>
> I checked the MMFF literature and indeed that hydrogen must be type 29; I just submitted a pull request with the bug fix. Apparently the test case that you presented is not covered by the validation suite, so I missed that bug until today. Thank you very much for reporting it!
>
> Cheers,
> p.
>
> On 04/14/2014 03:42 PM, Toby Wright wrote:
>> Hi,
>>
>> I've been using the MMFF94 forcefield and noticed an odd behaviour in a couple of molecules.
>>
>>     >>> phenolish1 = Chem.MolFromSmiles('Oc1ncccn1')
>>     >>> phenolish2 = Chem.MolFromSmiles('Oc1ncncc1')
>>     >>> prop1 = AllChem.MMFFGetMoleculeProperties(Chem.AddHs(phenolish1))
>>     >>> prop2 = AllChem.MMFFGetMoleculeProperties(Chem.AddHs(phenolish2))
>>     >>> print prop1.GetMMFFAtomType(7)  # Atom 7 is the H of the OH
>>     21
>>     >>> print prop2.GetMMFFAtomType(7)  # Atom 7 is the H of the OH
>>     29
>>
>> Type 29 is a hydrogen attached to an oxygen in enols, phenols or HO-C=N, which is only sort of the case here (but perhaps pragmatically we should consider phenol the closest option?).
>>
>> Digging into the code in GraphMol/ForceFieldHelpers/MMFF/AtomTyper.cpp we see (between lines 2092 and 2133) that we consider an atom to hit the phenol case if we have the oxygen attached to a carbon attached via an aromatic bond to another carbon. We have this in phenolish2 but not in phenolish1, hence the different outputs.
>>
>> If we change the test on line 2115 to:
>>
>>     if ((bond->getBondType() == Bond::AROMATIC) || ((nbr3Atom->getAtomicNum() == 6) && (bond->getBondType() == Bond::DOUBLE))) {
>>
>> then both cases above show the same behaviour, considering phenolish things to be phenols for the sake of MMFF94 atom typing. Alternatively we could consider phenolish things to be not phenols and implement atom type 21 for the hydrogen in both cases.
>>
>> Any thoughts?
>>
>> Yours,
>> Toby Wright
>>
>> PS I'm aware that the tautomers above aren't ideal; these are fragments snipped from more complex molecules to demonstrate the behaviour.
>> --
>> InhibOx Ltd
>
> --
> Paolo Tosco, Ph.D.
> Department of Drug Science and Technology
> Via Pietro Giuria, 9 - 10125 Torino (Italy)
> http://open3dqsar.org | http://open3dalign.org

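To make the proposed change to the line 2115 test concrete, here is a toy comparison in Python. It is only a paraphrase of the description in this thread, not the code in AtomTyper.cpp: `current_test`, `proposed_test` and `is_phenol_like` are hypothetical names, and each molecule is reduced by hand to the ring neighbours of the carbon that carries the OH.

```python
# Each entry is (atomic number of the ring neighbour, type of the bond to it)
# for the carbon bearing the OH group, written out by hand from the SMILES.
PHENOLISH1 = [(7, "AROMATIC"), (7, "AROMATIC")]  # Oc1ncccn1: N on both sides
PHENOLISH2 = [(7, "AROMATIC"), (6, "AROMATIC")]  # Oc1ncncc1: one N, one C


def current_test(atomic_num, bond_type):
    """Phenol case as described: an aromatic bond to another carbon."""
    return atomic_num == 6 and bond_type == "AROMATIC"


def proposed_test(atomic_num, bond_type):
    """Suggested change: any aromatic bond, or a double bond to carbon."""
    return bond_type == "AROMATIC" or (atomic_num == 6 and bond_type == "DOUBLE")


def is_phenol_like(neighbours, test):
    """True if any ring neighbour of the OH-bearing carbon passes the test."""
    return any(test(num, bond) for num, bond in neighbours)


for name, mol in (("phenolish1", PHENOLISH1), ("phenolish2", PHENOLISH2)):
    print(name, is_phenol_like(mol, current_test), is_phenol_like(mol, proposed_test))
```

Under these assumptions the first test separates the two fragments while the second treats both as phenol-like, which mirrors the 21 versus 29 difference reported for the hydroxyl hydrogen.
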
Re: [Rdkit-discuss] [H] vs [2H] in reactions
Hi,

Don't worry, in my real cases I'm using [OH:1][C:2] (or similar depending on the reaction in question). I guess I made my simplest possible example too simple to be a useful communication. Adding the ;D1 will also protect against molecules specified as [1H]OC, but hopefully my third-party input data doesn't have anything so odd.

My intuition was that [OH]C, [H]OC and [2H]OC were the same in terms of things like the degree of the oxygen, but playing around with Daylight's tools tells me I was wrong.

I guess then the original question has been answered by the pragmatic principle that states: hydrogens are not atoms iff they are of unspecified isotope. In the deuterium case we have a hydrogen atom attached to the oxygen that is not mentioned in the reaction and so, like any atom, is carried across with the mapped atoms. In the [OH] case we have an oxygen with a property, and that property need not be conserved by reaction transforms, and so isn't. And in the [H]O case it is internally converted to an [OH] before the reaction takes place.

Thanks once again for your time,
Toby Wright
--
InhibOx Ltd

On 8 April 2014 02:35, Greg Landrum greg.land...@gmail.com wrote:
> Hi Toby,
>
> On Mon, Apr 7, 2014 at 3:37 PM, Toby Wright toby.wri...@inhibox.com wrote:
>> Noticed something odd but I'm not confident enough with reaction SMARTS to say it's a bug. I'm reacting with an OH, leading to the oxygen losing the hydrogen and gaining a second bond. For example:
>>
>>     >>> rxn = AllChem.ReactionFromSmarts("[O:1]>>C[O:1]")
>>     >>> mol = Chem.MolFromSmiles("OC")
>>     >>> p = rxn.RunReactants((mol,))[0][0]
>>     >>> Chem.SanitizeMol(p)
>>     rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>>     >>> Chem.MolToSmiles(p)
>>     'COC'
>>
>> So far so good. Now to add explicit hydrogens to the oxygen. If mol is [OH]C or [H]OC the same behaviour as above happens. However if it is [2H]OC we run into:
>>
>>     ValueError: Sanitization error: Explicit valence for atom # 1 O, 3, is greater than permitted
>>
>> because the deuterium is being preserved whereas in the other cases the hydrogen is discarded. I can't find anything in the SMARTS documentation to suggest that this is the correct behaviour, so I'm going to suggest that if the [H] was being discarded by the reaction then so should the [2H].
>
> You've diagnosed what is happening correctly: the RemoveHs() functionality does not remove the [2H] since that's not something that can be replaced by inspecting the valence of the O atom.
>
> That's not the real problem here though. The reaction above also won't work for ethers or anything with a double bond to an O. What you more likely want is something like:
>
>     rxn = AllChem.ReactionFromSmarts("[OH;D1:1]>>C[O:1]")
>
> This will match CO, but not C=O, COC, or CO[2H].
>
> If you want the reaction to also apply to the deuterated species, which you say later in your email you don't, I think you're going to have to AddHs to the molecules before calling RunReactants() and explicitly include the H in the reaction query. Or, of course, you could add a second reaction to deal with Hs that are actually present.
>
> -greg
>
>> In either case it's not a problem for me as I have no particular interest in deuterium-containing molecules, so I don't need a workaround or quick fix. I just happened across the behaviour and thought it worth reporting.
>>
>> Yours,
>> Toby Wright
>> --
>> InhibOx Ltd

Re: [Rdkit-discuss] Further issue with reactions and chirality
Further investigation shows that this issue is not related to the reaction code at all; I'm afraid it's a general SMILES canonicalisation bug. Consider the following:

    >>> mol = Chem.MolFromSmiles("C1C[C@@H](CC)CC[C@@H](CC)1")
    >>> print Chem.MolToSmiles(mol, isomericSmiles=True)
    CC[C@@H]1CC[C@@H](CC)CC1

The output should describe the same molecule as the input, but plugging those strings into the Daylight website's depiction tool gives chirally different molecules.

This behaviour is observed in RDKit 2013.09 with no custom patches.

Yours,
Toby Wright
--
InhibOx Ltd

On 28 March 2014 15:47, Toby Wright toby.wri...@inhibox.com wrote:
> Oops, forgot to mention: this is with the solution to GitHub issue #233 (https://github.com/rdkit/rdkit/issues/233) patched into my RDKit build.
>
> Yours,
> Toby Wright
> --
> InhibOx Ltd
>
> On 28 March 2014 15:43, Toby Wright toby.wri...@inhibox.com wrote:
>> Hi,
>>
>> I believe I've found a bug in the new code that deals with reactions that have chirality specified for untagged product atoms. Consider the following:
>>
>>     >>> rxn = AllChem.ReactionFromSmarts("[C:1].[C:2]>>C1C[C@@H](C[C:1])CC[C@@H](C[C:2])1")
>>     >>> m1 = Chem.MolFromSmiles('FC')
>>     >>> m2 = Chem.MolFromSmiles('BrC')
>>     >>> p = rxn.RunReactants((m1,m2))[0][0]
>>     >>> Chem.SanitizeMol(p)
>>     rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>>     >>> Chem.MolToSmiles(p, isomericSmiles=True)
>>     'FCC[C@@H]1CC[C@@H](CCBr)CC1'
>>
>> The output looks about right: the [C@@H]s are both still [C@@H], but whereas before they were both being approached from around the ring, now the canonicalisation has us approaching one from outside the ring. Both extensions from the ring should be towards, and if I convert the product part of the above reaction to a png I get:
>>
>> [image: Inline images 1]
>>
>> but in the output one is towards and the other is away:
>>
>> [image: Inline images 2]
>>
>> Note that I can work around this. If I specify my reaction as
>>
>>     [C:1].[C:2]>>[C:1]C[C@H]1CC[C@@H](C[C:2])CC1
>>
>> thus aping the atom ordering of the product RDKit will give me, I get the chirality I want, at least in my test cases so far.
>>
>> Yours,
>> Toby Wright
>> --
>> InhibOx Ltd

[Rdkit-discuss] Reactions and chirality: Untagged chiral product atoms
Hi,

Looking over the documentation and discussion threads I've found solid and sensible answers for how chiral molecules and reactions are handled in almost every case, but I've hit what seems to be an issue in a situation I can't find discussed.

I have atoms in my product that are untagged and do not appear in my reactants. This is because I'm shortcutting a number of steps in what is happening in the real chemistry where these extra atoms are added. RDKit behaves exactly as I would hope in general when I do this, adding these new atoms to the product without taking them from any reactant. But where these new atoms have chiral information it is being lost, as shown by the following example:

    >>> rxn = AllChem.ReactionFromSmarts("[F:1][C:2]([C:3])[I:4]>>[F:1][C:2]([C:3][C@H]([OH])Br)[Cl:4]")
    >>> m = Chem.MolFromSmiles('FC(C)I')
    >>> p = rxn.RunReactants((m,))[0][0]
    >>> Chem.SanitizeMol(p)
    rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
    >>> Chem.MolToSmiles(p,isomericSmiles=True)
    'OC(Br)CC(F)Cl'

The output I was hoping for was O[C@@H](CC(F)Cl)Br, but the chiral information in the unmapped atoms in the product part of the reaction specification appears to have been lost.

Is this a bug or am I going about my reaction in the wrong way?

Yours,
Toby Wright
--
InhibOx Ltd

Re: [Rdkit-discuss] SMARTS/SMARTS and SMILES/SMARTS substructure matching
Thanks Greg,

The final strange behaviour I've noticed that could trip fellow users up is with matching Kekule versus aromatic representations of the same molecule in SMARTS against SMILES. Most surprisingly, C1=CC=CC=C1 is not a substructure of itself but has c1ccccc1 as a substructure (if the left-hand term is SMILES and the right is SMARTS in both cases). Code to demonstrate what I mean is below:

    >>> aromatic_benzene_smiles = Chem.MolFromSmiles('c1ccccc1')
    >>> aromatic_benzene_smarts = Chem.MolFromSmarts('c1ccccc1')
    >>> kekule_benzene_smiles = Chem.MolFromSmiles('C1=CC=CC=C1')
    >>> kekule_benzene_smarts = Chem.MolFromSmarts('C1=CC=CC=C1')
    >>> aromatic_benzene_smiles.HasSubstructMatch(aromatic_benzene_smarts)
    True
    >>> aromatic_benzene_smiles.HasSubstructMatch(kekule_benzene_smiles)
    True
    >>> aromatic_benzene_smiles.HasSubstructMatch(kekule_benzene_smarts)
    False
    >>> kekule_benzene_smiles.HasSubstructMatch(kekule_benzene_smarts)
    False
    >>> kekule_benzene_smiles.HasSubstructMatch(aromatic_benzene_smiles)
    True
    >>> kekule_benzene_smiles.HasSubstructMatch(aromatic_benzene_smarts)
    True

I think I can see why there is a difference in behaviour: a double bond is not the same thing as an aromatic bond. In the SMILES case a conversion can take place because the context is complete, but in the SMARTS case it is not (or at least might not be). But I thought I'd point out the issue in any case.

The workaround is to always explicitly make atoms aromatic in SMARTS if you wish them to match aromatic SMILES, rather than relying on the Kekule representation to sort it for you.

Yours,
Toby Wright
--
InhibOx Ltd

On 6 March 2014 04:55, Greg Landrum greg.land...@gmail.com wrote:
> On Wed, Mar 5, 2014 at 4:03 PM, Toby Wright toby.wri...@inhibox.com wrote:
>> This is probably related to the above so I thought I'd post it on this thread.
>>
>> I am noticing inconsistent behaviour when a molecule created via SMARTS that contains an 'or' statement has HasSubstructMatch called on it, as opposed to it being the argument to HasSubstructMatch. A simple example follows:
>>
>>     >>> O_or_C = Chem.MolFromSmarts('[O,C]')
>>     >>> O = Chem.MolFromSmiles('O')
>>     >>> C = Chem.MolFromSmiles('C')
>>     >>> O_or_C.HasSubstructMatch(O)
>>     True
>>     >>> O_or_C.HasSubstructMatch(C)
>>     False
>>     >>> O.HasSubstructMatch(O_or_C)
>>     True
>>     >>> C.HasSubstructMatch(O_or_C)
>>     True
>>
>> We also see:
>>
>>     >>> C_or_O = Chem.MolFromSmarts('[C,O]')
>>     >>> C_or_O.HasSubstructMatch(O)
>>     False
>>     >>> C_or_O.HasSubstructMatch(C)
>>     True
>>
>> so the order of elements in a SMARTS 'or' statement changes the behaviour, which is unexpected.
>
> This is indeed related. This is a case I didn't cover above: the SMILES/SMARTS match.
>
> The behavior above is expected from the point of view of what's in the code, though I can understand how it may not make much sense from the perspective of someone using the code. :-) The above should probably return False in both cases.
>
> In general, one should probably expect that using the HasSubstructMatch() method of a molecule constructed from SMARTS is likely to produce strange results. Getting a general purpose query-query matcher to work is, as far as I can tell, a decidedly non-trivial problem.
>
> -greg

Re: [Rdkit-discuss] Two nitrogens in a 5 membered ring
Thanks all for informative and helpful responses; the behaviour I was struggling to understand now makes perfect sense.

Toby Wright
--
InhibOx Ltd

On 4 March 2014 04:06, Greg Landrum greg.land...@gmail.com wrote:
> Bob hit the nail on the head.
>
> The first case, N1N=CC=C1, is aromatic because the RDKit sees that the first nitrogen has two bonds to it, assigns a hydrogen, and then sees a conjugated pi system with 6 electrons that is flagged as aromatic. Something similar would happen with the aromatic form [nH]1nccc1: first the ring system is kekulized to yield N1N=CC=C1, then the sanitization proceeds from there. The same thing would happen with the equivalent n1[nH]ccc1.
>
> The second case, N1=NC=CC1, has a C (the last one) that only has single bonds to it. This is assigned sp3 hybridization, so there's no conjugated ring system for aromaticity to be perceived in.
>
> The final case, n1nccc1, is an instance of the pyrrole problem: aromatic N's that need an implicit H on them should have that implicit H present in the aromatic SMILES.
>
> -greg
>
> On Mon, Mar 3, 2014 at 5:59 PM, Bob Funchess bfunch...@kelaroo.com wrote:
>> Hi Toby,
>>
>> I'd say it's more of a limitation inherent in Kekule representations than an actual bug in RDKit. Trying to get too clever in figuring out what the user meant usually causes more harm than good.
>>
>> I'm not sure what version of RDKit you're using, but the aromatic specification with an explicit hydrogen on one of the nitrogen atoms works for me:
>>
>>     Chem.MolFromSmiles('n1[nH]ccc1').Debug();
>>     Atoms:
>>         0 7 N chg: 0 deg: 2 exp: 3 imp: 0 hyb: 3 arom?: 1 chi: 0
>>         1 7 N chg: 0 deg: 2 exp: 3 imp: 0 hyb: 3 arom?: 1 chi: 0
>>         2 6 C chg: 0 deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 1 chi: 0
>>         3 6 C chg: 0 deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 1 chi: 0
>>         4 6 C chg: 0 deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 1 chi: 0
>>     Bonds:
>>         0 0-1 order: 12 conj?: 1 aromatic?: 1
>>         1 1-2 order: 12 conj?: 1 aromatic?: 1
>>         2 2-3 order: 12 conj?: 1 aromatic?: 1
>>         3 3-4 order: 12 conj?: 1 aromatic?: 1
>>         4 4-0 order: 12 conj?: 1 aromatic?: 1
>>
>> The double bonds in the Kekule representations here can be between atom pairs 1,2 and 3,4 or between atom pairs 2,3 and 4,0. Putting one between pair 0,1 leaves atom 4 with two single bonds to it (and therefore, to satisfy valence requirements, two implicit hydrogens); I'm not horribly surprised that RDKit perceives that as aliphatic. You can see that's what's happening in your second example, where the hybridization of atom 4 is 4 (sp3) instead of 3 (sp2).
>>
>> Regards,
>> Bob
>>
>> --
>> Bob Funchess, Ph.D., Senior Scientist
>> Kelaroo, Inc
>> www.kelaroo.com

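The reasoning in this thread can be illustrated with a toy electron count. This is a simplified Hückel-style check written for illustration, not the RDKit's aromaticity model; the function names and the hand-written ring descriptions are assumptions.

```python
def pi_electrons(element, in_double_bond, has_h):
    """Pi electrons one ring atom contributes, or None if it breaks conjugation."""
    if in_double_bond:
        return 1          # one electron from each end of a double bond
    if element == "N" and has_h:
        return 2          # pyrrole-type N-H donates its lone pair
    return None           # e.g. an sp3 CH2: no p orbital to contribute


def looks_aromatic(ring):
    """Apply the 4n+2 rule to a ring given as (element, in_double_bond, has_h)."""
    counts = [pi_electrons(*atom) for atom in ring]
    if any(c is None for c in counts):
        return False      # conjugation is interrupted somewhere in the ring
    return sum(counts) % 4 == 2


# N1N=CC=C1: the first N has only single bonds and picks up a hydrogen.
first_case = [("N", False, True), ("N", True, False),
              ("C", True, True), ("C", True, True), ("C", True, True)]
# N1=NC=CC1: the last carbon has only single bonds (a CH2).
second_case = [("N", True, False), ("N", True, False),
               ("C", True, True), ("C", True, True), ("C", False, True)]

print(looks_aromatic(first_case))
print(looks_aromatic(second_case))
```

In this toy count the first ring reaches six pi electrons while the second is cut off at the CH2 carbon, which is the distinction the replies in this thread draw between the two Kekule forms.
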
Re: [Rdkit-discuss] SMARTS/SMARTS and SMILES/SMARTS substructure matching
Hi,

This is probably related to the above so I thought I'd post it on this thread.

I am noticing inconsistent behaviour when a molecule created via SMARTS that contains an 'or' statement has HasSubstructMatch called on it, as opposed to it being the argument to HasSubstructMatch. A simple example follows:

    >>> O_or_C = Chem.MolFromSmarts('[O,C]')
    >>> O = Chem.MolFromSmiles('O')
    >>> C = Chem.MolFromSmiles('C')
    >>> O_or_C.HasSubstructMatch(O)
    True
    >>> O_or_C.HasSubstructMatch(C)
    False
    >>> O.HasSubstructMatch(O_or_C)
    True
    >>> C.HasSubstructMatch(O_or_C)
    True

We also see:

    >>> C_or_O = Chem.MolFromSmarts('[C,O]')
    >>> C_or_O.HasSubstructMatch(O)
    False
    >>> C_or_O.HasSubstructMatch(C)
    True

so the order of elements in a SMARTS 'or' statement changes the behaviour, which is unexpected.

Yours,
Toby Wright
--
InhibOx Ltd

On 5 March 2014 10:10, Christos Kannas chriskan...@gmail.com wrote:
> Hi Greg,
>
> Thanks a lot for the explanation. It makes things clearer now.
>
> The reason I'm doing a SMARTS-SMARTS match is that I would like to match functional groups with the reactants in reactions.
>
> Regards,
> Christos
>
> Christos Kannas
> Researcher, Ph.D Student
>
> On 5 March 2014 04:44, Greg Landrum greg.land...@gmail.com wrote:
>> Hi Christos,
>>
>> On Tue, Mar 4, 2014 at 3:46 PM, Christos Kannas chriskan...@gmail.com wrote:
>>> Hi all,
>>>
>>> Why does the following happen?
>>>
>>>     In [1]: from rdkit import Chem
>>>     In [2]: from rdkit.Chem import AllChem
>>>     In [3]: from rdkit.Chem import Draw
>>>     In [4]: patt = Chem.MolFromSmarts("[CH;D2;!$(C-[!#6;!#1])]=O")
>>>     In [5]: z2 = Chem.MolFromSmarts("[*]-C-C([H])(=O)", 1)
>>>     In [6]: print Chem.MolToSmiles(z2)
>>>     [*]CC=O
>>>     In [7]: print Chem.MolToSmarts(z2)
>>>     *-C-[C&!H0]=O
>>>     In [9]: z2.HasSubstructMatch(patt)
>>>     Out[9]: False
>>>     In [10]: z3 = Chem.MolFromSmiles(Chem.MolToSmiles(z2))
>>>     In [11]: print Chem.MolToSmiles(z3)
>>>     [*]CC=O
>>>     In [12]: print Chem.MolToSmarts(z3)
>>>     [*]-[#6]-[#6]=[#8]
>>>     In [13]: z3.HasSubstructMatch(patt)
>>>     Out[13]: True
>>>
>>> Shouldn't z2 and z3 carry the same information?
>>
>> The way SMARTS/SMARTS matches are handled is different from the way SMARTS/SMILES matches work.
>>
>> The short answer is that when doing a SMARTS/SMARTS match, the RDKit compares the queries to each other; when doing a SMARTS/SMILES match, on the other hand, it checks to see if the atoms in the SMILES molecule match the queries in the SMARTS molecule.
>>
>> A bit longer answer: molecules built using MolFromSmiles contain Atoms; molecules built using MolFromSmarts contain QueryAtoms. Both Atoms and QueryAtoms have a Match() method that takes another Atom or QueryAtom as an argument and returns whether or not the two match. The substructure matching code makes heavy use of this Match() method.
>>
>> QueryAtom.Match(Atom) checks to see if the Atom satisfies the query.
>>
>> QueryAtom.Match(QueryAtom) checks to see if the queries on the atoms are the same. This uses a crude approach that is easy to fool, but I assume that a SMARTS-SMARTS match is not a frequent thing someone wants to do. Query-query matching is also not a particularly easy problem to solve in a general way.
>>
>> -greg

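The distinction between the two kinds of match described in this thread can be sketched with a toy model. The classes below are illustrative only: they are not the RDKit's Atom and QueryAtom, and comparing the stored descriptions for equality is an assumption standing in for the "crude approach" mentioned in the explanation. The sketch does not try to reproduce the exact results reported in the thread.

```python
class Atom:
    """Toy atom: just an element symbol."""
    def __init__(self, symbol):
        self.symbol = symbol


class QueryAtom:
    """Toy query atom: a list of acceptable element symbols, e.g. [O,C]."""
    def __init__(self, allowed):
        self.allowed = list(allowed)

    def match(self, other):
        if isinstance(other, QueryAtom):
            # Query against query: only asks whether the two queries are
            # written the same way, not whether they mean the same thing.
            return self.allowed == other.allowed
        # Query against atom: asks whether the atom satisfies the query.
        return other.symbol in self.allowed


o_or_c = QueryAtom(["O", "C"])
c_or_o = QueryAtom(["C", "O"])

# Atoms tested against the query.
print(o_or_c.match(Atom("O")), o_or_c.match(Atom("C")))
# Equivalent queries, written in a different order.
print(o_or_c.match(c_or_o))
```

The point of the toy is only that a comparison of query descriptions can disagree with a comparison of what the queries accept, which is why query-on-query results can depend on how a pattern happens to be written.
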
[Rdkit-discuss] Two nitrogens in a 5 membered ring
Hi, If I have a five membered ring with 2 consecutive Ns and alternating single and double bonds expressed by the smiles: N1N=CC=C1 RDKit gives me a molecule in which every atom is aromatic. If I give it: N1=NC=CC1 it gives me a molecule in which every atom is aliphatic. If I give it: n1nccc1 it gives me a kekulization error. I, possibly naively, thought the forms would be all aromatic or all aliphatic. Am I missing something or is this a bug? Chem.MolFromSmiles('N1N=CC=C1').Debug() Atoms: 0 7 N chg: 0 deg: 2 exp: 3 imp: 0 hyb: 3 arom?: 1 chi: 0 1 7 N chg: 0 deg: 2 exp: 3 imp: 0 hyb: 3 arom?: 1 chi: 0 2 6 C chg: 0 deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 1 chi: 0 3 6 C chg: 0 deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 1 chi: 0 4 6 C chg: 0 deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 1 chi: 0 Bonds: 0 0-1 order: 12 conj?: 1 aromatic?: 1 1 1-2 order: 12 conj?: 1 aromatic?: 1 2 2-3 order: 12 conj?: 1 aromatic?: 1 3 3-4 order: 12 conj?: 1 aromatic?: 1 4 4-0 order: 12 conj?: 1 aromatic?: 1 Chem.MolFromSmiles('N1=NC=CC1').Debug() Atoms: 0 7 N chg: 0 deg: 2 exp: 3 imp: 0 hyb: 3 arom?: 0 chi: 0 1 7 N chg: 0 deg: 2 exp: 3 imp: 0 hyb: 3 arom?: 0 chi: 0 2 6 C chg: 0 deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0 3 6 C chg: 0 deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0 4 6 C chg: 0 deg: 2 exp: 2 imp: 2 hyb: 4 arom?: 0 chi: 0 Bonds: 0 0-1 order: 2 conj?: 1 aromatic?: 0 1 1-2 order: 1 conj?: 1 aromatic?: 0 2 2-3 order: 2 conj?: 1 aromatic?: 0 3 3-4 order: 1 conj?: 0 aromatic?: 0 4 4-0 order: 1 conj?: 0 aromatic?: 0 Chem.MolFromSmiles('n1nccc1').Debug() [15:31:44] Can't kekulize mol Yours, Toby Wright -- InhibOx Ltd -- Subversion Kills Productivity. Get off Subversion Make the Move to Perforce. With Perforce, you get hassle-free workflows. Merge that actually works. Faster operations. Version large binaries. Built-in WAN optimization and the freedom to use Git, Perforce or both. Make the move to Perforce. 
[Rdkit-discuss] Fwd: SMARTS Substructure matching
Hi Christos,

If you add hydrogens to m3 after creating it in RDKit then both m1 and m2 are recognised as substructures of m3. See below for how I achieved this:

>>> from rdkit import Chem
>>> m1 = Chem.MolFromSmarts('[C:3][C:4](=[O:5])[O:6]([H:100])')
>>> m2 = Chem.MolFromSmarts('[C:3][C:4](=[O:5])[O;H:6]')
>>> m3 = Chem.MolFromSmiles('CC(=O)O')
>>> m3H = Chem.AddHs(m3)
>>> m3.HasSubstructMatch(m1)
False
>>> m3H.HasSubstructMatch(m1)
True
>>> m3.HasSubstructMatch(m2)
True
>>> m3H.HasSubstructMatch(m2)
True

Hope that helps.

Yours,
Toby Wright
--
InhibOx Ltd, Oxford

On 19 February 2014 10:25, Christos Kannas chriskan...@gmail.com wrote:

Hi all,

In my current project I'm working on reaction-based multiobjective de novo design, and I have a set of reactions that I have converted into SMIRKS and reaction SMARTS.

The problem is this: when a reactant pattern in SMARTS, as required by SMIRKS, has explicit mapped hydrogens that play a role in the reaction, a substructure search against a compound that contains the substructure in question cannot find a match. When I change the pattern so that it has no explicit mapped hydrogens, the substructure search succeeds.

To help you understand I've created this small IPython Notebook: http://nbviewer.ipython.org/gist/CKannas/9089271

Can you give me the reasons why this happens?

Best,
Christos
--
Christos Kannas
Researcher
Ph.D Student
Mob (UK): +44 (0) 7447700937
Mob (Cyprus): +357 99530608
http://cy.linkedin.com/in/christoskannas
Re: [Rdkit-discuss] Possible rotatable bonds replacement
Hi,

I favour option 1 but not strongly over option 3. Option 2 is cleanest but I think the cost to users that expect the existing behaviour is too high.

I don't see much difference in the confusion levels between:

    numRotatableBonds() vs numStrictRotatableBonds()

and

    numRotatableBonds() vs numRotatableBonds(strict=true)

as neither is truly clean if the user thinks that the two definitions are interchangeable.

The invariant numRotatableBonds(X) >= numStrictRotatableBonds(X) holds, which is why I was thinking that one is a strict version of the other, but I'd welcome a better name for the new function/variable.

Yours,
Toby Wright
--
InhibOx Ltd
Oxford

On 31 January 2014 11:05, JP jeanpaul.ebe...@inhibox.com wrote:

My 2p worth:

I am not a big fan of outright replacing the NumRotatableBonds implementation (option 2). This is quite a popular descriptor which is used in many ways (e.g. QSAR models, conformer generation, property calculation, etc.). If we are lucky (or skilful, or have had enough time), we have tests written for everything, and they will break as soon as we get a different rotatable bond count and different results. We can then revalidate our protocols using the new (strict) rotatable counts. Perhaps we get better correlations/enrichments/AUCs etc! Yeah!

On the other hand option (1), having two methods NumRotatableBonds() and NumStrictRotatableBonds(), will lead to some confusion. Greg has a point about different people and/or libraries intermixing the two.

Like Paul, I prefer option (3), with the default behaviour giving the old rotatable counts (not strict). This does not come for free either, as the API becomes slightly less clean (and what to do in the future when, for example, someone finds a non-SMARTS based way to do this: add another parameter?). Still I think this is the least of all evils.

Thanks Toby & Greg!

JP

On 31 January 2014 06:54, paul.czodrow...@merckgroup.com wrote:

I could add the new descriptor as Toby provided it. People are then free to pick between NumRotatableBonds() and NumStrictRotatableBonds(). This has the advantage of maintaining strict backwards compatibility, but I could imagine it being confusing/irritating to people using the code to have to choose between them (or, worse, using both).

Another option is to just replace the current NumRotatableBonds() SMARTS with the new one. This loses backwards compatibility, but replaces NumRotatableBonds() with something more correct.

Finally, I could take a hybrid approach: replace the default NumRotatableBonds() with the new one, but add an extra argument that allows the old one to be used.

I'm leaning towards the second option. I'd normally go with the third, but I almost view this as a bug fix for the rotatable bonds definition.

Comments? Suggestions? Other options?

I like your idea of the hybrid approach, which would mean backwards compatibility.

paul
Re: [Rdkit-discuss] Counting amide groups in rotatable bond counts
Hi,

Sorry for the extremely slow reply, thanks for the insights and I hope you all had an excellent Christmas break.

I think the best thing is to roll my own definition of a bond which would be rotatable if not for the fact that it's an amide. Something like

    $([NH]!D1)-!@C=O

and then take the number of these bonds away from RDKit's rotatable bond count. If you simply take the number of amides calculated by RDKit away from the number of rotatable bonds, you hit an error wherever an amide bond was not considered rotatable in the first place (for example because the N was terminal).

The Chemaxon definition of rotatable bonds troubles me somewhat. Given the following molecule:

    CC(=O)NCC

it claims there are no rotatable bonds at all. The non-amide N-C bond is discounted because one of its atoms fulfils the pattern ([NH]!@C(=O)), that is, it is an N connected to a C=O group, even though this connection is not made by the N-C bond in question.

Thanks again,

Yours,
Toby Wright
--
InhibOx Ltd
Oxford

On 24 December 2013 15:46, Gerebtzoff, Gregori gregori.gerebtz...@roche.com wrote:

Hi Toby,

One additional note on what Greg wrote: you can define another SMARTS pattern for the identification of rotatable bonds:

    Lipinski.RotatableBondSmarts = Chem.MolFromSmarts(...)

Some SMARTS from the literature:

    Daylight: [!$(*#*)!D1$(*(-[!#1])~[!#1])]-!@[!$(*#*)!D1$(*(-[!#1])~[!#1])]
    Chemaxon: [!$([NH]!@C(=O))!D1!$(*#*)]-!@[!$([NH]!@C(=O))!D1!$(*#*)]

Best,
Grégori
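The difference between excluding amide atoms and excluding amide bonds can be sketched without any toolkit. The following is a minimal illustration on a hand-built heavy-atom graph of CC(=O)NCC. The graph encoding and all function names are hypothetical, the amide test only approximates the SMARTS quoted above, and the ring check is omitted because this molecule is acyclic.

```python
# Heavy atoms of CC(=O)NCC: index -> (element, attached hydrogens).
ATOMS = {0: ("C", 3), 1: ("C", 0), 2: ("O", 0), 3: ("N", 1), 4: ("C", 2), 5: ("C", 3)}
# (atom, atom) -> bond order. The molecule is acyclic, so no ring test is needed.
BONDS = {(0, 1): 1, (1, 2): 2, (1, 3): 1, (3, 4): 1, (4, 5): 1}


def bonds_of(i):
    """Yield (neighbour, order) for every bond touching atom i."""
    for (a, b), order in BONDS.items():
        if a == i:
            yield b, order
        elif b == i:
            yield a, order


def degree(i):
    return sum(1 for _ in bonds_of(i))


def has_triple_bond(i):
    return any(order == 3 for _, order in bonds_of(i))


def is_carbonyl_carbon(i):
    return ATOMS[i][0] == "C" and any(
        order == 2 and ATOMS[n][0] == "O" for n, order in bonds_of(i))


def is_amide_nh(i):
    # Rough stand-in for [NH]!@C(=O): an N-H singly bonded to a carbonyl carbon.
    return ATOMS[i] == ("N", 1) and any(
        order == 1 and is_carbonyl_carbon(n) for n, order in bonds_of(i))


def basic_rotatable():
    """Single bonds whose two ends are non-terminal and carry no triple bond."""
    return [bond for bond, order in BONDS.items()
            if order == 1
            and all(degree(x) > 1 and not has_triple_bond(x) for x in bond)]


def atom_based_exclusion():
    """Drop any bond that touches an amide N (the behaviour questioned above)."""
    return [bond for bond in basic_rotatable()
            if not any(is_amide_nh(x) for x in bond)]


def bond_based_exclusion():
    """Drop only the bond that is itself the amide N-C(=O) bond."""
    def is_amide_bond(a, b):
        return ((is_amide_nh(a) and is_carbonyl_carbon(b))
                or (is_amide_nh(b) and is_carbonyl_carbon(a)))
    return [bond for bond in basic_rotatable() if not is_amide_bond(*bond)]


print(len(basic_rotatable()), len(atom_based_exclusion()), len(bond_based_exclusion()))
```

On this toy graph the basic definition finds the C-N amide bond and the N-C ethyl bond, the atom-based exclusion removes both, and the bond-based exclusion keeps the N-C ethyl bond, which is the count the "roll my own" approach is aiming for.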
Re: [Rdkit-discuss] atom equivalence for substructure matching
While this doesn't answer your core question of whether RDKit can do what you want without manually editing the SMARTS strings, if you do end up hacking it, using 'C[CX3v4](~O)~O' might be cleaner than 'CC(~O)~O' as it would exclude the case where both Os were singly bonded.

Yours,
Toby Wright
--
InhibOx Ltd

On 30 October 2013 01:12, S.L. Chan slch...@yahoo.com wrote:

Good evening,

I would like to get an exhaustive substructure matching of a molecule onto itself. Generally I could use the GetSubstructMatches function with the uniquify=False option. However, if there is a carboxylate or a guanidinium head around, this would give only one side of the match, since the two oxygens / nitrogens are not considered equivalent:

>>> mol = Chem.MolFromSmiles('CC(=O)[O-]')
>>> patt = Chem.MolFromSmarts('CC(=O)[O-]')
>>> print(mol.GetSubstructMatches(patt, uniquify=False))
((0,1,2,3),)

Now, I suppose I could do an ugly hack (it could in principle match two single bonds) to achieve my purpose:

>>> mol = Chem.MolFromSmiles('CC(=O)[O-]')
>>> patt = Chem.MolFromSmarts('CC(~O)~O')
>>> print(mol.GetSubstructMatches(patt, uniquify=False))
((0,1,2,3), (0,1,3,2))

However, this would mean that I would need to manually edit the SMARTS string for all molecules. I just wonder if there is something similar to the Kekulize command that would make the two oxygens equivalent? Or are there other ways around this?

Ling
[Rdkit-discuss] Surprising DeleteSubstructs(smiles, smiles) behaviour
Hi,

I just ran into a small gotcha and thought I'd share it. I have a molecule with a fragment of 6 carbons, 5 of which form a ring, and I am deleting fragments that match CCCCCC. I expected the ring fragment to be spared when the pattern was built from SMILES, though not when it was built from SMARTS. However, as the following code shows, it gets deleted either way.

>>> import rdkit
>>> from rdkit import Chem
>>> query = Chem.MolFromSmiles('C.CC1CCCC1')
>>> remove_as_smiles = Chem.MolFromSmiles('CCCCCC')
>>> remove_as_smarts = Chem.MolFromSmarts('CCCCCC')
>>> print(Chem.MolToSmiles(Chem.DeleteSubstructs(query, remove_as_smiles, onlyFrags=True)))
C
>>> print(Chem.MolToSmiles(Chem.DeleteSubstructs(query, remove_as_smarts, onlyFrags=True)))
C

So now I know to use [C!r][C!r][C!r][C!r][C!r][C!r] explicitly if that's what I mean.

Hope this saves someone else from stumbling into my mistakes.

Yours,
Toby Wright
--
InhibOx Ltd
[Rdkit-discuss] Inconsistency across elements in making Hs explicit
Hi,

I've observed an odd behaviour in RDKit when listing explicit hydrogens in SMILES where the original molecules were generated from SD files. As the code below shows, if I ask "What is the SMILES for a single C atom?" I get C, but if I ask for silicon I get [SiH4]. Any reason why this might be?

I've also observed that in RDKit 2013 Q2 I get [Fe] as the SMILES for a single iron atom, but in RDKit 2011 Q4 I get [FeH6], and I can't see anything in the release notes to explain this change.

I also have examples involving atoms in larger molecules but I thought these provided the simplest examples.

Example files and interactive python snippet:

>>> sup = Chem.SDMolSupplier('SingleSi.sdf')
>>> sup2 = Chem.SDMolSupplier('SingleC.sdf')
>>> print(Chem.MolToSmiles(sup[0], canonical=True, isomericSmiles=True))
[SiH4]
>>> print(Chem.MolToSmiles(sup2[0], canonical=True, isomericSmiles=True))
C

SingleC.sdf:

SingleC
     RDKit          2D

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END

SingleSi.sdf:

SingleSi
     RDKit          2D

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 Si  0  0  0  0  0  0  0  0  0  0  0  0
M  END

Thanks,
Toby
--
InhibOx Ltd
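One piece of background that bears on the C versus [SiH4] difference is the SMILES "organic subset" rule: only B, C, N, O, P, S, F, Cl, Br and I may be written without brackets, with their hydrogens implied by normal valence, while every other element must be bracketed with its hydrogen count spelled out. The sketch below applies that rule to a single atom. It is an illustration of the notation rule only, not RDKit's writer; `atom_token` and `implied_h` are hypothetical names, and it says nothing about why the iron result changed between releases.

```python
# Normal valences used for implicit hydrogens in the SMILES organic subset.
ORGANIC_SUBSET_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implied_h(symbol, bond_order_sum):
    """Hydrogens a bare organic-subset symbol would imply."""
    for valence in ORGANIC_SUBSET_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0


def atom_token(symbol, bond_order_sum, h_count, charge=0):
    """Return the SMILES token for one atom (no isotopes or chirality)."""
    bare_ok = (symbol in ORGANIC_SUBSET_VALENCES
               and charge == 0
               and implied_h(symbol, bond_order_sum) == h_count)
    if bare_ok:
        return symbol
    # Bracket atoms carry their hydrogen count and charge explicitly.
    h_part = "" if h_count == 0 else "H" if h_count == 1 else "H%d" % h_count
    if charge == 0:
        c_part = ""
    elif abs(charge) == 1:
        c_part = "+" if charge > 0 else "-"
    else:
        c_part = "%+d" % charge
    return "[%s%s%s]" % (symbol, h_part, c_part)


print(atom_token("C", 0, 4))   # carbon with four hydrogens
print(atom_token("Si", 0, 4))  # silicon with four hydrogens
```

Under this rule a lone carbon carrying four hydrogens can be written bare, whereas silicon carrying four hydrogens cannot, so the bracketed form with an explicit count is the only way to write it.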
[Rdkit-discuss] Can't read SDF data lines when CTAB is in V3000 format
Hi,

I'm trying to read the data lines from an SD file where the CTAB is in V3000 format. Suppose the file v3000propIssue.sdf contains the following:

testMol


0 0 0 0 0999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 1 0 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C 0 0 0 0
M  V30 END ATOM
M  V30 END CTAB
M  END
> <TestProp>
42

When it is read by an SDMolSupplier it loads correctly (as shown by the Debug output) apart from the data lines, which are not converted to RDKit properties, as the following interactive code snippet shows:

>>> import rdkit
>>> from rdkit import Chem
>>> mol = next(Chem.SDMolSupplier('v3000propIssue.sdf'))
[10:59:33] ERROR: Problems encountered parsing data fields
[10:59:33] ERROR: moving to the begining of the next molecule
>>> mol.HasProp('_Name')
1
>>> mol.HasProp('TestProp')
0
>>> mol.Debug()
Atoms:
0 6 C chg: 0 deg: 0 exp: 0 imp: 4 hyb: 4 arom?: 0 chi: 0
Bonds:

Any ideas why this might be?

Yours,
Toby Wright
--
InhibOx Ltd
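Until the supplier handles this case, one possible stopgap is to pull the data items out of the record text by hand and attach them afterwards. The sketch below is plain Python and assumes the usual SD layout: data items follow the "M  END" line, each header line starts with ">" and carries the field name in angle brackets, and a blank line ends each value. The function name `read_sd_data_fields` and the sample record are hypothetical.

```python
def read_sd_data_fields(record_text):
    """Return {field name: value} for the data items after 'M  END'."""
    lines = record_text.splitlines()
    # Find the end of the connection table; 'M  V30 END ...' lines do not match.
    start = None
    for index, line in enumerate(lines):
        if line.startswith("M  END"):
            start = index + 1
            break
    if start is None:
        return {}

    fields = {}
    name = None
    value_lines = []
    for line in lines[start:]:
        if line.startswith("$$$$"):
            break  # record delimiter
        if line.startswith(">"):
            if name is not None:
                fields[name] = "\n".join(value_lines)
            lo, hi = line.find("<"), line.rfind(">")
            name = line[lo + 1:hi] if 0 <= lo < hi else None
            value_lines = []
        elif not line.strip():
            if name is not None:  # a blank line closes the current item
                fields[name] = "\n".join(value_lines)
                name, value_lines = None, []
        elif name is not None:
            value_lines.append(line)
    if name is not None:
        fields[name] = "\n".join(value_lines)
    return fields


# Sample record modelled on the file described above (layout is an assumption).
RECORD = "\n".join([
    "testMol", "", "",
    "  0  0  0     0  0            999 V3000",
    "M  V30 BEGIN CTAB",
    "M  V30 COUNTS 1 0 0 0 0",
    "M  V30 BEGIN ATOM",
    "M  V30 1 C 0 0 0 0",
    "M  V30 END ATOM",
    "M  V30 END CTAB",
    "M  END",
    "> <TestProp>",
    "42",
    "",
    "$$$$",
])

fields = read_sd_data_fields(RECORD)
print(fields)
```

The resulting dictionary could then be copied onto the molecule with mol.SetProp(name, value) for each entry, which keeps the workaround separate from the supplier itself.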
[Rdkit-discuss] Chirality lost unless molecule sanitized on load
Hi,

I think the following behaviour is a bug but feel free to correct me.

I have an SD file (attached) with two stereoisomers of alanine (built by openbabel from the SMILES). I want to read it and write its contents as isomeric SMILES. I execute the following:

    import rdkit
    from rdkit import Chem

    # Case 1: load without sanitization, then sanitize each molecule by hand.
    smiles_writer = Chem.SmilesWriter('ChiralTest.smi', includeHeader=False, isomericSmiles=True)
    suppl = Chem.SDMolSupplier('ChiralTest3D.sdf', sanitize=False)
    for mol in suppl:
        Chem.SanitizeMol(mol)
        smiles_writer.write(mol)
    smiles_writer.flush()
    smiles_writer.close()

    # Case 2: let the supplier sanitize on load.
    smiles_writer2 = Chem.SmilesWriter('ChiralTest2.smi', includeHeader=False, isomericSmiles=True)
    suppl2 = Chem.SDMolSupplier('ChiralTest3D.sdf', sanitize=True)
    for mol in suppl2:
        smiles_writer2.write(mol)
    smiles_writer2.flush()
    smiles_writer2.close()

The file ChiralTest.smi now contains:

    [H]OC(=O)C([H])(N([H])[H])C([H])([H])[H] L-alanine
    [H]OC(=O)C([H])(N([H])[H])C([H])([H])[H] D-alanine

and ChiralTest2.smi contains:

    C[C@H](N)C(=O)O L-alanine
    C[C@@H](N)C(=O)O D-alanine

My question is: why do I get different outputs depending on when sanitization was performed?

Yours,
Toby Wright
--
InhibOx Ltd

Attachment: ChiralTest3D.sdf
[Rdkit-discuss] Minor bug in Data/Crippen.txt
Hi,

On line 11 of the file Data/Crippen.txt the label says C2, but the SMARTS expression, log P and MR values are those expected for case C3 from Wildman & Crippen '99, which suggests that the only thing wrong is the label.

I also have a question that might be a bit foolish as I'm not an accomplished chemist: does the SMARTS for O11 deal correctly with the case where the oxygen in question is bonded to aromatic atoms? If I understand correctly it should match either aromatic or aliphatic elements (apart from the carbon), but the SMARTS as written will only match in the aliphatic case.

Yours,
Toby Wright
--
inhibOx