Re: [Rdkit-discuss] separate module by breaking bonds
Greg, Thank you very much. Yingfeng On Fri, May 31, 2013 at 4:14 AM, Greg Landrum greg.land...@gmail.comwrote: On Fri, May 31, 2013 at 7:53 AM, Yingfeng Wang ywang...@gmail.com wrote: I think I need to add masses of all atoms for getting the mass of the whole fragment. Am I right? It's better to just use the built in functionality: In [2]: m = Chem.MolFromSmiles('c1c1') In [3]: from rdkit.Chem import Descriptors In [4]: Descriptors.MolWt(m) Out[4]: 78.113999 -greg -- Get 100% visibility into Java/.NET code with AppDynamics Lite It's a free troubleshooting tool designed for production Get down to code-level detail for bottlenecks, with 2% overhead. Download for free and get started troubleshooting in minutes. http://p.sf.net/sfu/appdyn_d2d_ap2___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] separate module by breaking bonds
Greg, I got a weird case. With the following code, from rdkit import Chem from rdkit.Chem import Descriptors m =Chem.MolFromInchi('InChI=1S/C10H9N3O/c1-7-11-10(14)9(13-12-7)8-5-3-2-4-6-8/h2-6H,1H3,(H,11,12,14)') Descriptors.MolWt(m) 9.072 em = Chem.EditableMol(m) em.RemoveBond(8,7) nm = em.GetMol() frags = Chem.GetMolFrags(nm,asMols=True) [Descriptors.MolWt(x) for x in frags] [5.04, 6.048] It seems the mass of the original molecule is 9.072, while the summation of masses of its fragments is 5.04+6.048 9.072. I don't think it is caused by the precision problem. Could you please let me know how to remove the extra mass? Thanks. Yingfeng On Fri, May 31, 2013 at 8:58 AM, Yingfeng Wang ywang...@gmail.com wrote: Greg, Thank you very much. Yingfeng On Fri, May 31, 2013 at 4:14 AM, Greg Landrum greg.land...@gmail.comwrote: On Fri, May 31, 2013 at 7:53 AM, Yingfeng Wang ywang...@gmail.comwrote: I think I need to add masses of all atoms for getting the mass of the whole fragment. Am I right? It's better to just use the built in functionality: In [2]: m = Chem.MolFromSmiles('c1c1') In [3]: from rdkit.Chem import Descriptors In [4]: Descriptors.MolWt(m) Out[4]: 78.113999 -greg -- Get 100% visibility into Java/.NET code with AppDynamics Lite It's a free troubleshooting tool designed for production Get down to code-level detail for bottlenecks, with 2% overhead. Download for free and get started troubleshooting in minutes. http://p.sf.net/sfu/appdyn_d2d_ap2___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] dissociation energy
How do I retrieve the dissociation energy of a bond in a molecule. Say, I have from rdkit import Chem m =Chem.MolFromInchi('InChI=1S/C10H9N3O/c1-7-11-10(14)9(13- 12-7)8-5-3-2-4-6-8/h2-6H,1H3,(H,11,12,14)') mybond = m.GetBondBetweenAtoms(8,7) Is there a way for me to retrieve the dissociation energy of mybond? Thanks. Yingfeng -- Get 100% visibility into Java/.NET code with AppDynamics Lite It's a free troubleshooting tool designed for production Get down to code-level detail for bottlenecks, with 2% overhead. Download for free and get started troubleshooting in minutes. http://p.sf.net/sfu/appdyn_d2d_ap2___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] separate module by breaking bonds
I change MolWt() into ExactMolWt(), and get Descriptors.ExactMolWt(m) 187.074561908 [Descriptors.ExactMolWt(x) for x in frags] [111.04326178, 78.046950192] According to the document, MolWt() outputs the average molecular weight of the molecule ignoring hydrogens, while ExactMolWt() is for the exact molecular weight of the molecule. What do I observe so big difference of mass? By the way, the mass of m calculated by the ExactMolWt () is still less than summation of its fragments. Thanks. Yingfeng On Fri, May 31, 2013 at 3:58 PM, Yingfeng Wang ywang...@gmail.com wrote: Greg, I got a weird case. With the following code, from rdkit import Chem from rdkit.Chem import Descriptors m =Chem.MolFromInchi('InChI=1S/C10H9N3O/c1-7-11-10(14)9(13-12-7)8-5-3-2-4-6-8/h2-6H,1H3,(H,11,12,14)') Descriptors.MolWt(m) 9.072 em = Chem.EditableMol(m) em.RemoveBond(8,7) nm = em.GetMol() frags = Chem.GetMolFrags(nm,asMols=True) [Descriptors.MolWt(x) for x in frags] [5.04, 6.048] It seems the mass of the original molecule is 9.072, while the summation of masses of its fragments is 5.04+6.048 9.072. I don't think it is caused by the precision problem. Could you please let me know how to remove the extra mass? Thanks. Yingfeng On Fri, May 31, 2013 at 8:58 AM, Yingfeng Wang ywang...@gmail.com wrote: Greg, Thank you very much. Yingfeng On Fri, May 31, 2013 at 4:14 AM, Greg Landrum greg.land...@gmail.comwrote: On Fri, May 31, 2013 at 7:53 AM, Yingfeng Wang ywang...@gmail.comwrote: I think I need to add masses of all atoms for getting the mass of the whole fragment. Am I right? It's better to just use the built in functionality: In [2]: m = Chem.MolFromSmiles('c1c1') In [3]: from rdkit.Chem import Descriptors In [4]: Descriptors.MolWt(m) Out[4]: 78.113999 -greg -- Get 100% visibility into Java/.NET code with AppDynamics Lite It's a free troubleshooting tool designed for production Get down to code-level detail for bottlenecks, with 2% overhead. Download for free and get started troubleshooting in minutes. http://p.sf.net/sfu/appdyn_d2d_ap2___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Editable molecule confusion
Another attempt - see Code Block 3, below. In this case, I construct the ring systems from the group up using an EditableMol. Once again it fails during sanitization, but now I think I know why. The original structure has an indole with a substituted nitrogen. During building, that nitrogen does not have a hydrogen attached, so the valence is not satisfied and sanitization fails. If I change this to an indane, it works just fine. The problem is, I cannot add hydrogens to the nitrogen until after the EditableMol is converted to a Mol, but I cannot convert it to a Mol until hydrogens are added. All of this would require some fairly sophisticated logic about the nitrogen which I'm not sure I want to include for this simple task. Code Block 3: from rdkit import Chem from rdkit.Chem import AllChem sdin = Chem.SDMolSupplier('test.sdf') sdout = Chem.SDWriter('rings.sdf') for m in sdin: em = Chem.EditableMol(Chem.Mol()) indexmap = {} for a in m.GetAtoms(): if ( a.IsInRing() ): indexmap[a.GetIdx()] = em.AddAtom(Chem.Atom(a.GetAtomicNum())) for b in m.GetBonds(): if ( b.IsInRing() ): em.AddBond( indexmap[b.GetBeginAtomIdx()],indexmap[b.GetEndAtomIdx()],b.GetBondType() ) for nm in Chem.GetMolFrags(em.GetMol(), asMols=True): AllChem.Compute2DCoords(nm) sdout.write(nm) On Fri, May 31, 2013 at 2:41 PM, Robert DeLisle rkdeli...@gmail.com wrote: I am attempting to reduce a molecule (attached SDF) to just its ring systems using Code Block 1 at the bottom. The problem is that when I get through the loops removing non-ring atoms/bonds, and convert the EditableMol back to a Mol, I end up with 7 disjoint sets of atoms: ((0, 1, 2, 3, 4, 5, 6, 7, 8), (9, 10, 11, 12, 13, 14), (15,), (16,), (17, 20), (18,), (19,)) It appears that when I remove an atom from the EditableMol by index, the indices are reassigned. I tried to test this with the inelegant code in Code Block 2, which gives me the expected sets of atom indices with respect to number and size: ((0, 1, 2, 3, 4, 5, 6, 7, 8), (9, 10, 11, 12, 13, 14), (15, 16, 17, 18, 19, 20)) - but it still fails to sanitize when I convert back to a Mol. What am I missing here? Also, is there an easier (ie, existing) way to do this? I'm just looking to reduce the molecule to its ring systems and write those to an SD file. -Kirk Code block 1: from rdkit import Chem sdin = Chem.SDMolSupplier('test.sdf') sdout = Chem.SDWriter('rings.sdf') for m in sdin: print len(m.GetBonds()),len(m.GetAtoms()) em = Chem.EditableMol(m) for a in m.GetAtoms(): if ( not a.IsInRing() ): em.RemoveAtom(a.GetIdx()) print a.GetIdx(), m.GetAtomWithIdx(a.GetIdx()).GetSymbol() for b in m.GetBonds(): if ( not b.IsInRing() ): a1 = b.GetBeginAtomIdx() a2 = b.GetEndAtomIdx() em.RemoveBond(a1,a2) m3 = em.GetMol() print len(m3.GetBonds()), len(m3.GetAtoms()) f = Chem.GetMolFrags(em.GetMol()) print f #for f in Chem.GetMolFrags(m3,asMols = True): #sdout.write(f) Code block 2: from rdkit import Chem sdin = Chem.SDMolSupplier('test.sdf') sdout = Chem.SDWriter('rings.sdf') for m in sdin: print len(m.GetBonds()),len(m.GetAtoms()) em = Chem.EditableMol(m) active = True while ( active == True ): active = False for a in m.GetAtoms(): if ( not a.IsInRing() ): print a.GetIdx(), m.GetAtomWithIdx(a.GetIdx()).GetSymbol() em.RemoveAtom(a.GetIdx()) active = True m=em.GetMol() em = Chem.EditableMol(m) break active = True while ( active == True): active = False for b in m.GetBonds(): if ( not b.IsInRing() ): active = True a1 = b.GetBeginAtomIdx() a2 = b.GetEndAtomIdx() em.RemoveBond(a1,a2) m=em.GetMol() em = Chem.EditableMol(m) break m3 = em.GetMol() print len(m3.GetBonds()), len(m3.GetAtoms()) f = Chem.GetMolFrags(em.GetMol()) print f #for f in Chem.GetMolFrags(m3,asMols = True): #sdout.write(f) -- Get 100% visibility into Java/.NET code with AppDynamics Lite It's a free troubleshooting tool designed for production Get down to code-level detail for bottlenecks, with 2% overhead. Download for free and get started troubleshooting in minutes. http://p.sf.net/sfu/appdyn_d2d_ap2___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Editable molecule confusion
Hi Kirk, Quick FYI: the attachment on the original message didn't make it through. Still, I think the idea is clear enough. A general question I'd have is whether or not you really want to completely ignore the nature of the bond to the non-ring atom. For example, with this scheme you're going to end up getting the same ring systems for quinone and 1,4 cyclohexadiene derivatives: O=C1C=CC(=O)C=C1 - C1C=CCC=C1 CC1C=CC(C)C=C1 - C1C=CCC=C1 Similarly, sulfones in rings will lead to thioethers: O=S1(=O)1 - S11 Is that desirable? If not, you could try adding dummies connected with an appropriate bond type so that you get: O=C1C=CC(=O)C=C1 - *=C1C=CC(=*)C=C1 On Fri, May 31, 2013 at 11:41 PM, Robert DeLisle rkdeli...@gmail.comwrote: Maybe the logic is that sophisticated after all - Code Block 4 below. The trick with adding the Hs to Ns is, unfortunately, necessary. You should be adding them to aromatic heteroatoms. Code Block 4: from rdkit import Chem from rdkit.Chem import AllChem sdin = Chem.SDMolSupplier('test.sdf') sdout = Chem.SDWriter('rings.sdf') for m in sdin: em = Chem.EditableMol(Chem.Mol()) indexmap = {} for a in m.GetAtoms(): if ( a.IsInRing() ): indexmap[a.GetIdx()] = em.AddAtom(Chem.Atom(a.GetAtomicNum())) for b in m.GetBonds(): if ( b.IsInRing() ): em.AddBond( indexmap[b.GetBeginAtomIdx()],indexmap[b.GetEndAtomIdx()],b.GetBondType() ) for a in m.GetAtoms(): if ( a.IsInRing() and a.GetAtomicNum() == 7 ): I would replace the above line with: if a.GetIsAromatic() and a.GetAtomicNum() != 6: for b in a.GetBonds(): if ( not b.IsInRing() ): em.AddBond(em.AddAtom(Chem.Atom(1)), indexmap[a.GetIdx()], Chem.BondType.SINGLE) for nm in Chem.GetMolFrags(em.GetMol(), asMols=True): You probably should be calling either Chem.SanitizeMol() or Chem.RemoveHs() here. I'd recommend the second variant because I don't think you want the Hs in the output. AllChem.Compute2DCoords(nm) sdout.write(nm) -greg -- Get 100% visibility into Java/.NET code with AppDynamics Lite It's a free troubleshooting tool designed for production Get down to code-level detail for bottlenecks, with 2% overhead. Download for free and get started troubleshooting in minutes. http://p.sf.net/sfu/appdyn_d2d_ap2___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] dissociation energy
Dear Yingfeng, On Fri, May 31, 2013 at 10:25 PM, Yingfeng Wang ywang...@gmail.com wrote: How do I retrieve the dissociation energy of a bond in a molecule. Say, I have from rdkit import Chem m =Chem.MolFromInchi('InChI=1S/C10H9N3O/c1-7-11-10(14)9(13- 12-7)8-5-3-2-4-6-8/h2-6H,1H3,(H,11,12,14)') mybond = m.GetBondBetweenAtoms(8,7) Is there a way for me to retrieve the dissociation energy of mybond? The RDKit does not have this type of functionality. -greg -- Get 100% visibility into Java/.NET code with AppDynamics Lite It's a free troubleshooting tool designed for production Get down to code-level detail for bottlenecks, with 2% overhead. Download for free and get started troubleshooting in minutes. http://p.sf.net/sfu/appdyn_d2d_ap2___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Editable molecule confusion
On Sat, Jun 1, 2013 at 6:11 AM, Robert DeLisle rkdeli...@gmail.com wrote: You are absolutely correct on all counts. I've also found problems with a variety of special cases in which the non-cyclic atom is required for aromaticity. Lots of special cases to consider. Oh yeah, if you want the aromaticity to survive the fragmentation and sanitization, you're going to need to include the doubly bound exocyclic N or O as an N or O. Here's an example: In [2]: Chem.CanonSmiles('*=C1NC=CC=C1') Out[2]: '[*]=C1C=CC=CN1' In [3]: Chem.CanonSmiles('O=C1NC=CC=C1') Out[3]: 'O=c1[nH]1' not a simple story, unfortunately. I'll integrate your suggestions and do done error checking. On May 31, 2013 9:55 PM, Greg Landrum greg.land...@gmail.com wrote: Hi Kirk, Quick FYI: the attachment on the original message didn't make it through. Still, I think the idea is clear enough. A general question I'd have is whether or not you really want to completely ignore the nature of the bond to the non-ring atom. For example, with this scheme you're going to end up getting the same ring systems for quinone and 1,4 cyclohexadiene derivatives: O=C1C=CC(=O)C=C1 - C1C=CCC=C1 CC1C=CC(C)C=C1 - C1C=CCC=C1 Similarly, sulfones in rings will lead to thioethers: O=S1(=O)1 - S11 Is that desirable? If not, you could try adding dummies connected with an appropriate bond type so that you get: O=C1C=CC(=O)C=C1 - *=C1C=CC(=*)C=C1 On Fri, May 31, 2013 at 11:41 PM, Robert DeLisle rkdeli...@gmail.comwrote: Maybe the logic is that sophisticated after all - Code Block 4 below. The trick with adding the Hs to Ns is, unfortunately, necessary. You should be adding them to aromatic heteroatoms. Code Block 4: from rdkit import Chem from rdkit.Chem import AllChem sdin = Chem.SDMolSupplier('test.sdf') sdout = Chem.SDWriter('rings.sdf') for m in sdin: em = Chem.EditableMol(Chem.Mol()) indexmap = {} for a in m.GetAtoms(): if ( a.IsInRing() ): indexmap[a.GetIdx()] = em.AddAtom(Chem.Atom(a.GetAtomicNum())) for b in m.GetBonds(): if ( b.IsInRing() ): em.AddBond( indexmap[b.GetBeginAtomIdx()],indexmap[b.GetEndAtomIdx()],b.GetBondType() ) for a in m.GetAtoms(): if ( a.IsInRing() and a.GetAtomicNum() == 7 ): I would replace the above line with: if a.GetIsAromatic() and a.GetAtomicNum() != 6: for b in a.GetBonds(): if ( not b.IsInRing() ): em.AddBond(em.AddAtom(Chem.Atom(1)), indexmap[a.GetIdx()], Chem.BondType.SINGLE) for nm in Chem.GetMolFrags(em.GetMol(), asMols=True): You probably should be calling either Chem.SanitizeMol() or Chem.RemoveHs() here. I'd recommend the second variant because I don't think you want the Hs in the output. AllChem.Compute2DCoords(nm) sdout.write(nm) -greg -- Get 100% visibility into Java/.NET code with AppDynamics Lite It's a free troubleshooting tool designed for production Get down to code-level detail for bottlenecks, with 2% overhead. Download for free and get started troubleshooting in minutes. http://p.sf.net/sfu/appdyn_d2d_ap2___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] separate module by breaking bonds
Combining two replies into one here. On Fri, May 31, 2013 at 9:58 PM, Yingfeng Wang ywang...@gmail.com wrote: Greg, I got a weird case. With the following code, from rdkit import Chem from rdkit.Chem import Descriptors m =Chem.MolFromInchi('InChI=1S/C10H9N3O/c1-7-11-10(14)9(13-12-7)8-5-3-2-4-6-8/h2-6H,1H3,(H,11,12,14)') Descriptors.MolWt(m) 9.072 em = Chem.EditableMol(m) em.RemoveBond(8,7) nm = em.GetMol() frags = Chem.GetMolFrags(nm,asMols=True) [Descriptors.MolWt(x) for x in frags] [5.04, 6.048] It seems the mass of the original molecule is 9.072, while the summation of masses of its fragments is 5.04+6.048 9.072. I don't think it is caused by the precision problem. Could you please let me know how to remove the extra mass? It's not extra mass. The difference is caused by the Hs that are added to the fragments to replace the bonds that have been broken: In [14]: m =Chem.MolFromInchi('InChI=1S/C10H9N3O/c1-7-11-10(14)9(13-12-7)8-5-3-2-4-6-8/h2-6H,1H3,(H,11,12,14)') In [15]: AllChem.CalcMolFormula(m) Out[15]: 'C10H9N3O' In [16]: em = Chem.EditableMol(m) In [17]: em.RemoveBond(8,7) In [18]: nm = em.GetMol() In [19]: frags = Chem.GetMolFrags(nm,asMols=True) In [20]: [AllChem.CalcMolFormula(x) for x in frags] Out[20]: ['C4H5N3O', 'C6H6'] you can see that there are two more Hs here. On Fri, May 31, 2013 at 10:33 PM, Yingfeng Wang ywang...@gmail.com wrote: I change MolWt() into ExactMolWt(), and get Descriptors.ExactMolWt(m) 187.074561908 [Descriptors.ExactMolWt(x) for x in frags] [111.04326178, 78.046950192] According to the document, MolWt() outputs the average molecular weight of the molecule ignoring hydrogens, while ExactMolWt() is for the exact molecular weight of the molecule. What do I observe so big difference of mass? By the way, the mass of m calculated by the ExactMolWt () is still less than summation of its fragments. The docstring for MolWt is incredibly wrong. Thanks for pointing it out. I'll fix it. MolWt() computes the average molecular weight of the molecule, taking into account the standard isotope distribution for each atom where an isotope is not specified. ExactMolWt() computes the molecular weight using either the atomic weights of the most common isotope (if not specified) or the specified isotope. An example of that: In [22]: Descriptors.ExactMolWt(Chem.MolFromSmiles('[CH4]')) Out[22]: 16.03130012798 In [23]: Descriptors.ExactMolWt(Chem.MolFromSmiles('[12CH4]')) Out[23]: 16.03130012798 In [24]: Descriptors.ExactMolWt(Chem.MolFromSmiles('[13CH4]')) Out[24]: 17.03465496798 -greg -- Get 100% visibility into Java/.NET code with AppDynamics Lite It's a free troubleshooting tool designed for production Get down to code-level detail for bottlenecks, with 2% overhead. Download for free and get started troubleshooting in minutes. http://p.sf.net/sfu/appdyn_d2d_ap2___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss