Re: [Rdkit-discuss] separate module by breaking bonds

2013-05-31 Thread Yingfeng Wang
Greg,

Thank you very much.

Yingfeng


On Fri, May 31, 2013 at 4:14 AM, Greg Landrum greg.land...@gmail.comwrote:


 On Fri, May 31, 2013 at 7:53 AM, Yingfeng Wang ywang...@gmail.com wrote:


 I think I need to add masses of all atoms for getting the mass of the
 whole fragment. Am I right?


 It's better to just use the built in functionality:

 In [2]: m = Chem.MolFromSmiles('c1c1')

 In [3]: from rdkit.Chem import Descriptors

 In [4]: Descriptors.MolWt(m)
 Out[4]: 78.113999

  -greg

--
Get 100% visibility into Java/.NET code with AppDynamics Lite
It's a free troubleshooting tool designed for production
Get down to code-level detail for bottlenecks, with 2% overhead.
Download for free and get started troubleshooting in minutes.
http://p.sf.net/sfu/appdyn_d2d_ap2___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] separate module by breaking bonds

2013-05-31 Thread Yingfeng Wang
Greg,

I got a weird case. With the following code,

 from rdkit import Chem
 from rdkit.Chem import Descriptors
 m
=Chem.MolFromInchi('InChI=1S/C10H9N3O/c1-7-11-10(14)9(13-12-7)8-5-3-2-4-6-8/h2-6H,1H3,(H,11,12,14)')
 Descriptors.MolWt(m)
9.072
 em = Chem.EditableMol(m)
 em.RemoveBond(8,7)
 nm = em.GetMol()
 frags = Chem.GetMolFrags(nm,asMols=True)
 [Descriptors.MolWt(x) for x in frags]
[5.04, 6.048]

It seems the mass of the original molecule is 9.072, while the summation of
masses of its fragments is 5.04+6.048  9.072. I don't think it is caused
by the precision problem. Could you please let me know how to remove the
extra mass?

Thanks.

Yingfeng


On Fri, May 31, 2013 at 8:58 AM, Yingfeng Wang ywang...@gmail.com wrote:

 Greg,

 Thank you very much.

 Yingfeng


 On Fri, May 31, 2013 at 4:14 AM, Greg Landrum greg.land...@gmail.comwrote:


 On Fri, May 31, 2013 at 7:53 AM, Yingfeng Wang ywang...@gmail.comwrote:


 I think I need to add masses of all atoms for getting the mass of the
 whole fragment. Am I right?


 It's better to just use the built in functionality:

 In [2]: m = Chem.MolFromSmiles('c1c1')

 In [3]: from rdkit.Chem import Descriptors

 In [4]: Descriptors.MolWt(m)
 Out[4]: 78.113999

  -greg



--
Get 100% visibility into Java/.NET code with AppDynamics Lite
It's a free troubleshooting tool designed for production
Get down to code-level detail for bottlenecks, with 2% overhead.
Download for free and get started troubleshooting in minutes.
http://p.sf.net/sfu/appdyn_d2d_ap2___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] dissociation energy

2013-05-31 Thread Yingfeng Wang
How do I retrieve the dissociation energy of  a bond in a molecule.
Say, I have

from rdkit import Chem
m =Chem.MolFromInchi('InChI=1S/C10H9N3O/c1-7-11-10(14)9(13-
12-7)8-5-3-2-4-6-8/h2-6H,1H3,(H,11,12,14)')
mybond = m.GetBondBetweenAtoms(8,7)

Is there a way for me to retrieve the dissociation energy of mybond?

Thanks.

Yingfeng
--
Get 100% visibility into Java/.NET code with AppDynamics Lite
It's a free troubleshooting tool designed for production
Get down to code-level detail for bottlenecks, with 2% overhead.
Download for free and get started troubleshooting in minutes.
http://p.sf.net/sfu/appdyn_d2d_ap2___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] separate module by breaking bonds

2013-05-31 Thread Yingfeng Wang
I change MolWt() into ExactMolWt(), and get

Descriptors.ExactMolWt(m)
187.074561908

[Descriptors.ExactMolWt(x) for x in frags]
[111.04326178, 78.046950192]

According to the document, MolWt() outputs the average molecular weight of
the molecule ignoring hydrogens, while ExactMolWt() is for the exact
molecular weight of the molecule. What do I observe so big difference of
mass? By the way, the mass of m calculated by the ExactMolWt () is still
less than summation of its fragments.

Thanks.

Yingfeng




On Fri, May 31, 2013 at 3:58 PM, Yingfeng Wang ywang...@gmail.com wrote:

 Greg,

 I got a weird case. With the following code,

  from rdkit import Chem
  from rdkit.Chem import Descriptors
  m
 =Chem.MolFromInchi('InChI=1S/C10H9N3O/c1-7-11-10(14)9(13-12-7)8-5-3-2-4-6-8/h2-6H,1H3,(H,11,12,14)')
  Descriptors.MolWt(m)
 9.072
  em = Chem.EditableMol(m)
  em.RemoveBond(8,7)
  nm = em.GetMol()
  frags = Chem.GetMolFrags(nm,asMols=True)
  [Descriptors.MolWt(x) for x in frags]
 [5.04, 6.048]

 It seems the mass of the original molecule is 9.072, while the summation
 of masses of its fragments is 5.04+6.048  9.072. I don't think it is
 caused by the precision problem. Could you please let me know how to remove
 the extra mass?

 Thanks.

 Yingfeng


 On Fri, May 31, 2013 at 8:58 AM, Yingfeng Wang ywang...@gmail.com wrote:

 Greg,

 Thank you very much.

 Yingfeng


 On Fri, May 31, 2013 at 4:14 AM, Greg Landrum greg.land...@gmail.comwrote:


 On Fri, May 31, 2013 at 7:53 AM, Yingfeng Wang ywang...@gmail.comwrote:


 I think I need to add masses of all atoms for getting the mass of the
 whole fragment. Am I right?


 It's better to just use the built in functionality:

 In [2]: m = Chem.MolFromSmiles('c1c1')

 In [3]: from rdkit.Chem import Descriptors

 In [4]: Descriptors.MolWt(m)
 Out[4]: 78.113999

  -greg




--
Get 100% visibility into Java/.NET code with AppDynamics Lite
It's a free troubleshooting tool designed for production
Get down to code-level detail for bottlenecks, with 2% overhead.
Download for free and get started troubleshooting in minutes.
http://p.sf.net/sfu/appdyn_d2d_ap2___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Editable molecule confusion

2013-05-31 Thread Robert DeLisle
Another attempt - see Code Block 3, below.  In this case, I construct the
ring systems from the group up using an EditableMol.  Once again it fails
during sanitization, but now I think I know why.  The original structure
has an indole with a substituted nitrogen.  During building, that nitrogen
does not have a hydrogen attached, so the valence is not satisfied and
sanitization fails.  If I change this to an indane, it works just fine.

The problem is, I cannot add hydrogens to the nitrogen until after the
EditableMol is converted to a Mol, but I cannot convert it to a Mol until
hydrogens are added.  All of this would require some fairly sophisticated
logic about the nitrogen which I'm not sure I want to include for this
simple task.




Code Block 3:
from rdkit import Chem
from rdkit.Chem import AllChem

sdin = Chem.SDMolSupplier('test.sdf')
sdout = Chem.SDWriter('rings.sdf')

for m in sdin:

  em = Chem.EditableMol(Chem.Mol())
  indexmap = {}

  for a in m.GetAtoms():
if ( a.IsInRing() ):
  indexmap[a.GetIdx()] = em.AddAtom(Chem.Atom(a.GetAtomicNum()))

  for b in m.GetBonds():
if ( b.IsInRing() ):
  em.AddBond(
indexmap[b.GetBeginAtomIdx()],indexmap[b.GetEndAtomIdx()],b.GetBondType() )

  for nm in Chem.GetMolFrags(em.GetMol(), asMols=True):
AllChem.Compute2DCoords(nm)
sdout.write(nm)


On Fri, May 31, 2013 at 2:41 PM, Robert DeLisle rkdeli...@gmail.com wrote:

 I am attempting to reduce a molecule (attached SDF) to just its ring
 systems using Code Block 1 at the bottom.

 The problem is that when I get through the loops removing non-ring
 atoms/bonds, and convert the EditableMol back to a Mol, I end up with 7
 disjoint sets of atoms:

 ((0, 1, 2, 3, 4, 5, 6, 7, 8), (9, 10, 11, 12, 13, 14), (15,), (16,), (17,
 20), (18,), (19,))

 It appears that when I remove an atom from the EditableMol by index, the
 indices are reassigned.  I tried to test this with the inelegant code in
 Code Block 2, which gives me the expected sets of atom indices with respect
 to number and size:

 ((0, 1, 2, 3, 4, 5, 6, 7, 8), (9, 10, 11, 12, 13, 14), (15, 16, 17, 18,
 19, 20))

 - but it still fails to sanitize when I convert back to a Mol.

 What am I missing here?  Also, is there an easier (ie, existing) way to do
 this?  I'm just looking to reduce the molecule to its ring systems and
 write those to an SD file.

 -Kirk


 Code block 1:
 from rdkit import Chem

 sdin = Chem.SDMolSupplier('test.sdf')
 sdout = Chem.SDWriter('rings.sdf')

 for m in sdin:

   print len(m.GetBonds()),len(m.GetAtoms())
   em = Chem.EditableMol(m)

   for a in m.GetAtoms():
 if ( not a.IsInRing() ):
   em.RemoveAtom(a.GetIdx())
   print a.GetIdx(), m.GetAtomWithIdx(a.GetIdx()).GetSymbol()

   for b in m.GetBonds():
 if ( not b.IsInRing() ):
   a1 = b.GetBeginAtomIdx()
   a2 = b.GetEndAtomIdx()
   em.RemoveBond(a1,a2)

   m3 = em.GetMol()
   print len(m3.GetBonds()), len(m3.GetAtoms())

   f = Chem.GetMolFrags(em.GetMol())
   print f

   #for f in Chem.GetMolFrags(m3,asMols = True):
 #sdout.write(f)

 Code block 2:
 from rdkit import Chem

 sdin = Chem.SDMolSupplier('test.sdf')
 sdout = Chem.SDWriter('rings.sdf')

 for m in sdin:

   print len(m.GetBonds()),len(m.GetAtoms())
   em = Chem.EditableMol(m)

   active = True

   while ( active == True ):
 active = False
 for a in m.GetAtoms():
   if ( not a.IsInRing() ):
 print a.GetIdx(), m.GetAtomWithIdx(a.GetIdx()).GetSymbol()
 em.RemoveAtom(a.GetIdx())
 active = True
 m=em.GetMol()
 em = Chem.EditableMol(m)
 break

   active = True
   while ( active == True):
 active = False
 for b in m.GetBonds():
   if ( not b.IsInRing() ):
 active = True
 a1 = b.GetBeginAtomIdx()
 a2 = b.GetEndAtomIdx()
 em.RemoveBond(a1,a2)
 m=em.GetMol()
 em = Chem.EditableMol(m)
 break

   m3 = em.GetMol()
   print len(m3.GetBonds()), len(m3.GetAtoms())

   f = Chem.GetMolFrags(em.GetMol())
   print f

   #for f in Chem.GetMolFrags(m3,asMols = True):
 #sdout.write(f)



--
Get 100% visibility into Java/.NET code with AppDynamics Lite
It's a free troubleshooting tool designed for production
Get down to code-level detail for bottlenecks, with 2% overhead.
Download for free and get started troubleshooting in minutes.
http://p.sf.net/sfu/appdyn_d2d_ap2___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Editable molecule confusion

2013-05-31 Thread Greg Landrum
Hi Kirk,

Quick FYI: the attachment on the original message didn't make it through.
Still, I think the idea is clear enough.

A general question I'd have is whether or not you really want to completely
ignore the nature of the bond to the non-ring atom. For example, with this
scheme you're going to end up getting the same ring systems for quinone and
1,4 cyclohexadiene derivatives:
O=C1C=CC(=O)C=C1 - C1C=CCC=C1
CC1C=CC(C)C=C1 - C1C=CCC=C1

Similarly, sulfones in rings will lead to thioethers:
O=S1(=O)1 - S11

Is that desirable? If not, you could try adding dummies connected with an
appropriate bond type so that you get:
O=C1C=CC(=O)C=C1 - *=C1C=CC(=*)C=C1


On Fri, May 31, 2013 at 11:41 PM, Robert DeLisle rkdeli...@gmail.comwrote:

 Maybe the logic is that sophisticated after all - Code Block 4 below.


The trick with adding the Hs to Ns is, unfortunately, necessary. You should
be adding them to aromatic heteroatoms.

Code Block 4:
 from rdkit import Chem
 from rdkit.Chem import AllChem

 sdin = Chem.SDMolSupplier('test.sdf')
 sdout = Chem.SDWriter('rings.sdf')

 for m in sdin:

   em = Chem.EditableMol(Chem.Mol())
   indexmap = {}

   for a in m.GetAtoms():

 if ( a.IsInRing() ):
   indexmap[a.GetIdx()] = em.AddAtom(Chem.Atom(a.GetAtomicNum()))

   for b in m.GetBonds():

 if ( b.IsInRing() ):
   em.AddBond(
 indexmap[b.GetBeginAtomIdx()],indexmap[b.GetEndAtomIdx()],b.GetBondType() )

   for a in m.GetAtoms():
 if ( a.IsInRing() and a.GetAtomicNum() == 7 ):


I would replace the above line with:
if a.GetIsAromatic() and a.GetAtomicNum() != 6:


   for b in a.GetBonds():
  if ( not b.IsInRing() ):
em.AddBond(em.AddAtom(Chem.Atom(1)), indexmap[a.GetIdx()],
 Chem.BondType.SINGLE)

   for nm in Chem.GetMolFrags(em.GetMol(), asMols=True):


You probably should be calling either Chem.SanitizeMol() or Chem.RemoveHs()
here. I'd recommend the second variant because I don't think you want the
Hs in the output.


 AllChem.Compute2DCoords(nm)
 sdout.write(nm)


-greg
--
Get 100% visibility into Java/.NET code with AppDynamics Lite
It's a free troubleshooting tool designed for production
Get down to code-level detail for bottlenecks, with 2% overhead.
Download for free and get started troubleshooting in minutes.
http://p.sf.net/sfu/appdyn_d2d_ap2___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] dissociation energy

2013-05-31 Thread Greg Landrum
Dear Yingfeng,

On Fri, May 31, 2013 at 10:25 PM, Yingfeng Wang ywang...@gmail.com wrote:

 How do I retrieve the dissociation energy of  a bond in a molecule.
 Say, I have

 from rdkit import Chem
 m =Chem.MolFromInchi('InChI=1S/C10H9N3O/c1-7-11-10(14)9(13-
 12-7)8-5-3-2-4-6-8/h2-6H,1H3,(H,11,12,14)')
 mybond = m.GetBondBetweenAtoms(8,7)

 Is there a way for me to retrieve the dissociation energy of mybond?


The RDKit does not have this type of functionality.

-greg
--
Get 100% visibility into Java/.NET code with AppDynamics Lite
It's a free troubleshooting tool designed for production
Get down to code-level detail for bottlenecks, with 2% overhead.
Download for free and get started troubleshooting in minutes.
http://p.sf.net/sfu/appdyn_d2d_ap2___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Editable molecule confusion

2013-05-31 Thread Greg Landrum
On Sat, Jun 1, 2013 at 6:11 AM, Robert DeLisle rkdeli...@gmail.com wrote:

 You are absolutely correct on all counts.  I've also found problems with a
 variety of special cases in which the non-cyclic atom is required for
 aromaticity.  Lots of special cases to consider.


Oh yeah, if you want the aromaticity to survive the fragmentation and
sanitization, you're going to need to include the doubly bound exocyclic N
or O as an N or O. Here's an example:

In [2]: Chem.CanonSmiles('*=C1NC=CC=C1')
Out[2]: '[*]=C1C=CC=CN1'

In [3]: Chem.CanonSmiles('O=C1NC=CC=C1')
Out[3]: 'O=c1[nH]1'

not a simple story, unfortunately.

I'll integrate your suggestions and do done error checking.
 On May 31, 2013 9:55 PM, Greg Landrum greg.land...@gmail.com wrote:

 Hi Kirk,

 Quick FYI: the attachment on the original message didn't make it through.
 Still, I think the idea is clear enough.

 A general question I'd have is whether or not you really want to
 completely ignore the nature of the bond to the non-ring atom. For example,
 with this scheme you're going to end up getting the same ring systems for
 quinone and 1,4 cyclohexadiene derivatives:
 O=C1C=CC(=O)C=C1 - C1C=CCC=C1
 CC1C=CC(C)C=C1 - C1C=CCC=C1

 Similarly, sulfones in rings will lead to thioethers:
 O=S1(=O)1 - S11

 Is that desirable? If not, you could try adding dummies connected with an
 appropriate bond type so that you get:
 O=C1C=CC(=O)C=C1 - *=C1C=CC(=*)C=C1


 On Fri, May 31, 2013 at 11:41 PM, Robert DeLisle rkdeli...@gmail.comwrote:

 Maybe the logic is that sophisticated after all - Code Block 4 below.


 The trick with adding the Hs to Ns is, unfortunately, necessary. You
 should be adding them to aromatic heteroatoms.

 Code Block 4:
 from rdkit import Chem
 from rdkit.Chem import AllChem

 sdin = Chem.SDMolSupplier('test.sdf')
 sdout = Chem.SDWriter('rings.sdf')

 for m in sdin:

   em = Chem.EditableMol(Chem.Mol())
   indexmap = {}

   for a in m.GetAtoms():

 if ( a.IsInRing() ):
   indexmap[a.GetIdx()] = em.AddAtom(Chem.Atom(a.GetAtomicNum()))

   for b in m.GetBonds():

 if ( b.IsInRing() ):
   em.AddBond(
 indexmap[b.GetBeginAtomIdx()],indexmap[b.GetEndAtomIdx()],b.GetBondType() )

   for a in m.GetAtoms():
 if ( a.IsInRing() and a.GetAtomicNum() == 7 ):


 I would replace the above line with:
 if a.GetIsAromatic() and a.GetAtomicNum() != 6:


   for b in a.GetBonds():
  if ( not b.IsInRing() ):
em.AddBond(em.AddAtom(Chem.Atom(1)), indexmap[a.GetIdx()],
 Chem.BondType.SINGLE)

   for nm in Chem.GetMolFrags(em.GetMol(), asMols=True):


 You probably should be calling either Chem.SanitizeMol() or
 Chem.RemoveHs() here. I'd recommend the second variant because I don't
 think you want the Hs in the output.


 AllChem.Compute2DCoords(nm)
 sdout.write(nm)


 -greg


--
Get 100% visibility into Java/.NET code with AppDynamics Lite
It's a free troubleshooting tool designed for production
Get down to code-level detail for bottlenecks, with 2% overhead.
Download for free and get started troubleshooting in minutes.
http://p.sf.net/sfu/appdyn_d2d_ap2___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] separate module by breaking bonds

2013-05-31 Thread Greg Landrum
Combining two replies into one here.

On Fri, May 31, 2013 at 9:58 PM, Yingfeng Wang ywang...@gmail.com wrote:

 Greg,

 I got a weird case. With the following code,

  from rdkit import Chem
  from rdkit.Chem import Descriptors
  m
 =Chem.MolFromInchi('InChI=1S/C10H9N3O/c1-7-11-10(14)9(13-12-7)8-5-3-2-4-6-8/h2-6H,1H3,(H,11,12,14)')
  Descriptors.MolWt(m)
 9.072
  em = Chem.EditableMol(m)
  em.RemoveBond(8,7)
  nm = em.GetMol()
  frags = Chem.GetMolFrags(nm,asMols=True)
  [Descriptors.MolWt(x) for x in frags]
 [5.04, 6.048]

 It seems the mass of the original molecule is 9.072, while the summation
 of masses of its fragments is 5.04+6.048  9.072. I don't think it is
 caused by the precision problem. Could you please let me know how to remove
 the extra mass?


It's not extra mass. The difference is caused by the Hs that are added to
the fragments to replace the bonds that have been broken:

In [14]: m
=Chem.MolFromInchi('InChI=1S/C10H9N3O/c1-7-11-10(14)9(13-12-7)8-5-3-2-4-6-8/h2-6H,1H3,(H,11,12,14)')

In [15]: AllChem.CalcMolFormula(m)
Out[15]: 'C10H9N3O'

In [16]: em = Chem.EditableMol(m)

In [17]: em.RemoveBond(8,7)

In [18]: nm = em.GetMol()

In [19]: frags = Chem.GetMolFrags(nm,asMols=True)

In [20]: [AllChem.CalcMolFormula(x) for x in frags]
Out[20]: ['C4H5N3O', 'C6H6']

you can see that there are two more Hs here.


On Fri, May 31, 2013 at 10:33 PM, Yingfeng Wang ywang...@gmail.com wrote:

 I change MolWt() into ExactMolWt(), and get

 Descriptors.ExactMolWt(m)
 187.074561908

 [Descriptors.ExactMolWt(x) for x in frags]
 [111.04326178, 78.046950192]

 According to the document, MolWt() outputs the average molecular weight
 of the molecule ignoring hydrogens, while ExactMolWt() is for the exact
 molecular weight of the molecule. What do I observe so big difference of
 mass? By the way, the mass of m calculated by the ExactMolWt () is still
 less than summation of its fragments.


The docstring for MolWt is incredibly wrong. Thanks for pointing it out.
I'll fix it.
MolWt() computes the average molecular weight of the molecule, taking into
account the standard isotope distribution for each atom where an isotope is
not specified.
ExactMolWt() computes the molecular weight using either the atomic weights
of the most common isotope (if not specified) or the specified isotope. An
example of that:

In [22]: Descriptors.ExactMolWt(Chem.MolFromSmiles('[CH4]'))
Out[22]: 16.03130012798

In [23]: Descriptors.ExactMolWt(Chem.MolFromSmiles('[12CH4]'))
Out[23]: 16.03130012798

In [24]: Descriptors.ExactMolWt(Chem.MolFromSmiles('[13CH4]'))
Out[24]: 17.03465496798

-greg
--
Get 100% visibility into Java/.NET code with AppDynamics Lite
It's a free troubleshooting tool designed for production
Get down to code-level detail for bottlenecks, with 2% overhead.
Download for free and get started troubleshooting in minutes.
http://p.sf.net/sfu/appdyn_d2d_ap2___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss