Re: [Rdkit-discuss] PDB reader and bond perception

2014-01-15 Thread Greg Landrum
On Tue, Jan 14, 2014 at 11:48 AM, Greg Landrum greg.land...@gmail.com wrote:

 ok, it looks like something bad happened[1] when the PDB branch was merged
 into trunk before the last release. Here's an example that worked properly
 at the time of the UGM:

 In [5]: m = Chem.MolFromPDBFile('data/2FVD.pdb')
 In [6]: Chem.MolToSmiles(m,canonical=False)
 Out[6]: 'NC(C(O)NC(C  '

 Here's the notebook showing what's supposed to happen:

 http://nbviewer.ipython.org/github/rdkit/UGM_2013/blob/master/Notebooks/Whats_new.ipynb

 I'll look into this as soon as I can and get it fixed.


I just tracked this down and fixed it. The changes are checked into github.
Details about what happened are below.

Here's my example from above now:

In [3]: m = Chem.MolFromPDBFile('./2FVD.pdb')
In [4]: Chem.MolToSmiles(m,canonical=False)
Out[4]: 'NC(C(=O)NC(C(=O)NC(C(=O'


This is, IMO, a major enough problem that it's worth doing a patch release
to address it. Over the next few days, I will put together a list of fixes
(not new features) that should be in the 2013_09_2 release and adjust the
milestones for those issues. Please feel free to suggest additions. The
list (currently empty) can be found here:
https://github.com/rdkit/rdkit/issues?milestone=6

For those who care, here's how the bug came about.
The bond-type assignment code for standard PDB residues tests bonded atoms
to make sure they are in the same residue. This code compares the two
atoms' AtomPDBResidueInfo structures. Shortly before the 2013_09_1 release,
I added an explicit residueNumber property to the AtomPDBResidueInfo class
and switched the serialNumber property (previously used to store the
residueNumber) to capture the actual serial number of the atom. I forgot to
update the residue comparison code to reflect this change, so the
SamePDBResidue() function returned false unless the two atoms were the
same atom. Silly mistake.
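
To make the failure mode concrete, here is a minimal pure-Python sketch. The
ResidueInfo class and both comparison functions are hypothetical stand-ins
written for illustration only; they are not RDKit's actual C++ code, and the
field names merely mirror the properties described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResidueInfo:
    """Hypothetical stand-in for the per-atom PDB residue information."""
    serial_number: int   # atom serial number (unique per atom)
    residue_number: int  # residue sequence number (shared within a residue)
    residue_name: str
    chain_id: str


def same_residue_buggy(a: ResidueInfo, b: ResidueInfo) -> bool:
    # Still compares serial_number, which now differs for every atom,
    # so two distinct atoms never appear to share a residue.
    return (a.serial_number == b.serial_number
            and a.residue_name == b.residue_name
            and a.chain_id == b.chain_id)


def same_residue_fixed(a: ResidueInfo, b: ResidueInfo) -> bool:
    # Compares the explicit residue number instead.
    return (a.residue_number == b.residue_number
            and a.residue_name == b.residue_name
            and a.chain_id == b.chain_id)


# Backbone C and O of one residue; the values are illustrative only.
carbon = ResidueInfo(serial_number=3, residue_number=1,
                     residue_name="ALA", chain_id="A")
oxygen = ResidueInfo(serial_number=4, residue_number=1,
                     residue_name="ALA", chain_id="A")
```

With the buggy comparison the carbonyl C and O are never treated as members
of the same residue, which would presumably leave the C=O bond single, as in
the 'NC(C(O)NC(C' output quoted above.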

Best,
-greg
--
CenturyLink Cloud: The Leader in Enterprise Cloud Services.
Learn Why More Businesses Are Choosing CenturyLink Cloud For
Critical Workloads, Development Environments  Everything In Between.
Get a Quote or Start a Free Trial Today. 
http://pubads.g.doubleclick.net/gampad/clk?id=119420431iu=/4140/ostg.clktrk
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] PDB reader and bond perception

2014-01-15 Thread JP
Thanks Greg!  Much appreciated.

-
Jean-Paul Ebejer
Early Stage Researcher





Re: [Rdkit-discuss] PDB reader and bond perception

2014-01-14 Thread sereina riniker
Hi JP,

 However I am unable to get bond orders for the protein side - am I doing
 something wrong or is this the intended behaviour?
 I imagine I can use AssignBondOrdersFromTemplate() for the 20 amino acids
 and set these myself -- or is there a better way to do this?


I don't know why your protein doesn't get bond orders; the PDB parser should
know the standard amino acids. At least it worked for me when I tried
Chem.MolFromPDBFile() in the past. Which PDB structure are you trying to read?


  Also, is there a way to make AssignBondOrdersFromTemplate assign bond
 orders to all matches?


The function was meant for assigning bonds based on an entire molecule. It
would probably not be so difficult to change this (with default = match
only one), if it is really needed.


 Also, another thing I don't quite understand: with the code below, I get
 "WARNING: More than one matching pattern found - picking one".
 How can my template match multiple times when it is not symmetrical?


The way the AssignBondOrdersFromTemplate() function works is the following:
1) a copy of the template is generated where all bonds are set to single
bonds
2) this single-bonds copy is used for a substructure match with the query
molecule
3) bond orders are assigned based on this match and the original template

If you get this warning, it means that your molecule has some symmetry at the
all-single-bonds stage. In your case, I guess it's the carboxylic acids, which
can match in two ways when there are only single bonds.
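
The three steps and the symmetry effect can be sketched in plain Python. This
is a brute-force toy on a four-atom carboxylic acid fragment, not RDKit's
substructure matcher; the data layout and the helper names (strip_bond_orders,
find_matches, assign_orders) are made up for this illustration.

```python
from itertools import permutations

# A carboxylic acid fragment R-C(=O)-O.
# Atom indices: 0 = R carbon, 1 = carboxyl carbon, 2 = carbonyl O, 3 = hydroxyl O.
ELEMENTS = ["C", "C", "O", "O"]
TEMPLATE_BONDS = {frozenset((0, 1)): 1, frozenset((1, 2)): 2, frozenset((1, 3)): 1}


def strip_bond_orders(bonds):
    """Step 1: copy the template with every bond set to single."""
    return {pair: 1 for pair in bonds}


def find_matches(query_bonds, target_bonds, elements):
    """Step 2: brute-force all atom mappings that preserve elements and bonds."""
    n = len(elements)
    matches = []
    for perm in permutations(range(n)):
        if any(elements[i] != elements[perm[i]] for i in range(n)):
            continue
        ok = all(
            target_bonds.get(frozenset(perm[i] for i in pair)) == order
            for pair, order in query_bonds.items()
        )
        if ok:
            matches.append(perm)
    return matches


def assign_orders(template_bonds, match):
    """Step 3: copy the template's bond orders onto the matched target atoms."""
    return {frozenset(match[i] for i in pair): order
            for pair, order in template_bonds.items()}


# The molecule read from PDB has the same connectivity but only single bonds.
pdb_bonds = strip_bond_orders(TEMPLATE_BONDS)

# With bond orders the fragment matches itself in only one way ...
with_orders = find_matches(TEMPLATE_BONDS, TEMPLATE_BONDS, ELEMENTS)
# ... but the single-bond copy matches the PDB molecule in two ways.
single_only = find_matches(strip_bond_orders(TEMPLATE_BONDS), pdb_bonds, ELEMENTS)
```

Here with_orders holds one mapping and single_only holds two, because the two
oxygens become interchangeable once the double bond is removed. Each of the
two mappings places the C=O on a different oxygen, which is the ambiguity the
warning reports.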

I hope this helps.

Best,
Sereina







Re: [Rdkit-discuss] PDB reader and bond perception

2014-01-13 Thread TJ O'Donnell
Hi JP

I use this file from PDB Europe:
ftp://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/files/pdb.tar.gz
Useful links can be followed from
http://www.ebi.ac.uk/pdbe-srv/pdbechem/

The pdb.tar.gz file has the standard residues and LOTS of others
with specific CONECT records.
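
As a rough illustration of how such connectivity records can be read, here is
a small sketch. It assumes the standard fixed-column layout of the PDB format,
where the record name is spelled CONECT and the serial numbers sit in
5-character fields starting at column 7. The function name parse_conect and
the example records are made up; they are not taken from pdb.tar.gz.

```python
from collections import defaultdict


def parse_conect(pdb_text):
    """Collect bonded atom serial numbers from fixed-column CONECT records."""
    bonds = defaultdict(set)
    for line in pdb_text.splitlines():
        if not line.startswith("CONECT"):
            continue
        # Columns 7-11 hold the atom serial; up to four bonded serials follow,
        # each in its own 5-character field.
        fields = [line[i:i + 5].strip() for i in range(6, 31, 5)]
        serials = [int(f) for f in fields if f]
        if not serials:
            continue
        atom, partners = serials[0], serials[1:]
        for partner in partners:
            bonds[atom].add(partner)
            bonds[partner].add(atom)
    return {atom: sorted(partners) for atom, partners in bonds.items()}


# Hand-written example records for illustration only.
example = (
    "CONECT    1    2\n"
    "CONECT    2    1    3    4\n"
    "CONECT    3    2\n"
    "CONECT    4    2\n"
)
connectivity = parse_conect(example)
```

This sketch only recovers which atoms are connected; it makes no attempt to
derive bond orders.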

TJ



On Mon, Jan 13, 2014 at 9:54 AM, JP jeanpaul.ebe...@inhibox.com wrote:

 RDKitters!

 Finally back on the mailing list!

 I am sure we've been through this at the UGM (my mind must have wandered
 off!), but a quick question about the PDB reader and bond perception.  Is
 this supported with the current PDB reader?  I remember that someone
 (PaulE, perhaps?) was saying bond perception was painful, but there was
 some dictionary for PDB ligands which helps (any idea the name of this
 dictionary?).

 To the technical details.

 I am reading in the following PDB file with a simple MolFromPDBFile() call:

 HETATM    1  O1P 84T A1862     -27.016   9.387 -72.564  1.00 20.81           O
 HETATM    2  P   84T A1862     -27.282   9.818 -73.968  1.00 19.65           P
 HETATM    3  O2P 84T A1862     -27.881  11.176 -74.182  1.00 21.49           O
 HETATM    4  N   84T A1862     -25.869   9.583 -74.813  1.00 19.78           N
 HETATM    5  C   84T A1862     -25.759  10.010 -76.075  1.00 19.97           C
 HETATM    6  CA  84T A1862     -24.493   9.748 -76.807  1.00 19.75           C
 HETATM    7  CB  84T A1862     -24.794   8.678 -77.847  1.00 19.73           C
 HETATM    8  CG  84T A1862     -23.571   8.324 -78.681  1.00 19.70           C
 HETATM    9  CD2 84T A1862     -23.309   9.519 -79.611  1.00 18.49           C
 HETATM   10  CD1 84T A1862     -23.863   6.932 -79.305  1.00 18.60           C
 HETATM   11  OHB 84T A1862     -25.210   7.467 -77.223  1.00 19.17           O
 HETATM   12  OH  84T A1862     -23.549   9.127 -75.984  1.00 20.33           O
 HETATM   13  O   84T A1862     -26.672  10.517 -76.692  1.00 20.26           O
 HETATM   14  O5' 84T A1862     -28.377   8.861 -74.619  1.00 19.39           O
 HETATM   15  C5' 84T A1862     -28.002   7.536 -74.954  1.00 18.47           C
 HETATM   16  C4' 84T A1862     -28.909   7.000 -76.012  1.00 18.24           C
 HETATM   17  C3' 84T A1862     -28.901   7.826 -77.298  1.00 18.28           C
 HETATM   18  C2' 84T A1862     -30.318   7.610 -77.768  1.00 18.69           C
 HETATM   19  O2' 84T A1862     -30.789   8.641 -78.581  1.00 19.64           O
 HETATM   20  O4' 84T A1862     -30.262   6.951 -75.529  1.00 18.80           O
 HETATM   21  C1' 84T A1862     -31.152   7.470 -76.521  1.00 19.01           C
 HETATM   22  N9  84T A1862     -31.753   8.732 -76.009  1.00 20.08           N
 HETATM   23  C4  84T A1862     -33.033   9.013 -76.158  1.00 21.10           C
 HETATM   24  N3  84T A1862     -34.018   8.339 -76.786  1.00 21.58           N
 HETATM   25  C2  84T A1862     -35.263   8.846 -76.830  1.00 21.95           C
 HETATM   26  C8  84T A1862     -31.223   9.701 -75.291  1.00 20.27           C
 HETATM   27  N7  84T A1862     -32.173  10.618 -75.019  1.00 21.28           N
 HETATM   28  C5  84T A1862     -33.315  10.213 -75.563  1.00 21.81           C
 HETATM   29  C6  84T A1862     -34.624  10.702 -75.627  1.00 22.85           C
 HETATM   30  N1  84T A1862     -35.550  10.010 -76.285  1.00 22.44           N
 HETATM   31  N6  84T A1862     -35.008  11.862 -75.052  1.00 23.86           N
 TER
 END

 But I am losing all the double bond (and aromatic) information:

 import sys
 from rdkit import Chem
 m = Chem.MolFromPDBFile(sys.argv[1])
 print(Chem.MolToSmiles(m))

 Gives me:

 CC(C)C(O)C(O)C(O)NP(O)(O)OCC1CC(O)C(N2CNC3C2NCNC3N)O1

 As usual, many thanks for your time,

 -
 Jean-Paul Ebejer
 Early Stage Researcher




Re: [Rdkit-discuss] PDB reader and bond perception

2014-01-13 Thread Paul Emsley
84T is a reference to a chemical description:

http://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/84T

This (mmcif) is what I parse, either from the local dictionary or 
downloading the file on the fly:

ftp://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/files/mmcif/84T.cif

Does that help?
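
For readers who want to pull bond orders out of such a file themselves, a
rough sketch follows. It assumes the file carries a _chem_comp_bond loop with
atom_id_1, atom_id_2 and value_order items, as in the wwPDB chemical component
dictionary. The component "XXX", its atoms, and the function read_bond_orders
are hypothetical; the fragment is hand-written and not copied from 84T.cif.
The parser is deliberately naive and is no substitute for a real mmCIF reader.

```python
import shlex

# Hand-written fragment in the style of a chemical component definition.
# The component "XXX" and its atoms are hypothetical.
CIF_FRAGMENT = """\
loop_
_chem_comp_bond.comp_id
_chem_comp_bond.atom_id_1
_chem_comp_bond.atom_id_2
_chem_comp_bond.value_order
XXX C1 O1 DOUB
XXX C1 O2 SING
XXX C1 "C5'" SING
#
"""

ORDER = {"SING": 1, "DOUB": 2, "TRIP": 3}


def read_bond_orders(cif_text):
    """Return {(atom_id_1, atom_id_2): order} from a _chem_comp_bond loop."""
    tags, bonds, in_loop = [], {}, False
    for raw in cif_text.splitlines():
        line = raw.strip()
        if line == "loop_":
            tags, in_loop = [], True
        elif in_loop and line.startswith("_"):
            tags.append(line)
        elif in_loop and line and not line.startswith("#"):
            if not tags or not tags[0].startswith("_chem_comp_bond."):
                continue  # a data row that belongs to some other loop
            # shlex handles double-quoted atom names such as "C5'".
            row = dict(zip(tags, shlex.split(line)))
            key = (row["_chem_comp_bond.atom_id_1"],
                   row["_chem_comp_bond.atom_id_2"])
            bonds[key] = ORDER.get(row["_chem_comp_bond.value_order"])
        else:
            in_loop = False
    return bonds


bond_orders = read_bond_orders(CIF_FRAGMENT)
```

The resulting dictionary maps atom-name pairs to integer bond orders, which
could then be matched against the atom names in the HETATM records.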

Paul.






Re: [Rdkit-discuss] PDB reader and bond perception

2014-01-13 Thread sereina riniker
Hi JP,

If you also have a SMILES of the molecule you want to read from the PDB file,
you can assign the bond orders based on this template:

from rdkit import Chem
from rdkit.Chem import AllChem

tmp = Chem.MolFromPDBFile(yourfilename)
template = Chem.MolFromSmiles(yoursmiles)
mol = AllChem.AssignBondOrdersFromTemplate(template, tmp)

Is this what you're looking for?

Best,
Sereina





Re: [Rdkit-discuss] PDB reader and bond perception

2014-01-13 Thread JP
Thanks All - I think I am in a good place now.

I can get the SMILES from Paul's mmcif links and then I can use Sereina's
magic three lines to do what I want.  I'd cross my fingers - but with RDKit
you don't need to.
This works for all Chemical Components (or whatever other fashionable name
they go by these days) in the PDB.

For posterity: I have found a post in the mailing list started by James
which sheds some light on this:
https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg03481.html





