Re: [Rdkit-discuss] PDB reader and bond perception
On Tue, Jan 14, 2014 at 11:48 AM, Greg Landrum greg.land...@gmail.com wrote:

> ok, it looks like something bad happened[1] when the PDB branch was merged into trunk before the last release.
>
> Here's an example that worked properly at the time of the UGM:
>
> In [5]: m = Chem.MolFromPDBFile('data/2FVD.pdb')
> In [6]: Chem.MolToSmiles(m,canonical=False)
> Out[6]: 'NC(C(O)NC(C '
>
> Here's the notebook showing what's supposed to happen:
> http://nbviewer.ipython.org/github/rdkit/UGM_2013/blob/master/Notebooks/Whats_new.ipynb
>
> I'll look into this as soon as I can and get it fixed.

I just tracked this down and fixed it. The changes are checked into github. Details about what happened are below.

Here's my example from above now:

In [3]: m = Chem.MolFromPDBFile('./2FVD.pdb')
In [4]: Chem.MolToSmiles(m,canonical=False)
Out[4]: 'NC(C(=O)NC(C(=O)NC(C(=O'

This is, IMO, a major enough problem that it's worth doing a patch release to address it. Over the next few days, I will put together a list of fixes (not new features) that should be in the 2013_09_2 release and adjust the milestones for those issues. Please feel free to suggest additions. The list (currently empty) can be found here:
https://github.com/rdkit/rdkit/issues?milestone=6

For those who care, here's how the bug came about. The bond-type assignment code for standard PDB residues tests bonded atoms to make sure they are in the same residue. This code compares the two atoms' AtomPDBResidueInfo structures. Shortly before the 2013_09_1 release, I added an explicit residueNumber property to the AtomPDBResidueInfo class and switched the serialNumber property (previously used to store the residueNumber) to capture the actual serial number of the atom. I forgot to update the residue comparison code to reflect this change, so the SamePDBResidue() function was returning false unless the two atoms were the same. Silly mistake.

Best,
-greg

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
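
To make the failure mode Greg describes concrete, here is a minimal pure-Python sketch. It is not RDKit's C++ code: the `ResidueInfo` class, its field names, and both comparison functions are hypothetical stand-ins, and the real SamePDBResidue() may well compare other fields too. The sketch only illustrates why comparing the atom serial number, where the residue number used to live, makes two different atoms of one residue look as if they sit in different residues.

```python
from dataclasses import dataclass


@dataclass
class ResidueInfo:
    """Toy stand-in for per-atom PDB residue info (hypothetical field names)."""
    serial_number: int   # atom serial number: unique for every atom
    residue_number: int  # residue sequence number: shared within a residue
    residue_name: str
    chain_id: str


def same_residue_buggy(a: ResidueInfo, b: ResidueInfo) -> bool:
    # Still reads the field that used to hold the residue number.
    # It now holds the atom serial, so only an atom compared with itself matches.
    return (a.serial_number == b.serial_number
            and a.residue_name == b.residue_name
            and a.chain_id == b.chain_id)


def same_residue_fixed(a: ResidueInfo, b: ResidueInfo) -> bool:
    # Compares the explicit residue number instead.
    return (a.residue_number == b.residue_number
            and a.residue_name == b.residue_name
            and a.chain_id == b.chain_id)


# Two atoms of the same (made-up) alanine residue, number 5 in chain A.
n = ResidueInfo(serial_number=1, residue_number=5, residue_name="ALA", chain_id="A")
ca = ResidueInfo(serial_number=2, residue_number=5, residue_name="ALA", chain_id="A")

print(same_residue_buggy(n, ca))  # False: the pair is skipped by bond-type assignment
print(same_residue_fixed(n, ca))  # True
```

With the buggy check, every bonded pair inside a standard residue fails the same-residue test, which is consistent with the all-single-bond SMILES in the first example.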
Re: [Rdkit-discuss] PDB reader and bond perception
Thanks Greg! Much appreciated.

- Jean-Paul Ebejer
Early Stage Researcher

On 15 January 2014 08:38, Greg Landrum greg.land...@gmail.com wrote:
> I just tracked this down and fixed it. The changes are checked into github.
> [...]
Re: [Rdkit-discuss] PDB reader and bond perception
Hi JP,

> However I am unable to get bond orders for the protein side - am I doing something wrong or is this the intended behaviour? I imagine I can use AssignBondOrdersFromTemplate() for the 20 amino acids and set these myself -- or is there a better way to do this?

I don't know why your protein doesn't get bond orders; the PDB parser should know the standard amino acids. At least it worked for me when I tried Chem.MolFromPDB() in the past. Which PDB structure are you trying to read?

> Also, is there a way to make AssignBondOrdersFromTemplate assign bond orders to all matches?

The function was meant for assigning bonds based on an entire molecule. It would probably not be so difficult to change this (with default = match only one), if it is really needed.

> Also another thing I don't quite understand is that with the code below I get a "WARNING: More than one matching pattern found - picking one", but how can my template match multiple times (it is not symmetrical)?

The way the AssignBondOrdersFromTemplate() function works is the following:

  1) a copy of the template is generated where all bonds are set to single bonds
  2) this single-bonds copy is used for a substructure match with the query molecule
  3) bond orders are assigned based on this match and the original template

If you get this warning, it means that there is some symmetry in the all-single-bonds stage of your molecule. In your case, I guess it's the carboxylic acids, which can match two ways when there are only single bonds.

I hope this helps.

Best,
Sereina

On 13 January 2014 21:02, JP jeanpaul.ebe...@inhibox.com wrote:
> Thanks All - I think I am in a good place now.
> [...]
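
The three steps Sereina lists, and the reason a carboxylic acid triggers the "more than one matching pattern" warning, can be sketched without RDKit. Everything below is an assumption made for illustration: the toy molecule representation (element list plus a bond-order dictionary), the function names, and the brute-force matcher are not how RDKit implements AssignBondOrdersFromTemplate(). Acetic acid stands in for the template.

```python
from itertools import permutations

# Toy molecules: element symbols plus {frozenset({i, j}): bond_order}.
# Template: acetic acid heavy atoms, CC(=O)O, with correct bond orders.
template_atoms = ["C", "C", "O", "O"]
template_bonds = {frozenset({0, 1}): 1, frozenset({1, 2}): 2, frozenset({1, 3}): 1}

# The same connectivity as it might come out of a PDB file: all single bonds.
pdb_atoms = ["C", "C", "O", "O"]
pdb_bonds = {frozenset({0, 1}): 1, frozenset({1, 2}): 1, frozenset({1, 3}): 1}


def strip_orders(bonds):
    """Step 1: copy of the bond table with every bond set to single."""
    return {pair: 1 for pair in bonds}


def find_matches(q_atoms, q_bonds, t_atoms, t_bonds):
    """Step 2: brute-force substructure search (only sensible for toy sizes)."""
    matches = []
    for perm in permutations(range(len(t_atoms)), len(q_atoms)):
        if any(q_atoms[i] != t_atoms[perm[i]] for i in range(len(q_atoms))):
            continue
        if all(t_bonds.get(frozenset({perm[i], perm[j]})) == order
               for (i, j), order in ((tuple(p), o) for p, o in q_bonds.items())):
            matches.append(perm)
    return matches


def assign_orders(match, tmpl_bonds, target_bonds):
    """Step 3: copy the template's bond orders onto the matched target bonds."""
    result = dict(target_bonds)
    for pair, order in tmpl_bonds.items():
        i, j = tuple(pair)
        result[frozenset({match[i], match[j]})] = order
    return result


matches = find_matches(template_atoms, strip_orders(template_bonds),
                       pdb_atoms, pdb_bonds)
fixed = assign_orders(matches[0], template_bonds, pdb_bonds)

print(matches)                 # two mappings that differ only in which O is which
print(sorted(fixed.values()))  # one double bond restored
```

Once the C=O is flattened to a single bond, the two oxygens are indistinguishable to the matcher, so there are two valid mappings even though the original template is not symmetric. Either mapping gives a chemically equivalent result here, which is why picking one is harmless in this case.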
Re: [Rdkit-discuss] PDB reader and bond perception
Hi JP,

I use this file from PDB Europe:
ftp://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/files/pdb.tar.gz

Useful links can be followed from http://www.ebi.ac.uk/pdbe-srv/pdbechem/

The pdb.tar.gz file has the standard residues and LOTS of others with specific CONECT records.

TJ

On Mon, Jan 13, 2014 at 9:54 AM, JP jeanpaul.ebe...@inhibox.com wrote:

> RDKitters!
>
> Finally back on the mailing list! I am sure we've been through this at the UGM (my mind must have wandered off!), but a quick question about the PDB reader and bond perception. Is this supported with the current PDB reader? I remember that someone (PaulE, perhaps?) was saying bond perception was painful, but there was some dictionary for PDB ligands which helps (any idea the name of this dictionary?).
>
> To the technical details. I am reading in the following PDB file with a simple MolFromPDBFile() call:
>
> HETATM    1  O1P 84T A1862     -27.016   9.387 -72.564  1.00 20.81           O
> HETATM    2  P   84T A1862     -27.282   9.818 -73.968  1.00 19.65           P
> HETATM    3  O2P 84T A1862     -27.881  11.176 -74.182  1.00 21.49           O
> HETATM    4  N   84T A1862     -25.869   9.583 -74.813  1.00 19.78           N
> HETATM    5  C   84T A1862     -25.759  10.010 -76.075  1.00 19.97           C
> HETATM    6  CA  84T A1862     -24.493   9.748 -76.807  1.00 19.75           C
> HETATM    7  CB  84T A1862     -24.794   8.678 -77.847  1.00 19.73           C
> HETATM    8  CG  84T A1862     -23.571   8.324 -78.681  1.00 19.70           C
> HETATM    9  CD2 84T A1862     -23.309   9.519 -79.611  1.00 18.49           C
> HETATM   10  CD1 84T A1862     -23.863   6.932 -79.305  1.00 18.60           C
> HETATM   11  OHB 84T A1862     -25.210   7.467 -77.223  1.00 19.17           O
> HETATM   12  OH  84T A1862     -23.549   9.127 -75.984  1.00 20.33           O
> HETATM   13  O   84T A1862     -26.672  10.517 -76.692  1.00 20.26           O
> HETATM   14  O5' 84T A1862     -28.377   8.861 -74.619  1.00 19.39           O
> HETATM   15  C5' 84T A1862     -28.002   7.536 -74.954  1.00 18.47           C
> HETATM   16  C4' 84T A1862     -28.909   7.000 -76.012  1.00 18.24           C
> HETATM   17  C3' 84T A1862     -28.901   7.826 -77.298  1.00 18.28           C
> HETATM   18  C2' 84T A1862     -30.318   7.610 -77.768  1.00 18.69           C
> HETATM   19  O2' 84T A1862     -30.789   8.641 -78.581  1.00 19.64           O
> HETATM   20  O4' 84T A1862     -30.262   6.951 -75.529  1.00 18.80           O
> HETATM   21  C1' 84T A1862     -31.152   7.470 -76.521  1.00 19.01           C
> HETATM   22  N9  84T A1862     -31.753   8.732 -76.009  1.00 20.08           N
> HETATM   23  C4  84T A1862     -33.033   9.013 -76.158  1.00 21.10           C
> HETATM   24  N3  84T A1862     -34.018   8.339 -76.786  1.00 21.58           N
> HETATM   25  C2  84T A1862     -35.263   8.846 -76.830  1.00 21.95           C
> HETATM   26  C8  84T A1862     -31.223   9.701 -75.291  1.00 20.27           C
> HETATM   27  N7  84T A1862     -32.173  10.618 -75.019  1.00 21.28           N
> HETATM   28  C5  84T A1862     -33.315  10.213 -75.563  1.00 21.81           C
> HETATM   29  C6  84T A1862     -34.624  10.702 -75.627  1.00 22.85           C
> HETATM   30  N1  84T A1862     -35.550  10.010 -76.285  1.00 22.44           N
> HETATM   31  N6  84T A1862     -35.008  11.862 -75.052  1.00 23.86           N
> TER
> END
>
> But I am losing all the double bond (and aromatic) information:
>
> import sys
> from rdkit import Chem
>
> m = Chem.MolFromPDBFile(sys.argv[1])
> print(Chem.MolToSmiles(m))
>
> Gives me:
>
> CC(C)C(O)C(O)C(O)NP(O)(O)OCC1CC(O)C(N2CNC3C2NCNC3N)O1
>
> As usual, many thanks for your time,
>
> - Jean-Paul Ebejer
> Early Stage Researcher
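
As a small aside on why the bond orders go missing: a HETATM record holds an atom's name, residue, coordinates and element, but has no field for bond order. The sketch below parses the first three records of JP's file by column position in plain Python. The helper names `parse_hetatm` and `distance` are made up for this illustration, and the column layout is the standard fixed-width PDB one, which is assumed here because the archived message lost its original spacing.

```python
import math

# First three records of the 84T ligand, laid out in standard PDB columns
# (an assumption: the archive collapsed the original whitespace).
RECORDS = [
    "HETATM    1  O1P 84T A1862     -27.016   9.387 -72.564  1.00 20.81           O",
    "HETATM    2  P   84T A1862     -27.282   9.818 -73.968  1.00 19.65           P",
    "HETATM    3  O2P 84T A1862     -27.881  11.176 -74.182  1.00 21.49           O",
]


def parse_hetatm(line):
    """Slice one fixed-width HETATM record into named fields."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "element": line[76:78].strip(),
    }


def distance(a, b):
    """Euclidean distance between two parsed atoms, in angstroms."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a["xyz"], b["xyz"])))


o1p, p, o2p = (parse_hetatm(r) for r in RECORDS)

# Nothing in the parsed fields says whether P-O1P or P-O2P is a double bond;
# a reader only has names and geometry to work with.
print(p["name"], p["res_name"], p["chain"], p["res_seq"])
print(round(distance(p, o1p), 2), round(distance(p, o2p), 2))
```

In this file the two P-O distances come out almost identical, so geometry alone gives little to go on, which is where a component dictionary or a SMILES template helps.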
Re: [Rdkit-discuss] PDB reader and bond perception
On 13/01/14 17:54, JP wrote:
> I am reading in the following PDB file with a simple MolFromPDBFile() call:
> [...]
> But I am losing all the double bond (and aromatic) information:
> [...]

84T is a reference to a chemical description:
http://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/84T

This (mmcif) is what I parse, either from the local dictionary or downloading the file on the fly:
ftp://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/files/mmcif/84T.cif

Does that help?

Paul.
Re: [Rdkit-discuss] PDB reader and bond perception
Hi JP,

If you also have a SMILES of the molecule you want to read from PDB, you can assign the bond orders based on this template:

    from rdkit import Chem
    from rdkit.Chem import AllChem

    tmp = Chem.MolFromPDBFile(yourfilename)      # molecule as read from the PDB file
    template = Chem.MolFromSmiles(yoursmiles)    # same molecule with correct bond orders
    mol = AllChem.AssignBondOrdersFromTemplate(template, tmp)

Is this what you're looking for?

Best,
Sereina

2014/1/13 JP jeanpaul.ebe...@inhibox.com
> RDKitters! Finally back on the mailing list!
> [...]
Re: [Rdkit-discuss] PDB reader and bond perception
Thanks All - I think I am in a good place now. I can get the SMILES from Paul's mmcif links and then I can use Sereina's magic three lines to do what I want. I'd cross my fingers - but with RDKit you don't need to. This works for all Chemical Components (or whatever other fashionable name they go by these days) in the PDB.

For posterity: I have found a post in the mailing list started by James which sheds some light on this:
https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg03481.html

On 13 January 2014 19:46, sereina riniker sereina.rini...@gmail.com wrote:
> Hi JP,
> If you have also a SMILES of the molecule you want to read from PDB, you can assign the bond orders based on this template:
> [...]