Re: [Rdkit-discuss] Detecting rings and bond types from PDB HETATM record
Forgot to include the mailing list. Doh!

-
Jean-Paul Ebejer
Early Stage Researcher

On 9 July 2012 09:23, JP jeanpaul.ebe...@inhibox.com wrote:
> On 6 July 2012 21:07, Paul Emsley paul.ems...@bioch.ox.ac.uk wrote:
> > On 06/07/12 10:27, JP wrote:
> > > The SMILES is in fact, a wget call away - since I am processing PDB deposited structures.
> > > http://www.rcsb.org/pdb/rest/describeHet?chemicalID=NAG
> >
> > It is not clear to me how the SMILES helps you. You still have to map between rdkit atoms and PDB atom names, do you not? How about using the monomer library?
>
> I was planning to do an rdkit SubStruct match between the PDB rdkit ligand and the smiles-generated rdkit one. Then I could use the atom id mappings between the two. Not the most performant of solutions...
>
> > http://www2.mrc-lmb.cam.ac.uk/groups/murshudov/content/refmac/Dictionary/dictionary.html
>
> Interesting ... thanks.
>
> > > For the protein I can somehow guess the bonds.
> >
> > What do you mean by guess here? Are you worried about histidine protonation?
>
> I should be, of course. And about the correctness of the ligands in the PDB file (as Adrian pointed out earlier), and ... Perhaps this approach is fraught with problems... and needs some serious rethinking.

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Detecting rings and bond types from PDB HETATM record
Hi JP,

Not sure if this is of any help. If it's a PDB file from the RCSB, or an in-house one where you have a corresponding SMILES available, maybe you could use this information to properly set up the bond types using bond matches? I know the components.cif file still has quite a few errors; however, maybe it could be of help.

Cheers
Nik
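The idea in this message, taking bond orders from a reference built from the SMILES and copying them onto the single-bonded PDB ligand, can be sketched without any toolkit. The following is only a minimal illustration under stated assumptions: the function names (`find_mapping`, `transfer_bond_orders`) are hypothetical, the matcher is a plain backtracking graph match rather than RDKit's substructure code, and both molecules are assumed to contain the same heavy atoms with no hydrogens. The example data is acetamide, `CC(=O)N`, with the ligand atoms listed in a different order.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


def neighbours(n_atoms: int, bonds) -> Dict[int, Set[int]]:
    """Build an adjacency map from an iterable of 2-element bonds."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(n_atoms)}
    for bond in bonds:
        a, b = tuple(bond)
        adj[a].add(b)
        adj[b].add(a)
    return adj


def find_mapping(
    t_elems: List[str],
    t_bonds: Dict[Tuple[int, int], int],
    l_elems: List[str],
    l_bonds: Set[FrozenSet[int]],
) -> Optional[Dict[int, int]]:
    """Map template atom index -> ligand atom index, or None if no match."""
    if len(t_elems) != len(l_elems) or len(t_bonds) != len(l_bonds):
        return None
    t_adj = neighbours(len(t_elems), t_bonds)
    l_adj = neighbours(len(l_elems), l_bonds)
    mapping: Dict[int, int] = {}
    used: Set[int] = set()

    def extend(t_idx: int) -> bool:
        if t_idx == len(t_elems):
            return True
        for cand in range(len(l_elems)):
            if cand in used or l_elems[cand] != t_elems[t_idx]:
                continue
            if len(l_adj[cand]) != len(t_adj[t_idx]):
                continue
            # Every already-mapped template neighbour must be bonded to cand.
            if any(n in mapping and mapping[n] not in l_adj[cand]
                   for n in t_adj[t_idx]):
                continue
            mapping[t_idx] = cand
            used.add(cand)
            if extend(t_idx + 1):
                return True
            del mapping[t_idx]
            used.discard(cand)
        return False

    return dict(mapping) if extend(0) else None


def transfer_bond_orders(t_bonds, mapping) -> Dict[FrozenSet[int], int]:
    """Re-key the template bond orders by ligand atom indices."""
    return {frozenset((mapping[a], mapping[b])): order
            for (a, b), order in t_bonds.items()}


# Template from the SMILES CC(=O)N: atoms C0, C1, O2, N3.
template_elems = ["C", "C", "O", "N"]
template_bonds = {(0, 1): 1, (1, 2): 2, (1, 3): 1}
# Same molecule as read from a PDB file: different order, single bonds only.
ligand_elems = ["N", "C", "O", "C"]
ligand_bonds = {frozenset((0, 1)), frozenset((1, 2)), frozenset((1, 3))}

mapping = find_mapping(template_elems, template_bonds, ligand_elems, ligand_bonds)
orders = transfer_bond_orders(template_bonds, mapping)
print(mapping)
print(orders)
```

One caveat this makes visible: a purely topological match cannot tell symmetry-equivalent atoms apart (for example the two oxygens of a carboxylate), so the first match found decides where the double bond lands.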
Re: [Rdkit-discuss] Detecting rings and bond types from PDB HETATM record
Stiefl -- that is a really crafty and less painful way to go about it. The SMILES is, in fact, a wget call away, since I am processing PDB deposited structures.

http://www.rcsb.org/pdb/rest/describeHet?chemicalID=NAG

Of course, this approach is not generally applicable (and for this you'd need the perception code Greg mentioned). Can I have a look at that code, just to see the amount of work involved -- if nothing else?

For the protein I can somehow guess the bonds. SanitizeMol takes around 20 mins on my machine (I assume this is because of the massive size of the molecule, ~2000 atoms).

-
Jean-Paul Ebejer
Early Stage Researcher
Re: [Rdkit-discuss] Detecting rings and bond types from PDB HETATM record
On Fri, Jul 6, 2012 at 10:27 AM, JP jeanpaul.ebe...@inhibox.com wrote:
> Stiefl -- that is a really crafty and less painful way how to go about it. The SMILES is in fact, a wget call away - since I am processing PDB deposited structures.
> http://www.rcsb.org/pdb/rest/describeHet?chemicalID=NAG

Do you want ideal ligand structures or those from the experimental PDB file? The ideal ones you can just download from PDBeChem (SDF). The Credo web service has a REST resource for downloading ligand structures in different formats (including SDF) - hopefully I will have a web server soon to make this publicly available.

> Of course, this approach is not generally applicable (and for this you'd need the perception code Greg mentioned). Can I have a look at that code, just to see the amount of work involved -- if nothing else?

Just remember that those ligand structures are often far from perfect - the most common problems are missing atoms and dodgy geometries that affect bond order perception. Here are the two ligands (FRG) from PDB entry 1M48 for example:

COC(=O)[C@H](Cc1ccc(cc1)C#Cc2c2)NC(=O)C[C@H]3CCCN(C3)C(=[NH2+])N
COC(=O)[C@H](Cc1ccc(cc1)C#Cc2c2)NC(=O)C[C@H]3CC=CN(C3)C(=[NH2+])N

The second one has a double bond because one of the rings does not have the same geometry as it does in the other ligand.

> For the protein I can somehow guess the bonds. SanitizeMol takes around 20 mins on my machine (I assume this is because of the massive size of the molecule ~ 2000 atoms).
> Jean-Paul Ebejer
> Early Stage Researcher

It might be easier to use Open Babel to load the PDB structure and extract just the binding site, which you could save in a different format and use afterwards in RDKit.
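To make the point about dodgy geometries concrete, here is a deliberately naive sketch of length-based bond typing for carbon-carbon bonds. It is not the Open Babel perception code mentioned in the thread; the function name and the cutoffs (placed roughly midway between typical C-C lengths of about 1.54, 1.40, 1.34 and 1.20 Å) are assumptions chosen purely for illustration.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two 3D points, in the units of the input."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def guess_cc_bond_type(length: float) -> str:
    """Classify a C-C bond from its length in angstroms (illustrative cutoffs)."""
    if length < 1.27:
        return "triple"
    if length < 1.37:
        return "double"
    if length < 1.47:
        return "aromatic"
    return "single"


# The same ring bond as it might appear in two copies of a ligand:
# one with a sensible geometry, one squeezed by less than 0.2 angstrom.
good = distance((0.0, 0.0, 0.0), (1.52, 0.0, 0.0))
distorted = distance((0.0, 0.0, 0.0), (1.35, 0.0, 0.0))

good_type = guess_cc_bond_type(good)
distorted_type = guess_cc_bond_type(distorted)
print(good_type, distorted_type)
```

A coordinate error well within what a modest-resolution crystal structure can carry is enough to flip the call, which is the kind of discrepancy the two FRG SMILES above show.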
Re: [Rdkit-discuss] Detecting rings and bond types from PDB HETATM record
On 06/07/12 10:27, JP wrote:
> Stiefl -- that is a really crafty and less painful way how to go about it. The SMILES is in fact, a wget call away - since I am processing PDB deposited structures.
> http://www.rcsb.org/pdb/rest/describeHet?chemicalID=NAG

It is not clear to me how the SMILES helps you. You still have to map between rdkit atoms and PDB atom names, do you not? How about using the monomer library?

http://www2.mrc-lmb.cam.ac.uk/groups/murshudov/content/refmac/Dictionary/dictionary.html

> Of course, this approach is not generally applicable (and for this you'd need the perception code Greg mentioned).

Hmm... I am unconvinced that you want to be doing chemistry perception.

> For the protein I can somehow guess the bonds.

What do you mean by guess here? Are you worried about histidine protonation?
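The mapping question raised here starts from the PDB atom names, which live in fixed columns of each HETATM record (serial in columns 7-11, atom name in 13-16, residue name in 18-20). A minimal sketch of pulling them out follows; the function name is hypothetical, and the sample records are made up for illustration (the serial numbers echo the CONECT lines quoted in the thread, but their pairing with these NAG atom names is an assumption).

```python
from typing import Dict, Iterable


def atom_names_by_serial(lines: Iterable[str], res_name: str) -> Dict[int, str]:
    """Return {atom serial: atom name} for HETATM records of one residue name."""
    names: Dict[int, str] = {}
    for line in lines:
        if not line.startswith("HETATM"):
            continue
        # Fixed-width fields: slicing by column is safer than str.split(),
        # because adjacent fields may run together without whitespace.
        if line[17:20].strip() != res_name:
            continue
        serial = int(line[6:11])
        names[serial] = line[12:16].strip()
    return names


# Hypothetical records; only the leading columns matter for this sketch.
pdb_lines = [
    "ATOM      1  N   MET A   1",
    "HETATM 2231  C1  NAG A 401",
    "HETATM 2232  C2  NAG A 401",
    "HETATM 2236  O5  NAG A 401",
    "HETATM 2300  O   HOH A 501",
]

names = atom_names_by_serial(pdb_lines, "NAG")
print(names)
```

With the names in hand, each atom can be looked up in a monomer dictionary entry for the same residue, which sidesteps graph matching altogether when the names are trustworthy.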
Re: [Rdkit-discuss] Detecting rings and bond types from PDB HETATM record
Hi JP,

On Thu, Jul 5, 2012 at 7:07 PM, JP jeanpaul.ebe...@inhibox.com wrote:
> I generate a RWMol instance from the HETATM portion of a PDB file. My atoms are currently only joined by a single bond as defined in the connect portion of the pdb file, e.g.
> CONECT 2235 2234 2236
> CONECT 2236 2231 2235 2251
> CONECT 2237 2238 2242

ah, yes, the missing bond orders, one of the many reasons that I have never done a PDB parser for the RDKit. :-S

I think you're doing this work in C++, so I'm going to answer the rest of the questions accordingly.

> Are there any obvious rdkit ways how to detect :-
> 0. rings

Sure. If you just want to know if each atom/bond is in a ring you can use MolOps::fastFindRings and then
mol.getRingInfo()->numAtomRings(idx) > 0
or
mol.getRingInfo()->numBondRings(idx) > 0

If you want to know what the SSSR rings are, then you should use MolOps::symmetrizeSSSR(). You can pass that an extra argument where it will return the rings as defined by atom indices. After calling this, you can also get the set of atom rings using mol.getRingInfo()->atomRings() or the bond rings with mol.getRingInfo()->bondRings();

> 1. aromatic rings/atoms
> 2. double/triple bonds
> 3. charges (if any)

Here's where the trouble starts. I guess you want to perceive the bond types and atom hybridizations from the geometry. From there you can get the charges. The RDKit does not currently have anything to do this. There was a discussion on the mailing list last year:
http://comments.gmane.org/gmane.science.chemistry.rdkit.user/85
where Geoff Hutchison very kindly offered to donate the OpenBabel bond perception code to the RDKit. He sent the code, but I've never had the time to port it from OpenBabel to RDKit. If you're interested in implementing this and are willing to do it in a way that could be integrated into the main RDKit, I can send you the donated code; it's about 300 lines of well-commented C++.

> I would like to set these properties on every atom instance contained in my RWMol - so I generate a correct molecule representation. I assume sanitize would not clean these up for me? Correct?

Correct. Sanitize uses the bond information that's there.

-greg
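Ring membership needs only connectivity, so it can be worked out straight from the CONECT records before any bond orders exist. The sketch below is a toolkit-free stand-in, not RDKit's MolOps::fastFindRings: it treats a bond as a ring bond if its two atoms stay connected when that bond is ignored. The function names are hypothetical, and of the sample records only the 2235 and 2236 lines come from the thread; the others are invented here to close a six-membered ring.

```python
from collections import deque
from typing import FrozenSet, Iterable, Set


def parse_conect(lines: Iterable[str]) -> Set[FrozenSet[int]]:
    """Collect undirected bonds from CONECT records (5-column serial fields)."""
    bonds: Set[FrozenSet[int]] = set()
    for line in lines:
        if not line.startswith("CONECT"):
            continue
        fields = [line[i:i + 5].strip() for i in range(6, min(len(line), 31), 5)]
        serials = [int(f) for f in fields if f]
        if not serials:
            continue
        src = serials[0]
        for dst in serials[1:]:
            if dst != src:
                bonds.add(frozenset((src, dst)))  # duplicates collapse in the set
    return bonds


def ring_bonds(bonds: Set[FrozenSet[int]]) -> Set[FrozenSet[int]]:
    """Return the bonds whose end atoms remain connected without that bond."""
    adj = {}
    for bond in bonds:
        a, b = tuple(bond)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    in_ring: Set[FrozenSet[int]] = set()
    for bond in bonds:
        a, b = tuple(bond)
        seen = {a}
        queue = deque([a])
        found = False
        while queue and not found:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if cur == a and nxt == b:
                    continue  # do not walk the bond under test
                if nxt == b:
                    found = True
                    break
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        if found:
            in_ring.add(bond)
    return in_ring


conect_lines = [
    "CONECT 2231 2232 2236",       # hypothetical
    "CONECT 2232 2231 2233",       # hypothetical
    "CONECT 2233 2232 2234",       # hypothetical
    "CONECT 2234 2233 2235",       # hypothetical
    "CONECT 2235 2234 2236",       # from the thread
    "CONECT 2236 2231 2235 2251",  # from the thread
    "CONECT 2251 2236",            # hypothetical
]

bonds = parse_conect(conect_lines)
ring_atoms = {serial for bond in ring_bonds(bonds) for serial in bond}
print(sorted(ring_atoms))
```

This costs one breadth-first search per bond, which is fine for ligands but would be slow on a whole protein; it also only answers "is this atom or bond in a ring", not which rings make up the SSSR.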