Re: [Rdkit-discuss] kekulizing carbazole
On 31/10/10 14:18, Greg Landrum wrote:
> Hi Paul,
>
> On Sun, Oct 31, 2010 at 12:09 PM, Paul Emsley paul.ems...@bioch.ox.ac.uk wrote:
>> I'm running into problems when I try to kekulize carbazole.
>>
>> The description I start with is that all the bonds are marked as Bond::AROMATIC and I do setIsAromatic(true) on all the atoms (which are all non-hydrogens). The explicitValence() for the N is 3. MolOps::Kekulize() fails in that case: Can't kekulize mol.
>>
>> If I add single bonds to hydrogens (including a hydrogen on the N) then MolOps::Kekulize() works.
>>
>> So my question is, how should I adjust the molecule description in the first case so that MolOps::Kekulize() works without hydrogens too?
>
> The problem is probably the lack of an explicit H on the nitrogen atom. It's easily demonstrated with pyrrole:
>
>     m = Chem.MolFromSmiles('c1cccn1')
>     [15:10:05] Can't kekulize mol
>
> You can fix this by letting the RDKit know that there's an H on the N atom:
>
>     m = Chem.MolFromSmiles('c1ccc[nH]1')
>
> Carbazole is the same story:
>
>     m = Chem.MolFromSmiles('c1ccc2c(c1)[nH]c1ccccc12')
>
> Note that in either case if you provide the structure in its Kekule form this doesn't happen; here's the illustration for pyrrole:
>
>     m = Chem.MolFromSmiles('C1=CC=CN1')
>     Chem.MolToSmiles(m)
>     'c1cc[nH]c1'
>
> There's an argument to be made that the Kekulization code could be made more robust with respect to this particular edge case, but to this point the effort involved has not seemed justified by the payoff: most of the time the H is present in the SMILES, so this problem doesn't occur.

Hi Greg,

Thanks for your informative and speedy reply.

For the record, I would like to describe how I proceeded in the light of your reply. My starting point to construct an RWMol is an mmCIF restraints file. As well as containing a description of the bonds and angles (etc.), this file describes the atoms, part of the description of which is the type_energy. Pyrrole and carbazole Ns (for example) have the type NR15 (energy types are listed in ener_lib.cif [1]), so now when I see an atom of that type [2], I add an extra H bonded to the N and everything is then peachy.

Thanks again,

Paul.

[1] http://www.ccp4.ac.uk/ccp4bin/viewcvs/ccp4/lib/data/monomers/ener_lib.cif
[2] there are other cases that I need to handle

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] kekulizing carbazole
Hi Paul,

On Mon, Nov 1, 2010 at 3:54 PM, Paul Emsley paul.ems...@bioch.ox.ac.uk wrote:
> For the record, I would like to describe how I proceeded in the light of your reply. My starting point to construct an RWMol is an mmCIF restraints file. As well as containing a description of the bonds and angles (etc.), this file describes the atoms, part of the description of which is the type_energy. Pyrrole and carbazole Ns (for example) have the type NR15 (energy types are listed in ener_lib.cif [1]), so now when I see an atom of that type [2], I add an extra H bonded to the N and everything is then peachy.

What you are describing sounds correct. One possible, minor, optimization if you are building the molecule atom by atom (as opposed to reading it from a mol block): you don't actually need to put the H in the graph. You can instead set the explicit H count on the N atom (index 4 in this SMILES) before sanitizing, something like the following:

    m = Chem.MolFromSmiles('c1cccn1', sanitize=False)
    m.GetAtomWithIdx(4).SetNumExplicitHs(1)
    Chem.SanitizeMol(m)
    print(Chem.MolToSmiles(m))
    c1cc[nH]c1

> Thanks again,

you're welcome!

Best Regards,
-greg
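The same idea can be sketched for carbazole built atom by atom, which is closer to the situation described in the thread. This is only an illustration under stated assumptions: the atom ordering, the bond list, and the helper name build_aromatic are made up here and are not taken from the original posts, and the bond-level aromatic flag is set explicitly as a precaution rather than because the thread says it is required.

```python
from rdkit import Chem

# Hypothetical heavy-atom layout for carbazole (13 atoms, 15 bonds).
# Index 6 is the ring nitrogen; this ordering is an assumption for the sketch.
ELEMENTS = ["C"] * 6 + ["N"] + ["C"] * 6
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
         (4, 6), (6, 7),
         (7, 8), (8, 9), (9, 10), (10, 11), (11, 12), (12, 7),
         (12, 3)]
N_INDEX = 6


def build_aromatic(elements, bonds, nh_indices=()):
    """Build an all-aromatic molecule and sanitize it.

    nh_indices lists atoms that carry one H which is recorded as an
    explicit H count instead of being added to the graph.
    """
    mol = Chem.RWMol()
    for symbol in elements:
        atom = Chem.Atom(symbol)
        atom.SetIsAromatic(True)
        mol.AddAtom(atom)
    for begin, end in bonds:
        mol.AddBond(begin, end, Chem.BondType.AROMATIC)
        # Also flag the bond itself as aromatic, as the SMILES parser does.
        mol.GetBondBetweenAtoms(begin, end).SetIsAromatic(True)
    for idx in nh_indices:
        mol.GetAtomWithIdx(idx).SetNumExplicitHs(1)
    # Sanitization includes the kekulization step discussed in the thread.
    Chem.SanitizeMol(mol)
    return mol


carbazole = build_aromatic(ELEMENTS, BONDS, nh_indices=[N_INDEX])
print(Chem.MolToSmiles(carbazole))
```

Calling build_aromatic(ELEMENTS, BONDS) without nh_indices should reproduce the "Can't kekulize mol" failure, since the N then has no H recorded anywhere.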
[Rdkit-discuss] kekulizing carbazole
Hi,

I'm running into problems when I try to kekulize carbazole.

The description I start with is that all the bonds are marked as Bond::AROMATIC and I do setIsAromatic(true) on all the atoms (which are all non-hydrogens). The explicitValence() for the N is 3. MolOps::Kekulize() fails in that case: Can't kekulize mol.

If I add single bonds to hydrogens (including a hydrogen on the N) then MolOps::Kekulize() works.

So my question is, how should I adjust the molecule description in the first case so that MolOps::Kekulize() works without hydrogens too?

Thanks,

Paul.
Re: [Rdkit-discuss] kekulizing carbazole
Hi Paul,

On Sun, Oct 31, 2010 at 12:09 PM, Paul Emsley paul.ems...@bioch.ox.ac.uk wrote:
> I'm running into problems when I try to kekulize carbazole.
>
> The description I start with is that all the bonds are marked as Bond::AROMATIC and I do setIsAromatic(true) on all the atoms (which are all non-hydrogens). The explicitValence() for the N is 3. MolOps::Kekulize() fails in that case: Can't kekulize mol.
>
> If I add single bonds to hydrogens (including a hydrogen on the N) then MolOps::Kekulize() works.
>
> So my question is, how should I adjust the molecule description in the first case so that MolOps::Kekulize() works without hydrogens too?

The problem is probably the lack of an explicit H on the nitrogen atom. It's easily demonstrated with pyrrole:

    m = Chem.MolFromSmiles('c1cccn1')
    [15:10:05] Can't kekulize mol

You can fix this by letting the RDKit know that there's an H on the N atom:

    m = Chem.MolFromSmiles('c1ccc[nH]1')

Carbazole is the same story:

    m = Chem.MolFromSmiles('c1ccc2c(c1)[nH]c1ccccc12')

Note that in either case if you provide the structure in its Kekule form this doesn't happen; here's the illustration for pyrrole:

    m = Chem.MolFromSmiles('C1=CC=CN1')
    Chem.MolToSmiles(m)
    'c1cc[nH]c1'

There's an argument to be made that the Kekulization code could be made more robust with respect to this particular edge case, but to this point the effort involved has not seemed justified by the payoff: most of the time the H is present in the SMILES, so this problem doesn't occur.

Best Regards,
-greg
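For anyone who wants to check the behaviour described above in one go, here is a minimal sketch. It relies on Chem.MolFromSmiles returning None when sanitization (which includes kekulization) fails. The helper name parses is made up for this example, and the carbazole SMILES is a full 13-heavy-atom form written out here as an assumption.

```python
from rdkit import Chem, RDLogger

# Silence RDKit's "Can't kekulize mol" log lines so only our own output shows.
RDLogger.DisableLog("rdApp.*")


def parses(smiles):
    """Return True if RDKit can parse and sanitize the SMILES."""
    return Chem.MolFromSmiles(smiles) is not None


# Aromatic pyrrole without the H on N, then with it, then carbazole with
# [nH], then pyrrole in its Kekule form.
for smi in ("c1cccn1", "c1ccc[nH]1", "c1ccc2c(c1)[nH]c1ccccc12", "C1=CC=CN1"):
    print(smi, parses(smi))
```

Only the first SMILES, the one without the H on the aromatic N, is expected to be rejected.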