[Rdkit-discuss] Mol does not produce readable smiles
Hi all,

This is my first post on the RDKit mailing list, but I've been using it for a few months now (and think it's awesome, by the way).

I've found a slightly quirky behaviour. RDKit can read in the mol block below, but the SMILES it then produces cannot be read back in. I think the problem is the lack of an explicit hydrogen on the aromatic sulphur, which makes the ring impossible to kekulize. I was wondering why this might occur?

Thanks,
Anthony

Smi_1 = 'C=CCn1c2ccc(S(N)(=O)=O)cc2sc1NS(=O)(=O)c1ccc(Cl)s1'  # None mol - what RDKit outputs
Smi_2 = 'C=CCn1c2ccc(S(N)(=O)=O)cc2sc1=NS(=O)(=O)c1ccc(Cl)s1'  # Not None
Smi_3 = 'C=CCn1c2ccc(S(N)(=O)=O)cc2[sH]c1NS(=O)(=O)c1ccc(Cl)s1'  # Not None
# Smi_2 and Smi_3 hold different oxidation states for the sulphur.
# According to the SDF it should be Smi_3.

from rdkit import Chem

sdf = """probmol
     RDKit          3D

 26 28  0  0  0  0  0  0  0  0999 V2000
   40.2640  -47.5920   65.9800 N   0  0  0  0  0  0  0  0  0  0  0  0
   41.1750  -46.7850   66.5750 C   0  0  0  0  0  0  0  0  0  0  0  0
   41.4340  -46.7010   67.9480 C   0  0  0  0  0  0  0  0  0  0  0  0
   42.4150  -45.8090   68.3990 C   0  0  0  0  0  0  0  0  0  0  0  0
   43.1240  -45.0120   67.4860 C   0  0  0  0  0  0  0  0  0  0  0  0
   42.8540  -45.1070   66.1140 C   0  0  0  0  0  0  0  0  0  0  0  0
   41.8740  -45.9990   65.6750 C   0  0  0  0  0  0  0  0  0  0  0  0
   41.4130  -46.2370   64.0380 S   0  0  0  0  0  0  0  0  0  0  0  0
   40.2720  -47.4090   64.6280 C   0  0  0  0  0  0  0  0  0  0  0  0
   39.4590  -48.0810   63.8510 N   0  0  0  0  0  0  0  0  0  0  0  0
   39.3560  -48.5030   66.6990 C   0  0  0  0  0  0  0  0  0  0  0  0
   39.9550  -49.8630   66.8630 C   0  0  0  0  0  0  0  0  0  0  0  0
   40.2440  -50.3500   68.0660 C   0  0  0  0  0  0  0  0  0  0  0  0
   39.3310  -47.9450   62.1280 S   0  0  0  0  0  0  0  0  0  0  0  0
   40.7120  -48.0440   61.5180 O   0  0  0  0  0  0  0  0  0  0  0  0
   38.4830  -49.0830   61.6020 O   0  0  0  0  0  0  0  0  0  0  0  0
   38.5560  -46.4150   61.7050 C   0  0  0  0  0  0  0  0  0  0  0  0
   39.4690  -45.0310   61.2360 S   0  0  0  0  0  0  0  0  0  0  0  0
   37.9620  -44.2200   61.0510 C   0  0  0  0  0  0  0  0  0  0  0  0
   36.8440  -44.9740   61.3330 C   0  0  0  0  0  0  0  0  0  0  0  0
   37.8570  -42.5080   60.5230 Cl  0  0  0  0  0  0  0  0  0  0  0  0
   37.1890  -46.2470   61.7120 C   0  0  0  0  0  0  0  0  0  0  0  0
   44.3650  -43.8910   68.0560 S   0  0  0  0  0  0  0  0  0  0  0  0
   45.0700  -44.4650   69.2670 O   0  0  0  0  0  0  0  0  0  0  0  0
   45.3810  -43.6550   66.9600 O   0  0  0  0  0  0  0  0  0  0  0  0
   43.6310  -42.3950   68.5100 N   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  1 11  1  0
  2  3  2  0
  3  4  1  0
  5 23  1  0
  5  4  2  0
  6  5  1  0
  7  6  2  0
  7  2  1  0
  8  9  2  0
  8  7  1  0
  9  1  1  0
 10  9  1  0
 11 12  1  0
 12 13  2  0
 14 10  1  0
 15 14  2  0
 16 14  2  0
 17 22  2  0
 17 14  1  0
 18 17  1  0
 19 18  1  0
 19 20  2  0
 20 22  1  0
 21 19  1  0
 23 26  1  0
 23 24  2  0
 25 23  2  0
M  END"""

mol = Chem.MolFromMolBlock(sdf)
mol is None  # Gives False

# Then convert to SMILES and back
smimol = Chem.MolFromSmiles(Chem.MolToSmiles(mol))
smimol is None  # Gives True

Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
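A minimal sketch of how the three SMILES above could be checked one at a time, so that the reason for a failure is reported instead of a bare None. The helper name try_parse is hypothetical (it is not part of RDKit); it parses without sanitization and then calls Chem.SanitizeMol explicitly, assuming a reasonably recent RDKit where sanitization problems are raised as Python exceptions.

```python
from rdkit import Chem


def try_parse(smiles):
    """Return (mol, None) on success or (None, reason) on failure.

    Hypothetical helper: parse without sanitization first, then sanitize
    explicitly so the error message is available to the caller.
    """
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        return None, "SMILES could not be parsed at all"
    try:
        Chem.SanitizeMol(mol)
    except Exception as exc:  # e.g. a kekulization or valence error
        return None, str(exc)
    return mol, None


candidates = {
    "Smi_1": "C=CCn1c2ccc(S(N)(=O)=O)cc2sc1NS(=O)(=O)c1ccc(Cl)s1",
    "Smi_2": "C=CCn1c2ccc(S(N)(=O)=O)cc2sc1=NS(=O)(=O)c1ccc(Cl)s1",
    "Smi_3": "C=CCn1c2ccc(S(N)(=O)=O)cc2[sH]c1NS(=O)(=O)c1ccc(Cl)s1",
}

for name, smi in candidates.items():
    mol, reason = try_parse(smi)
    status = "parsed" if mol is not None else "failed: " + reason
    print(name, status)
```

The message returned for Smi_1 should indicate which step of sanitization rejects the molecule, which is more informative than checking for None alone.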
Re: [Rdkit-discuss] Mol does not produce readable smiles
Hi Anthony,

On Fri, Jul 26, 2013 at 4:56 AM, Anthony Bradley anthony.brad...@worc.ox.ac.uk wrote:

> Hi all,
>
> This is my first post on the rdkit mailing list, but I've been using it for a few months now (and think it's awesome by the way).

Welcome! And thanks!

> I've found a slightly quirky behaviour.
>
> Rdkit can read in the below mol block but then the smiles it produces cannot be read in again.

I can reproduce this. Thanks for reporting it.

> I think the problem is the lack of explicit hydrogen on the aromatic sulphur, leading to an inability to kekulize.
>
> I was wondering why this might occur?

I think there's actually an error in the input structure. It looks like there should be a charge somewhere in the five-membered ring.

In any case, what the RDKit is currently doing is wrong. It should either fail on the input structure or produce a SMILES that can be read back in. I'll file a bug for it.

-greg
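The expectation stated in the reply, that a molecule should either be rejected on input or yield SMILES that can be read back, can be turned into a small guard in user code. This is only a sketch: smiles_round_trips and charged_atoms are hypothetical helper names, not RDKit functions, and the second one simply lists formal charges so a structure with a suspected missing charge can be inspected.

```python
from rdkit import Chem


def smiles_round_trips(mol):
    """Return True if the SMILES written for `mol` can be parsed again."""
    smiles = Chem.MolToSmiles(mol)
    return Chem.MolFromSmiles(smiles) is not None


def charged_atoms(mol):
    """List (atom index, symbol, formal charge) for every charged atom."""
    return [
        (atom.GetIdx(), atom.GetSymbol(), atom.GetFormalCharge())
        for atom in mol.GetAtoms()
        if atom.GetFormalCharge() != 0
    ]


# Usage on a simple, well-behaved molecule (tetramethylammonium).
example = Chem.MolFromSmiles("C[N+](C)(C)C")
print(smiles_round_trips(example))
print(charged_atoms(example))
```

Running smiles_round_trips on each molecule read from an SD file would flag cases like the one reported in this thread before the SMILES are stored anywhere.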
[Rdkit-discuss] [RDKit-Discuss]: Aromatic Heavy Atoms
Dear RDKiters,

I'm creating a descriptor for estimating water solubility (clogSw) based on the following article by Delaney (doi:10.1021/ci034243x):

J. S. Delaney, "ESOL: Estimating Aqueous Solubility Directly from Molecular Structure," Journal of Chemical Information and Computer Sciences, vol. 44, no. 3, pp. 1000–1005, May 2004.

In this paper he proposes an equation that estimates the water solubility of a molecule from physicochemical descriptors. One of the descriptors used is Aromatic Proportion, that is, the proportion of the molecule's heavy atoms that are in an aromatic ring.

To find the aromatic heavy atoms I use GetSubstructMatches(...) with the query SMARTS '[a]'. Is that the correct way to find all the aromatic atoms of a molecule? If not, what is the correct SMARTS to use?

@Greg: When I complete this, can we look into adding it as a new descriptor, clogSw (like clogP), within the RDKit distribution?

Kind Regards,
Christos

--
Christos Kannas
Researcher, Ph.D. Student
e-Health Laboratory
http://www.medinfo.cs.ucy.ac.cy/
kannas.chris...@ucy.ac.cy
kannas.chris...@cs.ucy.ac.cy
chriskan...@gmail.com
Mob: (+357) 99530608
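The approach described above might look like the following sketch. The function name aromatic_proportion is hypothetical; the RDKit calls used (Chem.MolFromSmarts, GetSubstructMatches, GetNumHeavyAtoms) are existing API. Hydrogens are never flagged aromatic, so every '[a]' match is a heavy atom.

```python
from rdkit import Chem

# Compile the query once; '[a]' matches any atom flagged as aromatic.
AROMATIC_ATOM = Chem.MolFromSmarts("[a]")


def aromatic_proportion(mol):
    """Fraction of heavy atoms that are aromatic (0.0 if there are none)."""
    n_heavy = mol.GetNumHeavyAtoms()
    if n_heavy == 0:
        return 0.0
    n_aromatic = len(mol.GetSubstructMatches(AROMATIC_ATOM))
    return n_aromatic / n_heavy


# Toluene: six aromatic carbons out of seven heavy atoms.
toluene = Chem.MolFromSmiles("Cc1ccccc1")
print(aromatic_proportion(toluene))
```

One caveat: GetSubstructMatches caps the number of returned matches through its maxMatches argument (1000 by default), so for very large molecules the limit would need to be raised.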
Re: [Rdkit-discuss] [RDKit-Discuss]: Aromatic Heavy Atoms
Hi Christos,

On Friday, July 26, 2013, Christos Kannas wrote:

> One of the descriptors used is Aromatic Proportion, that is the proportion of heavy atoms of the molecule that are in aromatic ring.
>
> So in order to find the aromatic heavy atoms I use GetSubstructMatches(...) with query SMARTS '[a]'. Is that the correct way to find all the aromatic atoms of a molecule? If not what is the correct SMARTS to use?

That's the correct SMARTS. There may already be a function that calculates the number of aromatic atoms (I am on my phone and can't check); take a look in rdkit.Chem.rdMolDescriptors. If nothing is there already and you are working from Python, using the SMARTS matcher as you propose is probably the best way.

> @Greg: When I complete this, can we look into adding it as a new descriptor, clogSw (like clogP), within the RDKit distribution?

I can't think of an argument against it; I would be happy to take a look at a pull request once you have it ready to go.

-greg
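As a cross-check on the SMARTS matcher, the same count can be obtained by reading each atom's aromaticity flag directly. This is a sketch with hypothetical function names; it does not rely on any dedicated counting function in rdMolDescriptors, whose existence is not confirmed in this thread.

```python
from rdkit import Chem

AROMATIC_ATOM = Chem.MolFromSmarts("[a]")


def count_aromatic_smarts(mol):
    """Count aromatic atoms with the '[a]' SMARTS query."""
    return len(mol.GetSubstructMatches(AROMATIC_ATOM))


def count_aromatic_flags(mol):
    """Count aromatic atoms by reading the per-atom aromaticity flag."""
    return sum(1 for atom in mol.GetAtoms() if atom.GetIsAromatic())


# Both counts should agree for each molecule.
for smi in ("c1ccc2ccccc2c1", "c1ccncc1", "c1ccoc1", "CCO"):
    mol = Chem.MolFromSmiles(smi)
    print(smi, count_aromatic_smarts(mol), count_aromatic_flags(mol))
```

The flag-based loop avoids the substructure search entirely and has no match limit, which may make it the simpler choice inside a descriptor function.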