[Rdkit-discuss] Mol does not produce readable smiles

2013-07-26 Thread Anthony Bradley
Hi all,

This is my first post on the rdkit mailing list, but I've been using it for a 
few months now (and think it's awesome by the way).

I've found a slightly quirky behaviour.

RDKit can read in the mol block below, but the SMILES it then produces cannot 
be read back in.

I think the problem is the lack of explicit hydrogen on the aromatic sulphur, 
leading to an inability to kekulize.

I was wondering why this might occur?

Thanks,

Anthony

Smi_1 = 'C=CCn1c2ccc(S(N)(=O)=O)cc2sc1NS(=O)(=O)c1ccc(Cl)s1'  # None mol - what RDKit outputs

Smi_2 = 'C=CCn1c2ccc(S(N)(=O)=O)cc2sc1=NS(=O)(=O)c1ccc(Cl)s1'  # Not None

Smi_3 = 'C=CCn1c2ccc(S(N)(=O)=O)cc2[sH]c1NS(=O)(=O)c1ccc(Cl)s1'  # Not None

# Smi_2 and Smi_3 each hold a different oxidation state for the sulphur.
# According to the SDF it should be Smi_3.

from rdkit import Chem

# The mol block as a triple-quoted string; V2000 is a fixed-width format,
# so the column alignment of each line must be preserved.
sdf = """probmol
 RDKit  3D

 26 28  0  0  0  0  0  0  0  0999 V2000
   40.2640  -47.5920   65.9800 N   0  0  0  0  0  0  0  0  0  0  0  0
   41.1750  -46.7850   66.5750 C   0  0  0  0  0  0  0  0  0  0  0  0
   41.4340  -46.7010   67.9480 C   0  0  0  0  0  0  0  0  0  0  0  0
   42.4150  -45.8090   68.3990 C   0  0  0  0  0  0  0  0  0  0  0  0
   43.1240  -45.0120   67.4860 C   0  0  0  0  0  0  0  0  0  0  0  0
   42.8540  -45.1070   66.1140 C   0  0  0  0  0  0  0  0  0  0  0  0
   41.8740  -45.9990   65.6750 C   0  0  0  0  0  0  0  0  0  0  0  0
   41.4130  -46.2370   64.0380 S   0  0  0  0  0  0  0  0  0  0  0  0
   40.2720  -47.4090   64.6280 C   0  0  0  0  0  0  0  0  0  0  0  0
   39.4590  -48.0810   63.8510 N   0  0  0  0  0  0  0  0  0  0  0  0
   39.3560  -48.5030   66.6990 C   0  0  0  0  0  0  0  0  0  0  0  0
   39.9550  -49.8630   66.8630 C   0  0  0  0  0  0  0  0  0  0  0  0
   40.2440  -50.3500   68.0660 C   0  0  0  0  0  0  0  0  0  0  0  0
   39.3310  -47.9450   62.1280 S   0  0  0  0  0  0  0  0  0  0  0  0
   40.7120  -48.0440   61.5180 O   0  0  0  0  0  0  0  0  0  0  0  0
   38.4830  -49.0830   61.6020 O   0  0  0  0  0  0  0  0  0  0  0  0
   38.5560  -46.4150   61.7050 C   0  0  0  0  0  0  0  0  0  0  0  0
   39.4690  -45.0310   61.2360 S   0  0  0  0  0  0  0  0  0  0  0  0
   37.9620  -44.2200   61.0510 C   0  0  0  0  0  0  0  0  0  0  0  0
   36.8440  -44.9740   61.3330 C   0  0  0  0  0  0  0  0  0  0  0  0
   37.8570  -42.5080   60.5230 Cl  0  0  0  0  0  0  0  0  0  0  0  0
   37.1890  -46.2470   61.7120 C   0  0  0  0  0  0  0  0  0  0  0  0
   44.3650  -43.8910   68.0560 S   0  0  0  0  0  0  0  0  0  0  0  0
   45.0700  -44.4650   69.2670 O   0  0  0  0  0  0  0  0  0  0  0  0
   45.3810  -43.6550   66.9600 O   0  0  0  0  0  0  0  0  0  0  0  0
   43.6310  -42.3950   68.5100 N   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  1 11  1  0
  2  3  2  0
  3  4  1  0
  5 23  1  0
  5  4  2  0
  6  5  1  0
  7  6  2  0
  7  2  1  0
  8  9  2  0
  8  7  1  0
  9  1  1  0
 10  9  1  0
 11 12  1  0
 12 13  2  0
 14 10  1  0
 15 14  2  0
 16 14  2  0
 17 22  2  0
 17 14  1  0
 18 17  1  0
 19 18  1  0
 19 20  2  0
 20 22  1  0
 21 19  1  0
 23 26  1  0
 23 24  2  0
 25 23  2  0
M  END
"""

mol = Chem.MolFromMolBlock(sdf)

print(mol is None)
# Gives False

# Then convert to SMILES and back
smimol = Chem.MolFromSmiles(Chem.MolToSmiles(mol))

print(smimol is None)
# Gives True
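
The round-trip check described above can be sketched more compactly as follows. This is only an illustration, assuming a reasonably recent RDKit under Python 3; the helper names `parses` and `round_trips` are made up for this example and are not part of RDKit. Results for Smi_2 and Smi_3, and for the mol block itself, may differ between RDKit releases.

```python
from rdkit import Chem

# The three SMILES from the message above.
candidates = {
    "Smi_1": "C=CCn1c2ccc(S(N)(=O)=O)cc2sc1NS(=O)(=O)c1ccc(Cl)s1",
    "Smi_2": "C=CCn1c2ccc(S(N)(=O)=O)cc2sc1=NS(=O)(=O)c1ccc(Cl)s1",
    "Smi_3": "C=CCn1c2ccc(S(N)(=O)=O)cc2[sH]c1NS(=O)(=O)c1ccc(Cl)s1",
}


def parses(smiles):
    """Return True if RDKit can parse and sanitize the SMILES (hypothetical helper)."""
    return Chem.MolFromSmiles(smiles) is not None


def round_trips(mol):
    """Return True if the SMILES written for `mol` can be read back in (hypothetical helper)."""
    return parses(Chem.MolToSmiles(mol))


for name, smi in candidates.items():
    # MolFromSmiles returns None (and logs an error) when sanitization fails,
    # for example when an aromatic ring cannot be kekulized.
    print(name, "parses" if parses(smi) else "does not parse")
```

Calling `round_trips(mol)` on a molecule read from a mol block gives a quick way to flag structures like the one above before writing SMILES to a file.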


Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Mol does not produce readable smiles

2013-07-26 Thread Greg Landrum
Hi Anthony,


On Fri, Jul 26, 2013 at 4:56 AM, Anthony Bradley 
anthony.brad...@worc.ox.ac.uk wrote:

 Hi all,

 This is my first post on the rdkit mailing list, but I’ve been using it
 for a few months now (and think it’s awesome by the way).

Welcome! and thanks!

 I’ve found a slightly quirky behaviour.

 Rdkit can read in the below mol block but then the smiles it produces
 cannot be read in again.

I can reproduce this. Thanks for reporting it.

 I think the problem is the lack of explicit hydrogen on the aromatic
 sulphur, leading to an inability to kekulize.

 I was wondering why this might occur?

I think there's actually an error in the input structure. It looks like
there should be a charge somewhere in the five-membered ring.

In any case, what the RDKit is currently doing is wrong. It should either
fail on the input structure or produce a SMILES that can be read back in.
I'll file a bug for it.

-greg






[Rdkit-discuss] [RDKit-Discuss]: Aromatic Heavy Atoms

2013-07-26 Thread Christos Kannas
Dear RDKiters,

I'm creating a descriptor for estimating water solubility (clogSw) based on
the following article by Delaney (doi:10.1021/ci034243x).

J. S. Delaney, “ESOL: Estimating Aqueous Solubility Directly from Molecular
 Structure,” *Journal of Chemical Information and Modeling*, vol. 44, no.
 3, pp. 1000–1005, May 2004.


In this paper he proposes an equation for estimating the water solubility
of molecules from physicochemical descriptors.

One of the descriptors used is Aromatic Proportion, that is, the proportion
of heavy atoms in the molecule that are in an aromatic ring.
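
For reference, the linear model from the cited paper can be written as below. The coefficients are quoted from Delaney's published fit and should be checked against the original before being relied on. The symbol names are chosen here for readability: $S_w$ is aqueous solubility in mol/L, MW is molecular weight, RB is the number of rotatable bonds, and AP is the aromatic proportion defined above.

```latex
\log_{10} S_w = 0.16 - 0.63\,\mathrm{cLogP} - 0.0062\,\mathrm{MW}
                + 0.066\,\mathrm{RB} - 0.74\,\mathrm{AP},
\qquad
\mathrm{AP} = \frac{N_{\text{aromatic heavy atoms}}}{N_{\text{heavy atoms}}}
```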

So in order to find the aromatic heavy atoms I use GetSubstructMatches(...)
with query SMARTS '[a]'. Is that the correct way to find all the aromatic
atoms of a molecule? If not what is the correct SMARTS to use?

@Greg: When I complete this, can we look into adding it as a new
descriptor, clogSw (like clogP), within the RDKit distribution?

Kind Regards,
Christos

-- 

Christos Kannas
Researcher
Ph.D Student

e-Health Laboratory http://www.medinfo.cs.ucy.ac.cy/
kannas.chris...@ucy.ac.cy
kannas.chris...@cs.ucy.ac.cy
chriskan...@gmail.com

Mob: (+357) 99530608


Re: [Rdkit-discuss] [RDKit-Discuss]: Aromatic Heavy Atoms

2013-07-26 Thread Greg Landrum
Hi Christos,

On Friday, July 26, 2013, Christos Kannas wrote:


 One of the descriptors used is Aromatic Proportion, that is the proportion
 of heavy atoms of the molecule that are in aromatic ring.

 So in order to find the aromatic heavy atoms I use
 GetSubstructMatches(...) with query SMARTS '[a]'. Is that the correct way
 to find all the aromatic atoms of a molecule? If not what is the correct
 SMARTS to use?


That's the correct SMARTS. There may already be a function that calculates
the number of aromatic atoms (I am on my phone and can't check); take a
look in rdkit.Chem.rdMolDescriptors. If nothing is there already and you
are working from Python, using the SMARTS matcher as you propose is
probably the best way.
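
A minimal sketch of the SMARTS-based count discussed here, assuming a recent RDKit under Python 3. The function name `aromatic_proportion` is hypothetical and not an RDKit API. `maxMatches` is raised explicitly because `GetSubstructMatches` caps the number of returned matches by default, which could undercount in very large molecules.

```python
from rdkit import Chem

# Query that matches any single aromatic atom.
AROMATIC_ATOM = Chem.MolFromSmarts("[a]")


def aromatic_proportion(mol):
    """Fraction of heavy atoms that are aromatic (hypothetical helper)."""
    n_heavy = mol.GetNumHeavyAtoms()
    if n_heavy == 0:
        return 0.0
    # Each match is a 1-tuple holding the index of one aromatic atom.
    matches = mol.GetSubstructMatches(AROMATIC_ATOM, maxMatches=mol.GetNumAtoms())
    return len(matches) / n_heavy


for smi in ("c1ccccc1", "Cc1ccccc1", "CCO"):
    mol = Chem.MolFromSmiles(smi)
    # Cross-check against the per-atom aromaticity flag.
    n_flagged = sum(1 for atom in mol.GetAtoms() if atom.GetIsAromatic())
    print(smi, aromatic_proportion(mol), n_flagged)
```

Iterating over `mol.GetAtoms()` and testing `GetIsAromatic()` gives the same count without a substructure search; either approach depends on the aromaticity model RDKit applied when the molecule was sanitized.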

 @Greg: When I complete this, can we look into adding it as a new
 descriptor, clogSw (like clogP), within the RDKit distribution?


I can't think of an argument against it; I would be happy to take a look at
a pull request once you have it ready to go.

-greg