Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-25 Thread Markus Sitzmann
You can report it to Marc Nicklaus ... who will probably send it to me
... I will take a look. Whether I can fix any misbehavior is another
question.

On Tue, Feb 24, 2015 at 8:27 AM, Greg Landrum greg.land...@gmail.com wrote:

 The InChIs have me confused.

 I'm going to simplify the below by just showing the input SMILES, the
 current (=master) RDKit InChI and the PubChem InChI

 On Mon, Feb 23, 2015 at 10:54 AM, JP jeanpaul.ebe...@inhibox.com wrote:


 Here is the list (first inchi is the 2014_09_2, second one is the
 2015.03.1pre generated one, third inchi is the cactus.nci.nih.gov):

 O=C(/N=c1/[nH]ncs1)[C@H]1CC[C@H](Cn2cnc3ccccc3c2=O)CC1

 InChI=1S/C18H19N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h1-4,10-13H,5-9H2,(H,21,22,24)/t12-,13-
 # RDKit 2015.03.1pre

 InChI=1S/C18H29N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h12-15,19-20H,1-11H2,(H,21,22,24)/t12-,13-,14?,15?
 # cactus.nci.nih.gov

 O=C(/N=c1\[nH]c(-c2ccccn2)cs1)[C@H]1CC[C@H](Cn2cnc3ccccc3c2=O)CC1
 InChI=1S/C24H23N5O2S/c30-22(28-24-27-21(14-32-24)20-7-3-4-12-25-20)17-10-8-16(9-11-17)13-29-15-26-19-6-2-1-5-18(19)23(29)31/h1-7,12,14-17H,8-11,13H2,(H,27,28,30)/t16-,17-

 InChI=1S/C24H39N5O2S/c30-22(28-24-27-21(14-32-24)20-7-3-4-12-25-20)17-10-8-16(9-11-17)13-29-15-26-19-6-2-1-5-18(19)23(29)31/h16-21,25-26H,1-15H2,(H,27,28,30)/t16-,17-,18?,19?,20?,21?

 CCOC(=O)Cc1cs/c(=N/C(=O)[C@H]2CC[C@H](Cn3cnc4ccccc4c3=O)CC2)[nH]1
 InChI=1S/C23H26N4O4S/c1-2-31-20(28)11-17-13-32-23(25-17)26-21(29)16-9-7-15(8-10-16)12-27-14-24-19-6-4-3-5-18(19)22(27)30/h3-6,13-16H,2,7-12H2,1H3,(H,25,26,29)/t15-,16-

 InChI=1S/C23H36N4O4S/c1-2-31-20(28)11-17-13-32-23(25-17)26-21(29)16-9-7-15(8-10-16)12-27-14-24-19-6-4-3-5-18(19)22(27)30/h15-19,24H,2-14H2,1H3,(H,25,26,29)/t15-,16-,17?,18?,19?

 COCc1n[nH]/c(=N/C(=O)[C@H]2CC[C@H](Cn3cnc4ccccc4c3=O)CC2)s1
 InChI=1S/C20H23N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27/h2-5,12-14H,6-11H2,1H3,(H,22,24,26)/t13-,14-

 InChI=1S/C20H33N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27/h13-17,21,23H,2-12H2,1H3,(H,22,24,26)/t13-,14-,15?,16?,17?

 COC(=O)c1[nH]/c(=N\C(=O)[C@H]2CC[C@H](Cn3cnc4ccccc4c3=O)CC2)sc1C(C)C
 InChI=1S/C24H28N4O4S/c1-14(2)20-19(23(31)32-3)26-24(33-20)27-21(29)16-10-8-15(9-11-16)12-28-13-25-18-7-5-4-6-17(18)22(28)30/h4-7,13-16H,8-12H2,1-3H3,(H,26,27,29)/t15-,16-

 InChI=1S/C24H38N4O4S/c1-14(2)20-19(23(31)32-3)26-24(33-20)27-21(29)16-10-8-15(9-11-16)12-28-13-25-18-7-5-4-6-17(18)22(28)30/h14-20,25H,4-13H2,1-3H3,(H,26,27,29)/t15-,16-,17?,18?,19?,20?

 CC(C)[C@H]1CC[C@H](C(=O)N[C@H](Cc2ccccc2)C(=O)/N=c2\[nH]ncs2)CC1
 InChI=1S/C21H28N4O2S/c1-14(2)16-8-10-17(11-9-16)19(26)23-18(12-15-6-4-3-5-7-15)20(27)24-21-25-22-13-28-21/h3-7,13-14,16-18H,8-12H2,1-2H3,(H,23,26)(H,24,25,27)/t16-,17-,18-/m1/s1

 InChI=1S/C21H36N4O2S/c1-14(2)16-8-10-17(11-9-16)19(26)23-18(12-15-6-4-3-5-7-15)20(27)24-21-25-22-13-28-21/h14-18,22H,3-13H2,1-2H3,(H,23,26)(H,24,25,27)/t16-,17-,18-/m1/s1


 If you look in the formula layer for the InChIs from PubChem, you will see
 that they all have *way* too many H atoms. I think there's something about
 the structures that is confusing the pubchem/cactvs conversion code.

 Compare these two outputs.
 Aromatic form:
 http://cactus.nci.nih.gov/chemical/structure/O=C(N=c1[nH]ncs1)C1CCC(Cn2cnc3ccccc3c2=O)CC1/stdinchi
 produces:
 InChI=1S/C18H29N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h12-15,19-20H,1-11H2,(H,21,22,24)

 Kekule form:
 http://cactus.nci.nih.gov/chemical/structure/O=C(/N=C1/[NH]N=CS1)[C@H]1CC[C@H](CN2C=NC3=CC=CC=C3C2=O)CC1/stdinchi
 produces:
 InChI=1S/C18H19N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h1-4,10-13H,5-9H2,(H,21,22,24)/t12-,13-

 In fact, converting the 5 membered ring to kekule form is enough:
 http://cactus.nci.nih.gov/chemical/structure/O=C(N=C1[NH]N=CS1)C1CCC(Cn2cnc3ccccc3c2=O)CC1/stdinchi
 produces:
 InChI=1S/C18H19N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h1-4,10-13H,5-9H2,(H,21,22,24)

 This can't be true.

 We can further simplify things to track down the problem:

 http://cactus.nci.nih.gov/chemical/structure/N=c1[nH]ncs1/stdinchi
 InChI=1S/C2H5N3S/c3-2-5-4-1-6-2/h4H,1H2,(H2,3,5)

 vs

 http://cactus.nci.nih.gov/chemical/structure/O=c1[nH]ncs1/stdinchi
 InChI=1S/C2H2N2OS/c5-2-4-3-1-6-2/h1H,(H,4,5)

 It seems to be the exocyclic bond to an atom with Hs. This is ok:
 http://cactus.nci.nih.gov/chemical/structure/O=c1occo1/stdinchi
 InChI=1S/C3H2O3/c4-3-5-1-2-6-3/h1-2H

 but both of these are wrong:
 http://cactus.nci.nih.gov/chemical/structure/N=c1occo1/stdinchi
 InChI=1S/C3H5NO2/c4-3-5-1-2-6-3/h4H,1-2H2

 http://cactus.nci.nih.gov/chemical/structure/C=c1occo1/stdinchi
 InChI=1S/C4H6O2/c1-4-5-2-3-6-4/h1-3H2

 I'm pretty sure that this is not the RDKit doing the wrong thing.

Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-25 Thread Greg Landrum
The InChIs have me confused.

I'm going to simplify the below by just showing the input SMILES, the
current (=master) RDKit InChI and the PubChem InChI

On Mon, Feb 23, 2015 at 10:54 AM, JP jeanpaul.ebe...@inhibox.com wrote:


 Here is the list (first inchi is the 2014_09_2, second one is the
 2015.03.1pre generated one, third inchi is the cactus.nci.nih.gov):

 O=C(/N=c1/[nH]ncs1)[C@H]1CC[C@H](Cn2cnc3ccccc3c2=O)CC1
 InChI=1S/C18H19N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h1-4,10-13H,5-9H2,(H,21,22,24)/t12-,13-
 # RDKit 2015.03.1pre
 InChI=1S/C18H29N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h12-15,19-20H,1-11H2,(H,21,22,24)/t12-,13-,14?,15?
 # cactus.nci.nih.gov

 O=C(/N=c1\[nH]c(-c2ccccn2)cs1)[C@H]1CC[C@H](Cn2cnc3ccccc3c2=O)CC1
 InChI=1S/C24H23N5O2S/c30-22(28-24-27-21(14-32-24)20-7-3-4-12-25-20)17-10-8-16(9-11-17)13-29-15-26-19-6-2-1-5-18(19)23(29)31/h1-7,12,14-17H,8-11,13H2,(H,27,28,30)/t16-,17-

 InChI=1S/C24H39N5O2S/c30-22(28-24-27-21(14-32-24)20-7-3-4-12-25-20)17-10-8-16(9-11-17)13-29-15-26-19-6-2-1-5-18(19)23(29)31/h16-21,25-26H,1-15H2,(H,27,28,30)/t16-,17-,18?,19?,20?,21?

 CCOC(=O)Cc1cs/c(=N/C(=O)[C@H]2CC[C@H](Cn3cnc4ccccc4c3=O)CC2)[nH]1
 InChI=1S/C23H26N4O4S/c1-2-31-20(28)11-17-13-32-23(25-17)26-21(29)16-9-7-15(8-10-16)12-27-14-24-19-6-4-3-5-18(19)22(27)30/h3-6,13-16H,2,7-12H2,1H3,(H,25,26,29)/t15-,16-

 InChI=1S/C23H36N4O4S/c1-2-31-20(28)11-17-13-32-23(25-17)26-21(29)16-9-7-15(8-10-16)12-27-14-24-19-6-4-3-5-18(19)22(27)30/h15-19,24H,2-14H2,1H3,(H,25,26,29)/t15-,16-,17?,18?,19?

 COCc1n[nH]/c(=N/C(=O)[C@H]2CC[C@H](Cn3cnc4ccccc4c3=O)CC2)s1
 InChI=1S/C20H23N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27/h2-5,12-14H,6-11H2,1H3,(H,22,24,26)/t13-,14-

 InChI=1S/C20H33N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27/h13-17,21,23H,2-12H2,1H3,(H,22,24,26)/t13-,14-,15?,16?,17?

 COC(=O)c1[nH]/c(=N\C(=O)[C@H]2CC[C@H](Cn3cnc4ccccc4c3=O)CC2)sc1C(C)C
 InChI=1S/C24H28N4O4S/c1-14(2)20-19(23(31)32-3)26-24(33-20)27-21(29)16-10-8-15(9-11-16)12-28-13-25-18-7-5-4-6-17(18)22(28)30/h4-7,13-16H,8-12H2,1-3H3,(H,26,27,29)/t15-,16-

 InChI=1S/C24H38N4O4S/c1-14(2)20-19(23(31)32-3)26-24(33-20)27-21(29)16-10-8-15(9-11-16)12-28-13-25-18-7-5-4-6-17(18)22(28)30/h14-20,25H,4-13H2,1-3H3,(H,26,27,29)/t15-,16-,17?,18?,19?,20?

 CC(C)[C@H]1CC[C@H](C(=O)N[C@H](Cc2ccccc2)C(=O)/N=c2\[nH]ncs2)CC1
 InChI=1S/C21H28N4O2S/c1-14(2)16-8-10-17(11-9-16)19(26)23-18(12-15-6-4-3-5-7-15)20(27)24-21-25-22-13-28-21/h3-7,13-14,16-18H,8-12H2,1-2H3,(H,23,26)(H,24,25,27)/t16-,17-,18-/m1/s1

 InChI=1S/C21H36N4O2S/c1-14(2)16-8-10-17(11-9-16)19(26)23-18(12-15-6-4-3-5-7-15)20(27)24-21-25-22-13-28-21/h14-18,22H,3-13H2,1-2H3,(H,23,26)(H,24,25,27)/t16-,17-,18-/m1/s1


If you look in the formula layer for the InChIs from PubChem, you will see
that they all have *way* too many H atoms. I think there's something about
the structures that is confusing the pubchem/cactvs conversion code.

Compare these two outputs.
Aromatic form:
http://cactus.nci.nih.gov/chemical/structure/O=C(N=c1[nH]ncs1)C1CCC(Cn2cnc3ccccc3c2=O)CC1/stdinchi
produces:
InChI=1S/C18H29N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h12-15,19-20H,1-11H2,(H,21,22,24)

Kekule form:
http://cactus.nci.nih.gov/chemical/structure/O=C(/N=C1/[NH]N=CS1)[C@H]1CC[C@H](CN2C=NC3=CC=CC=C3C2=O)CC1/stdinchi
produces:
InChI=1S/C18H19N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h1-4,10-13H,5-9H2,(H,21,22,24)/t12-,13-

In fact, converting the 5 membered ring to kekule form is enough:
http://cactus.nci.nih.gov/chemical/structure/O=C(N=C1[NH]N=CS1)C1CCC(Cn2cnc3ccccc3c2=O)CC1/stdinchi
produces:
InChI=1S/C18H19N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h1-4,10-13H,5-9H2,(H,21,22,24)

This can't be true.

We can further simplify things to track down the problem:

http://cactus.nci.nih.gov/chemical/structure/N=c1[nH]ncs1/stdinchi
InChI=1S/C2H5N3S/c3-2-5-4-1-6-2/h4H,1H2,(H2,3,5)

vs

http://cactus.nci.nih.gov/chemical/structure/O=c1[nH]ncs1/stdinchi
InChI=1S/C2H2N2OS/c5-2-4-3-1-6-2/h1H,(H,4,5)

It seems to be the exocyclic bond to an atom with Hs. This is ok:
http://cactus.nci.nih.gov/chemical/structure/O=c1occo1/stdinchi
InChI=1S/C3H2O3/c4-3-5-1-2-6-3/h1-2H

but both of these are wrong:
http://cactus.nci.nih.gov/chemical/structure/N=c1occo1/stdinchi
InChI=1S/C3H5NO2/c4-3-5-1-2-6-3/h4H,1-2H2

http://cactus.nci.nih.gov/chemical/structure/C=c1occo1/stdinchi
InChI=1S/C4H6O2/c1-4-5-2-3-6-4/h1-3H2

I'm pretty sure that this is not the RDKit doing the wrong thing.
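
The formula-layer comparison above can be checked mechanically. What follows is only a minimal sketch, assuming single-component InChIs (no "." separators or multipliers in the formula); the helper names formula_counts and formula_diff are made up for illustration and are not part of RDKit or the InChI library. The two strings are the RDKit 2015.03.1pre and cactus outputs quoted above for the first molecule.

```python
import re

def formula_counts(inchi):
    """Parse the formula layer (the segment right after 'InChI=1S') into element counts."""
    formula = inchi.split("/")[1]
    counts = {}
    # One uppercase letter, an optional lowercase letter, then an optional count.
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts

def formula_diff(inchi_a, inchi_b):
    """Return {element: (count_a, count_b)} for every element whose count differs."""
    a, b = formula_counts(inchi_a), formula_counts(inchi_b)
    return {el: (a.get(el, 0), b.get(el, 0))
            for el in sorted(set(a) | set(b))
            if a.get(el, 0) != b.get(el, 0)}

# Connectivity layer shared by both strings, copied from the thread.
SKELETON = ("/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15"
            "-4-2-1-3-14(15)17(23)25")
rdkit_inchi = ("InChI=1S/C18H19N5O2S" + SKELETON
               + "/h1-4,10-13H,5-9H2,(H,21,22,24)/t12-,13-")
cactus_inchi = ("InChI=1S/C18H29N5O2S" + SKELETON
                + "/h12-15,19-20H,1-11H2,(H,21,22,24)/t12-,13-,14?,15?")

print(formula_diff(rdkit_inchi, cactus_inchi))  # prints {'H': (19, 29)}
```

Only the hydrogen count differs (ten extra H atoms in the cactus string), which is consistent with the aromatic rings being treated as saturated.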

@Markus: what would be the best way to report this to the NCI CADD guys?

-greg

Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-23 Thread Markus Sitzmann
Well, the http://cactus.nci.nih.gov/chemical/structure/ site is my
baby which I had to leave behind 1 1/2 years ago (I am not with NIH
anymore). Igor who replied in this thread was also involved in some
parts of it. Traffic on this cactus service is between 5 and 10 million
requests per month - so I think the service survived your attack ;-)

And I am not saying it is perfect, it just provides another
implementation to double-check things in question. It has the CACTVS
chemoinformatic toolkit as chemistry backend which I think is
well-tested.

Markus

On Mon, Feb 23, 2015 at 10:54 AM, JP jeanpaul.ebe...@inhibox.com wrote:
 Ok so I got out my test set of 6,940,083 molecules.  First, I generated the
 inchi using 2014_09_2.  I then checked out (and built) the master (with
 Greg's latest commits) from github and regenerated the inchis for all these
 molecules.

 3,257 molecules (of 6,940,083) gave me a different inchis between the
 current production version and the development (github) one.

 For these 3,257 molecules I hammered the
 http://cactus.nci.nih.gov/chemical/structure/%s/stdinchi site and assumed
 this to be the 'correct' inchi (those great guys will have an interesting
 spike in their web traffic last Fri evening).  In 6 (out of 3,257) cases we
 get different Inchis from cactus.nci.nih.gov vs RDKit github development
 version (2015.03.1pre).

 Here is the list (first inchi is the 2014_09_2, second one is the
 2015.03.1pre generated one, third inchi is the cactus.nci.nih.gov):

 O=C(/N=c1/[nH]ncs1)[C@H]1CC[C@H](Cn2cnc3ccccc3c2=O)CC1
 MPQBIWRBISQCLJ-BETUJISGSA-N MPQBIWRBISQCLJ-JOCQHMNTSA-N
 InChI=1S/C18H19N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h1-4,10-13H,5-9H2,(H,21,22,24)/t12-,13+
 # RDKit 2014_09_2
 InChI=1S/C18H19N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h1-4,10-13H,5-9H2,(H,21,22,24)/t12-,13-
 # RDKit 2015.03.1pre
 InChI=1S/C18H29N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h12-15,19-20H,1-11H2,(H,21,22,24)/t12-,13-,14?,15?
 # cactus.nci.nih.gov

 O=C(/N=c1\[nH]c(-c2ccccn2)cs1)[C@H]1CC[C@H](Cn2cnc3ccccc3c2=O)CC1
 CZKXHWCYFFXKGH-CALCHBBNSA-N CZKXHWCYFFXKGH-QAQDUYKDSA-N
 InChI=1S/C24H23N5O2S/c30-22(28-24-27-21(14-32-24)20-7-3-4-12-25-20)17-10-8-16(9-11-17)13-29-15-26-19-6-2-1-5-18(19)23(29)31/h1-7,12,14-17H,8-11,13H2,(H,27,28,30)/t16-,17+
 InChI=1S/C24H23N5O2S/c30-22(28-24-27-21(14-32-24)20-7-3-4-12-25-20)17-10-8-16(9-11-17)13-29-15-26-19-6-2-1-5-18(19)23(29)31/h1-7,12,14-17H,8-11,13H2,(H,27,28,30)/t16-,17-
 InChI=1S/C24H39N5O2S/c30-22(28-24-27-21(14-32-24)20-7-3-4-12-25-20)17-10-8-16(9-11-17)13-29-15-26-19-6-2-1-5-18(19)23(29)31/h16-21,25-26H,1-15H2,(H,27,28,30)/t16-,17-,18?,19?,20?,21?

 CCOC(=O)Cc1cs/c(=N/C(=O)[C@H]2CC[C@H](Cn3cnc4ccccc4c3=O)CC2)[nH]1
 GAXCPQSXDNGSQV-IYBDPMFKSA-N GAXCPQSXDNGSQV-WKILWMFISA-N
 InChI=1S/C23H26N4O4S/c1-2-31-20(28)11-17-13-32-23(25-17)26-21(29)16-9-7-15(8-10-16)12-27-14-24-19-6-4-3-5-18(19)22(27)30/h3-6,13-16H,2,7-12H2,1H3,(H,25,26,29)/t15-,16+
 InChI=1S/C23H26N4O4S/c1-2-31-20(28)11-17-13-32-23(25-17)26-21(29)16-9-7-15(8-10-16)12-27-14-24-19-6-4-3-5-18(19)22(27)30/h3-6,13-16H,2,7-12H2,1H3,(H,25,26,29)/t15-,16-
 InChI=1S/C23H36N4O4S/c1-2-31-20(28)11-17-13-32-23(25-17)26-21(29)16-9-7-15(8-10-16)12-27-14-24-19-6-4-3-5-18(19)22(27)30/h15-19,24H,2-14H2,1H3,(H,25,26,29)/t15-,16-,17?,18?,19?

 COCc1n[nH]/c(=N/C(=O)[C@H]2CC[C@H](Cn3cnc4ccccc4c3=O)CC2)s1
 YVZJPKUMKXPZTK-OKILXGFUSA-N YVZJPKUMKXPZTK-HDJSIYSDSA-N
 InChI=1S/C20H23N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27/h2-5,12-14H,6-11H2,1H3,(H,22,24,26)/t13-,14+
 InChI=1S/C20H23N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27/h2-5,12-14H,6-11H2,1H3,(H,22,24,26)/t13-,14-
 InChI=1S/C20H33N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27/h13-17,21,23H,2-12H2,1H3,(H,22,24,26)/t13-,14-,15?,16?,17?

 COC(=O)c1[nH]/c(=N\C(=O)[C@H]2CC[C@H](Cn3cnc4ccccc4c3=O)CC2)sc1C(C)C
 KNDSLDLCZNAXPK-IYBDPMFKSA-N KNDSLDLCZNAXPK-WKILWMFISA-N
 InChI=1S/C24H28N4O4S/c1-14(2)20-19(23(31)32-3)26-24(33-20)27-21(29)16-10-8-15(9-11-16)12-28-13-25-18-7-5-4-6-17(18)22(28)30/h4-7,13-16H,8-12H2,1-3H3,(H,26,27,29)/t15-,16+
 InChI=1S/C24H28N4O4S/c1-14(2)20-19(23(31)32-3)26-24(33-20)27-21(29)16-10-8-15(9-11-16)12-28-13-25-18-7-5-4-6-17(18)22(28)30/h4-7,13-16H,8-12H2,1-3H3,(H,26,27,29)/t15-,16-
 InChI=1S/C24H38N4O4S/c1-14(2)20-19(23(31)32-3)26-24(33-20)27-21(29)16-10-8-15(9-11-16)12-28-13-25-18-7-5-4-6-17(18)22(28)30/h14-20,25H,4-13H2,1-3H3,(H,26,27,29)/t15-,16-,17?,18?,19?,20?

 CC(C)[C@H]1CC[C@H](C(=O)N[C@H](Cc2ccccc2)C(=O)/N=c2\[nH]ncs2)CC1
 OKTRHZCAACPPLC-FGTMMUONSA-N OKTRHZCAACPPLC-KZNAEPCWSA-N
 InChI=1S/C21H28N4O2S/c1-14(2)16-8-10-17(11-9-16)19(26)23-18(12-15-6-4-3-5-7-15)20(27)24-21-25-22-13-28-21/h3-7,13-14,16-18H,8-12H2,1-2H3,(H,23,26)(H,24,25,27)/t16-,17+,18-/m1/s1
 

Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-23 Thread Dimitri Maziuk
On 02/23/2015 06:12 AM, Markus Sitzmann wrote:

 And I am not saying it is perfect, it just provides another
 implementation to double-check things in question. It has the CACTVS
 chemoinformatic toolkit as chemistry backend which I think is
 well-tested.

 On Mon, Feb 23, 2015 at 10:54 AM, JP jeanpaul.ebe...@inhibox.com wrote:

 3,257 molecules (of 6,940,083) gave me a different inchis between the
 current production version and the development (github) one.

Just as another data point, out of 14 metabolites that have gone through my
scripts so far:

1. cis-vaccenic acid, PubChem CID 5282761: InChI produced by RDKit
2014.09.2 differs from that from OpenBabel. InChI from PubChem's SDF
agrees with the latter. (RDKit's ends with /b8-7+, OB & PC: 7-.)

InChI code does spit out "undefined stereochemistry" warnings for OB's
"we don't need no C.I.P., it's sooo last century" stereo -- and then
includes the stereo layer in the output anyway. (Though in this case
PubChem's presumably OpenEye stereo seems to agree with OB and not RDKit.)

2. 5,10,15,20-Tetraphenyl-21H,23H-porphine zinc, PubChem CID 3580039:
InChI in the SDF ends at the /q, both RDKit and OpenBabel add a /b
layer. They all agree, though.

14 data points is nowhere near enough for meaningful conclusions, but
still... 14% won't match the plain string comparison that most searches
do and 7% won't match the clever InChI-aware comparison search,
assuming it's implemented anywhere.
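
For what it's worth, one possible reading of an "InChI-aware comparison" is to drop the stereo layers (/b, /t, /m, /s) before comparing. The sketch below is only an illustration of that idea under that assumption; strip_stereo and same_ignoring_stereo are hypothetical helpers, not functions from RDKit, OpenBabel, or the InChI library. The example pair is taken from JP's list in this thread (2014_09_2 vs 2015.03.1pre), where only the /t layer differs.

```python
STEREO_PREFIXES = ("b", "t", "m", "s")  # double-bond, tetrahedral, and stereo-type layers

def strip_stereo(inchi):
    """Return the InChI with every stereo layer removed."""
    head, *layers = inchi.split("/")
    # layers[0] is the formula, which carries no prefix letter, so keep it as-is.
    kept = layers[:1] + [seg for seg in layers[1:]
                         if not seg.startswith(STEREO_PREFIXES)]
    return "/".join([head] + kept)

def same_ignoring_stereo(inchi_a, inchi_b):
    """True when the two strings agree on everything except stereo layers."""
    return strip_stereo(inchi_a) == strip_stereo(inchi_b)

# Everything up to the stereo layer, copied from the thread.
BASE = ("InChI=1S/C20H23N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13"
        "(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27"
        "/h2-5,12-14H,6-11H2,1H3,(H,22,24,26)")
old_version = BASE + "/t13-,14+"   # RDKit 2014_09_2
new_version = BASE + "/t13-,14-"   # RDKit 2015.03.1pre

print(old_version == new_version)                      # prints False
print(same_ignoring_stereo(old_version, new_version))  # prints True
```

Whether discarding stereo is acceptable depends entirely on the search; it hides real differences as well as toolkit disagreements.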

And then you get .05% between different versions of the same software...

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



--
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration  more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=190641631iu=/4140/ostg.clktrk___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-23 Thread JP
Ok so I got out my test set of 6,940,083 molecules.  First, I generated the
InChIs using 2014_09_2.  I then checked out (and built) the master (with
Greg's latest commits) from github and regenerated the InChIs for all these
molecules.

3,257 molecules (of 6,940,083) gave me different InChIs between the
current production version and the development (github) one.

For these 3,257 molecules I hammered the
http://cactus.nci.nih.gov/chemical/structure/%s/stdinchi site and assumed
this to be the 'correct' InChI (those great guys will have seen an interesting
spike in their web traffic last Fri evening).  In 6 (out of 3,257) cases we
get different InChIs from cactus.nci.nih.gov vs the RDKit github development
version (2015.03.1pre).
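
The bookkeeping for this comparison can be sketched roughly as follows. This is not the script that was actually run: cactus_url and changed_ids are hypothetical names, the dictionaries are assumed to map a molecule id to its InChI, and percent-encoding the SMILES with urllib.parse.quote is an assumption about what the cactus service accepts. No network request is made here.

```python
from urllib.parse import quote

# URL pattern quoted in the message above; %s is replaced by the SMILES.
CACTUS_TEMPLATE = "http://cactus.nci.nih.gov/chemical/structure/%s/stdinchi"

def cactus_url(smiles):
    """Build the lookup URL for one SMILES, percent-encoding characters such as '#'."""
    # quote() leaves '/' untouched by default, so bond-direction slashes stay readable.
    return CACTUS_TEMPLATE % quote(smiles)

def changed_ids(old, new):
    """Ids present in both mappings whose InChI differs between the two runs."""
    return sorted(k for k in old.keys() & new.keys() if old[k] != new[k])

# Toy stand-ins for the two RDKit runs (ids and values are placeholders).
run_2014_09_2 = {"mol1": "InChI=A", "mol2": "InChI=B/t1+"}
run_2015_03_1pre = {"mol1": "InChI=A", "mol2": "InChI=B/t1-"}

for mol_id in changed_ids(run_2014_09_2, run_2015_03_1pre):
    print(mol_id)            # prints mol2
print(cactus_url("C#N"))     # prints http://cactus.nci.nih.gov/chemical/structure/C%23N/stdinchi
```

Only the ids returned by changed_ids would then need to be sent to the remote service, which keeps the request count at thousands rather than millions.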

Here is the list (first inchi is the 2014_09_2, second one is the
2015.03.1pre generated one, third inchi is the cactus.nci.nih.gov):

O=C(/N=c1/[nH]ncs1)[C@H]1CC[C@H](Cn2cnc3ccccc3c2=O)CC1
MPQBIWRBISQCLJ-BETUJISGSA-N MPQBIWRBISQCLJ-JOCQHMNTSA-N
InChI=1S/C18H19N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h1-4,10-13H,5-9H2,(H,21,22,24)/t12-,13+
# RDKit 2014_09_2
InChI=1S/C18H19N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h1-4,10-13H,5-9H2,(H,21,22,24)/t12-,13-
# RDKit 2015.03.1pre
InChI=1S/C18H29N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h12-15,19-20H,1-11H2,(H,21,22,24)/t12-,13-,14?,15?
# cactus.nci.nih.gov

O=C(/N=c1\[nH]c(-c2ccccn2)cs1)[C@H]1CC[C@H](Cn2cnc3ccccc3c2=O)CC1
CZKXHWCYFFXKGH-CALCHBBNSA-N CZKXHWCYFFXKGH-QAQDUYKDSA-N
InChI=1S/C24H23N5O2S/c30-22(28-24-27-21(14-32-24)20-7-3-4-12-25-20)17-10-8-16(9-11-17)13-29-15-26-19-6-2-1-5-18(19)23(29)31/h1-7,12,14-17H,8-11,13H2,(H,27,28,30)/t16-,17+
InChI=1S/C24H23N5O2S/c30-22(28-24-27-21(14-32-24)20-7-3-4-12-25-20)17-10-8-16(9-11-17)13-29-15-26-19-6-2-1-5-18(19)23(29)31/h1-7,12,14-17H,8-11,13H2,(H,27,28,30)/t16-,17-
InChI=1S/C24H39N5O2S/c30-22(28-24-27-21(14-32-24)20-7-3-4-12-25-20)17-10-8-16(9-11-17)13-29-15-26-19-6-2-1-5-18(19)23(29)31/h16-21,25-26H,1-15H2,(H,27,28,30)/t16-,17-,18?,19?,20?,21?

CCOC(=O)Cc1cs/c(=N/C(=O)[C@H]2CC[C@H](Cn3cnc4ccccc4c3=O)CC2)[nH]1
GAXCPQSXDNGSQV-IYBDPMFKSA-N GAXCPQSXDNGSQV-WKILWMFISA-N
InChI=1S/C23H26N4O4S/c1-2-31-20(28)11-17-13-32-23(25-17)26-21(29)16-9-7-15(8-10-16)12-27-14-24-19-6-4-3-5-18(19)22(27)30/h3-6,13-16H,2,7-12H2,1H3,(H,25,26,29)/t15-,16+
InChI=1S/C23H26N4O4S/c1-2-31-20(28)11-17-13-32-23(25-17)26-21(29)16-9-7-15(8-10-16)12-27-14-24-19-6-4-3-5-18(19)22(27)30/h3-6,13-16H,2,7-12H2,1H3,(H,25,26,29)/t15-,16-
InChI=1S/C23H36N4O4S/c1-2-31-20(28)11-17-13-32-23(25-17)26-21(29)16-9-7-15(8-10-16)12-27-14-24-19-6-4-3-5-18(19)22(27)30/h15-19,24H,2-14H2,1H3,(H,25,26,29)/t15-,16-,17?,18?,19?

COCc1n[nH]/c(=N/C(=O)[C@H]2CC[C@H](Cn3cnc4ccccc4c3=O)CC2)s1
YVZJPKUMKXPZTK-OKILXGFUSA-N YVZJPKUMKXPZTK-HDJSIYSDSA-N
InChI=1S/C20H23N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27/h2-5,12-14H,6-11H2,1H3,(H,22,24,26)/t13-,14+
InChI=1S/C20H23N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27/h2-5,12-14H,6-11H2,1H3,(H,22,24,26)/t13-,14-
InChI=1S/C20H33N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27/h13-17,21,23H,2-12H2,1H3,(H,22,24,26)/t13-,14-,15?,16?,17?

COC(=O)c1[nH]/c(=N\C(=O)[C@H]2CC[C@H](Cn3cnc4ccccc4c3=O)CC2)sc1C(C)C
KNDSLDLCZNAXPK-IYBDPMFKSA-N KNDSLDLCZNAXPK-WKILWMFISA-N
InChI=1S/C24H28N4O4S/c1-14(2)20-19(23(31)32-3)26-24(33-20)27-21(29)16-10-8-15(9-11-16)12-28-13-25-18-7-5-4-6-17(18)22(28)30/h4-7,13-16H,8-12H2,1-3H3,(H,26,27,29)/t15-,16+
InChI=1S/C24H28N4O4S/c1-14(2)20-19(23(31)32-3)26-24(33-20)27-21(29)16-10-8-15(9-11-16)12-28-13-25-18-7-5-4-6-17(18)22(28)30/h4-7,13-16H,8-12H2,1-3H3,(H,26,27,29)/t15-,16-
InChI=1S/C24H38N4O4S/c1-14(2)20-19(23(31)32-3)26-24(33-20)27-21(29)16-10-8-15(9-11-16)12-28-13-25-18-7-5-4-6-17(18)22(28)30/h14-20,25H,4-13H2,1-3H3,(H,26,27,29)/t15-,16-,17?,18?,19?,20?

CC(C)[C@H]1CC[C@H](C(=O)N[C@H](Cc2ccccc2)C(=O)/N=c2\[nH]ncs2)CC1
OKTRHZCAACPPLC-FGTMMUONSA-N OKTRHZCAACPPLC-KZNAEPCWSA-N
InChI=1S/C21H28N4O2S/c1-14(2)16-8-10-17(11-9-16)19(26)23-18(12-15-6-4-3-5-7-15)20(27)24-21-25-22-13-28-21/h3-7,13-14,16-18H,8-12H2,1-2H3,(H,23,26)(H,24,25,27)/t16-,17+,18-/m1/s1
InChI=1S/C21H28N4O2S/c1-14(2)16-8-10-17(11-9-16)19(26)23-18(12-15-6-4-3-5-7-15)20(27)24-21-25-22-13-28-21/h3-7,13-14,16-18H,8-12H2,1-2H3,(H,23,26)(H,24,25,27)/t16-,17-,18-/m1/s1
InChI=1S/C21H36N4O2S/c1-14(2)16-8-10-17(11-9-16)19(26)23-18(12-15-6-4-3-5-7-15)20(27)24-21-25-22-13-28-21/h14-18,22H,3-13H2,1-2H3,(H,23,26)(H,24,25,27)/t16-,17-,18-/m1/s1


I have looked at these molecules in MarvinSketch to try to figure out why
different InChIs are being generated.  Perhaps there is a problem in RDKit
which is always detecting one of the rings as aromatic (the InChIs don't
seem to agree on the aromaticity).

I hope this is helpful.
JP



-
Jean-Paul Ebejer
Early Stage Researcher

Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-19 Thread Markus Sitzmann
Well, at least you said something important: conversion of InChI to
molecules is not, in general, guaranteed to work perfectly - and this
is by design, because InChI is an identifier, not a molecule
representation. Unfortunately, many people seem to forget about this :-)

On Thu, Feb 19, 2015 at 6:59 AM, Greg Landrum greg.land...@gmail.com wrote:

 On Wed, Feb 18, 2015 at 7:01 PM, Igor Filippov igor.v.filip...@gmail.com
 wrote:

  update the bug report and work on tracking down the wrong problem

 That's how I sometimes do it too... ;)


 I'll leave it as an exercise to the reader to decide if that was
 intentional, the fault of auto-correct, or just because it had been a long
 day. ;-)

 -greg






Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-19 Thread Dimitri Maziuk
On 02/19/2015 01:24 PM, Igor Filippov wrote:
 Markus also spelled out for you different variations for context in the
 same exchange.
 Do different tautomers represent different chemicals or the same one?

Read the thread.

 Do face recognition identifiers even approach the accuracy of InChI
 identifiers?

At this point it's the other way around: facial recognition success
rates are anywhere between 25% and 95%. How many of the existing
molecules can be represented by InChI? (I'll give you a hint: none
longer than MAX_ATOMS defined in ichisize.h.)

 If you still insist that there could be only one singular definition of
 unique in the universe then I am afraid
 this definition has no meaning and you are alone...ehm... unique.. in using
 it.

When you have a different analytical engine built on different logic,
then you can have your different definition of "unique" in the context
of a computer system. As long as you're using a digital computer you're
using the same simplistic integers, boolean algebra, and discrete math.
That's the objective reality, it won't change no matter how much you
argue about faces in the universe.

Similarly, when you get yourself a different English language, then you
can have a different definition of "unique" as an English word. In this
version of English, go buy a dictionary.
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-19 Thread Igor Filippov
Dimitri,

As others before me, I tried to explain to you that the simplistic
definition of unique molecule is rather naive and is neither useful
nor reflective of reality. Perhaps my explanation is woefully
inadequate to convey the meaning I would like to convey, but that is
no excuse to reply with rudeness and condescension.
If you are unable to present your arguments in a civilized manner then
please cease this discussion.

Best regards,
Igor


On Thu, Feb 19, 2015 at 2:53 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu
wrote:

 On 02/19/2015 01:24 PM, Igor Filippov wrote:
  Markus also spelled out for you different variations for context in the
  same exchange.
  Do different tautomers represent different chemicals or the same one?

 Read the thread.

  Do face recognition identifiers even approach the accuracy of InChI
  identifiers?

 At this point it's the other way around: facial recognition success
 rates are anywhere between 25% and 95%. How many of the existing
 molecules can be represented by InChI? (I'll give you a hint: none
 longer than MAX_ATOMS defined in ichisize.h.)

  If you still insist that there could be only one singular definition of
  unique in the universe then I am afraid
  this definition has no meaning and you are alone...ehm... unique.. in
 using
  it.

 When you have a different analytical engine built on different logic,
 then you can have your different definition of unique in the context
 of a computer system. As long as you're using a digital computer you're
 using the same simplistic integers, boolean algebra, and discrete math.
 That's the objective reality, it won't change no matter how much you can
 argue faces in universe.

 Similarly, when you get yourself a different English language, then you
 can have a different definition of unique as an English word. In this
 version of English, go buy a dictionary.
 --
 Dimitri Maziuk
 Programmer/sysadmin
 BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-19 Thread Greg Landrum
On Thu, Feb 19, 2015 at 10:11 AM, Markus Sitzmann markus.sitzm...@gmail.com
 wrote:

 Well, at least you said something important: conversion of InChI to
 molecules is something that's not in general guaranteed to work
 perfectly - and this is by design like this because InChI is an
 identifier, not a molecule representation. Unfortunately, many people
 seemed to forget about this :-)


Yes, yes they do.

-greg


Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-19 Thread Dimitri Maziuk
On 2015-02-19 05:58, Greg Landrum wrote:

 On Thu, Feb 19, 2015 at 10:11 AM, Markus Sitzmann
 markus.sitzm...@gmail.com wrote:

 Well, at least you said something important: conversion of InChI to
 molecules is something that's not in general guaranteed to work
 perfectly - and this is by design like this because InChI is an
 identifier, not a molecule representation. Unfortunately, many people
 seemed to forget about this :-)


 Yes, yes they do.

Well, unfortunately InChI claims to be a 'unique identifier', which
means there must be 1 InChI for 1 molecule and it *should* work
perfectly. And then they say the only required 'layer' is the formula,
which means a) it's not unique and b) how is InChI=formula better than
just formula? D'uh.

Dimitri




Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-19 Thread Markus Sitzmann
No, a chemical structure must calculate a unique InChI, but an InChI
might cover more than one chemical structure (because there are
molecules that can be described by more than one chemical structure).
And a chemical formula might be the most accurate (unique) description
you have for a molecule (admittedly, unlikely today); that is
why the InChI is layered. By adding and removing layers, InChI allows
you to choose how precisely you want to define uniqueness - that is
important with molecules because there is no precise, universally valid
definition for unique molecule.
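
To make the layering point concrete, here is a small sketch (an illustration only, with made-up names LEVELS, truncate and distinct) that counts how many "different molecules" a list of InChIs contains at three levels of strictness. The three strings are the ones reported in this thread for the first molecule: RDKit 2014_09_2, RDKit 2015.03.1pre, and cactus.

```python
# Which prefixed layers to keep at each level; None means keep everything.
LEVELS = {"formula": "", "connectivity": "ch", "full": None}

def truncate(inchi, level):
    """Keep the version prefix, the formula, and only the layers allowed at this level."""
    head, formula, *rest = inchi.split("/")
    allowed = LEVELS[level]
    if allowed is not None:
        rest = [seg for seg in rest if seg[0] in allowed]
    return "/".join([head, formula] + rest)

def distinct(inchis, level):
    """Number of distinct identifiers once truncated to the given level."""
    return len({truncate(i, level) for i in inchis})

# Connectivity layer shared by all three strings, copied from the thread.
SKELETON = ("/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15"
            "-4-2-1-3-14(15)17(23)25")
RDKIT = "InChI=1S/C18H19N5O2S" + SKELETON + "/h1-4,10-13H,5-9H2,(H,21,22,24)"
inchis = [
    RDKIT + "/t12-,13+",                              # RDKit 2014_09_2
    RDKIT + "/t12-,13-",                              # RDKit 2015.03.1pre
    "InChI=1S/C18H29N5O2S" + SKELETON
    + "/h12-15,19-20H,1-11H2,(H,21,22,24)/t12-,13-,14?,15?",  # cactus
]

for level in ("formula", "connectivity", "full"):
    print(level, distinct(inchis, level))
# prints: formula 2, connectivity 2, full 3 (one pair per line)
```

The two RDKit versions collapse into one entry until the stereo layer is included, while the cactus result is already separate at the formula level.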

On Thu, Feb 19, 2015 at 2:06 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
 On 2015-02-19 05:58, Greg Landrum wrote:

 On Thu, Feb 19, 2015 at 10:11 AM, Markus Sitzmann
 markus.sitzm...@gmail.com wrote:

 Well, at least you said something important: conversion of InChI to
 molecules is something that's not in general guaranteed to work
 perfectly - and this is by design like this because InChI is an
 identifier, not a molecule representation. Unfortunately, many people
 seemed to forget about this :-)


 Yes, yes they do.

 Well unfortunately inchi states they're a 'unique identifier' which
 means there must be 1 inchi for 1 molecule and it *should* work
 perfectly. And then they say the only required 'layer' is the formula
 which means a) it's not unique and b) how is InChi=formula better than
 just formula? D'uh.

 Dimitri


 --
 Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
 from Actuate! Instantly Supercharge Your Business Reports and Dashboards
 with Interactivity, Sharing, Native Excel Exports, App Integration  more
 Get technology previously reserved for billion-dollar corporations, FREE
 http://pubads.g.doubleclick.net/gampad/clk?id=190641631iu=/4140/ostg.clktrk
 ___
 Rdkit-discuss mailing list
 Rdkit-discuss@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

--
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration  more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=190641631iu=/4140/ostg.clktrk
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-19 Thread Dimitri Maziuk
On 02/19/2015 12:29 PM, Igor Filippov wrote:
  No. there's only one definition of unique
 This is way too simplistic. The definition of unique depends on the
 application.

There is a context to this thread. The application was spelled out in
Markus's and Greg's exchange:

 Well, at least you said something important: conversion of InChI to
 molecules is something that's not in general guaranteed to work
 perfectly - and this is by design like this because InChI is an
 identifier, not a molecule representation. Unfortunately, many people
 seemed to forget about this :-)

 Yes, yes they do.

Persons' faces and other disciplines have nothing to do with it.

(Note, however, the face recognition people actually get simplistic
integer numbers so their unique keys tend to be based on well-defined
metrics and factor in isometries and other fun math stuff. Unlike IUPAC
and InChI.)

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-19 Thread Igor Filippov
  No. there's only one definition of unique
This is way too simplistic. The definition of unique depends on the
application, not only in chemistry but in other fields as well.
The way you just defined unique is appropriate for integers,
but not everything is quite so trivial.
Is a human face unique? What about pictures of the same person taken at 5, 15,
25, and 45 years of age?
Are they the same picture or completely different? What about the faces of
identical twins?
Uniqueness is defined by what you need to accomplish, not by some
god-given attribute of the object;
otherwise no two things are the same and unique loses all meaning.

Best,
Igor




Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-19 Thread Greg Landrum
Ok, I just checked in a set of changes to address the problems with InChI
generation and InChI -> mol conversion for 3-coordinate atoms.

There are some notes and pointers to the code changes here:
https://github.com/rdkit/rdkit/issues/437

Please note the final comment there: JP's original example now gives a
correct InChI, and it looks like that InChI is correctly converted into a
SMILES, but the resulting SMILES is not properly canonicalized. G.
I will continue to look at that.

In the meantime, I would be extremely grateful if someone else could take a
look at the testing code I added:
https://github.com/rdkit/rdkit/blob/ca0c4956765952e8c0556885c6ac8e62bac197e1/External/INCHI-API/test.cpp#L256
and let me know if they see any problems with the expected InChIs I'm using.


Best,
-greg



Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-19 Thread Greg Landrum
A general comment/request:
One of the great things about the RDKit community, including this mailing
list, is how supportive and helpful people are. In contrast to many online
communities it's a friendly place and I think that's great.

This thread is starting to get a bit aggressive in tone... please keep an
eye on that.

-greg


Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-19 Thread Greg Landrum
On Thu, Feb 19, 2015 at 6:54 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu
wrote:

 On 02/19/2015 08:54 AM, Markus Sitzmann wrote:
  A database can have several definitions of unique for anything - a
  structure database can have this, too. If you have a chemical compound
  which can form 10 different tautomers, you can represent the compound
  by 10 chemical structures (it is still the same compound, though). So,
  if you define uniqueness on basis of chemical compound, you have one
  db entry and this one entry has a single (tautomer-sensitive) InChI
  covering 10 chemical structures; if you define uniqueness on basis of
  tautomers/chemical structures (because all are relevant, for instance,
  in NMR spectroscopy) you have (and want) 10 database entries, each with
  a single (tautomer-sensitive) InChI. Two definitions of unique.

 No. there's only one definition of unique: unique key is a set of
 attributes that is guaranteed to be unique for each entity. The
 relationship between the key and the entity is symmetric: if x is the
 inchi string for compound y then y is the compound for inchi string x.


This could be true if the InChI algorithm just took the input structure you
provided and returned a string for it. That's not, however, what it does.
The algorithm first standardizes the molecule: modifying the tautomeric
state, moving charges around, etc., then generates the InChI string.

Assuming you use the full standard InChI string, what you end up with is a
function that maps every molecule to a unique InChI. You can also map every
InChI back to a unique molecule (though, as discussed above and elsewhere,
this can be somewhat fraught with danger), but the unique molecule that the
InChI corresponds to is not necessarily the same as the input molecule.
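
A toy illustration of that last point, in plain Python. Nothing here calls the real InChI code: the labels and the STANDARD_FORM table are hypothetical stand-ins for structures and for the standardization step, chosen only to show the shape of the mapping.

```python
# Hypothetical stand-in for standardization: every input form maps to one
# chosen representative (two tautomer-like forms share a representative).
STANDARD_FORM = {
    "form-A1": "form-A1",
    "form-A2": "form-A1",
    "form-B": "form-B",
}


def to_identifier(structure):
    """Standardize first, then derive the identifier from the standard form."""
    return "ID:" + STANDARD_FORM[structure]


def from_identifier(identifier):
    """Return the representative structure that the identifier encodes."""
    return identifier[len("ID:"):]


def identifier_roundtrips(structure):
    """Compare identifiers, not structures, after a round trip."""
    ident = to_identifier(structure)
    return to_identifier(from_identifier(ident)) == ident


start = "form-A2"
back = from_identifier(to_identifier(start))
print(back == start)                 # False: the input structure is not recovered
print(identifier_roundtrips(start))  # True: the identifier itself is stable
```

With a real toolkit, the analogous check would presumably compare the InChI of the round-tripped molecule with the original InChI, rather than expecting the original structure back.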

-greg


Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-19 Thread Markus Sitzmann
A database can have several definitions of unique for anything, and a
structure database can have this, too. If you have a chemical compound
which can form 10 different tautomers, you can represent the compound
by 10 chemical structures (it is still the same compound, though). So,
if you define uniqueness on the basis of chemical compound, you have one
db entry and this one entry has a single (tautomer-sensitive) InChI
covering 10 chemical structures; if you define uniqueness on the basis of
tautomers/chemical structures (because all are relevant, for instance,
in NMR spectroscopy) you have (and want) 10 database entries, each with
a single (tautomer-sensitive) InChI. Two definitions of unique.

So my sentence still stands: a chemical structure must yield a
unique InChI, but an InChI might cover more than one chemical
structure.
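
The two database layouts described above can be sketched in a few lines of Python. The record fields and key names are invented for the example; real keys would be InChI strings computed with whatever tautomer handling the database chooses.

```python
from collections import defaultdict

# Hypothetical records: (structure label, compound-level key, structure-level key).
records = [
    ("taut-%d" % i, "compound-A", "compound-A/taut-%d" % i)
    for i in range(10)
]


def group_by(rows, key_index):
    """Group structure labels under the key found at position key_index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[0])
    return dict(groups)


by_compound = group_by(records, 1)   # uniqueness defined per compound
by_structure = group_by(records, 2)  # uniqueness defined per tautomer

print(len(by_compound), len(by_structure))
```

This prints "1 10": the same ten structures give one entry or ten entries depending on which key the database declares unique.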



Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-19 Thread Dimitri Maziuk
On 2015-02-19 07:27, Markus Sitzmann wrote:
 No, a chemical structure must calculate a unique InChI, but an InChI
 might cover more than one chemical structure

Heh. I could swear last time I read the description it specifically 
mentioned databases. In the database context 'unique' has a specific 
well-defined meaning and that is *not* 'more than one'. Now I don't see 
it in the official blurbs, only pikiwedia mentions databases.

 ... there is no precise, universally valid
 definition for unique molecule.

On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into 
the machine wrong figures, will the right answers come out?' I am not 
able rightly to apprehend the kind of confusion of ideas that could 
provoke such a question.

Works for 'undefined figures', too.

Dimitri





Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-19 Thread Dimitri Maziuk
On 02/19/2015 08:54 AM, Markus Sitzmann wrote:
 A database can have several definitions of unique for anything - a
 structure database can have this, too. If you have a chemical compound
 which can form 10 different tautomers, you can represent the compound
 by 10 chemical structures (it is still the same compound, though). So,
 if you define uniqueness on basis of chemical compound, you have one
 db entry and this one entry has a single (tautomer-sensitive) InChI
 covering 10 chemical structures; if you define uniqueness on basis of
 tautomers/chemical structures (because all are relevant, for instance,
 in NMR spectroscopy) you have (and want) 10 database entries, each with
 a single (tautomer-sensitive) InChI. Two definitions of unique.

No, there's only one definition of unique: a unique key is a set of
attributes that is guaranteed to be unique for each entity. The
relationship between the key and the entity is symmetric: if x is the
InChI string for compound y, then y is the compound for InChI string x.

It follows that if y is the compound for InChI string x, and z is also
the compound for InChI string x, then x is not unique.

What you have is two definitions of chemical compound.

You can, in your database, define 10 different tautomers as ein
compound, ein unique key. Your database will be useless for any number
of applications. You can define 10 different tautomers as 10 different
compounds with 10 different unique keys. Your database will be too
heavy for any number of applications. It's your database.

What you can't do is redefine unique to mean two things at once: it's
not your discrete math. Sorry.
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-18 Thread Igor Filippov
 update the bug report and work on tracking down the wrong problem

That's how I sometimes do it too... ;)

Igor


Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-18 Thread Greg Landrum
Yep, you guys are right.
I diagnosed that too quickly.
Thanks for pointing out the mistake.

I'll update the bug report and work on tracking down the wrong problem

-greg



Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-18 Thread Greg Landrum
On Wed, Feb 18, 2015 at 7:01 PM, Igor Filippov igor.v.filip...@gmail.com
wrote:

  update the bug report and work on tracking down the wrong problem

 That's how I sometimes do it too... ;)


I'll leave it as an exercise to the reader to decide if that was
intentional, the fault of auto-correct, or just because it had been a long
day. ;-)

-greg


Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-18 Thread Markus Sitzmann
I agree with John, the InChI for mol1 and mol2 should be

http://cactus.nci.nih.gov/chemical/structure/O=C(NCCc1c1)[C@H]1CC[C@H](Cn2c(O)nc3c3c2=O)CC1/stdinchi

InChI=1S/C24H27N3O3/c28-22(25-15-14-17-6-2-1-3-7-17)19-12-10-18(11-13-19)16-27-23(29)20-8-4-5-9-21(20)26-24(27)30/h1-9,18-19H,10-16H2,(H,25,28)(H,26,30)/t18-,19-

So the + at the end should be a -

Markus

On Wed, Feb 18, 2015 at 2:53 PM, John M john.wilkinson...@gmail.com wrote:
 Hi Greg,

 I believe it's an RDKitMol -> InChI issue rather than InChI -> RDKitMol. The
 correct InChI (below) is different from that in the IPython listing.

 InChI=1S/C24H27N3O3/c28-22(25-15-14-17-6-2-1-3-7-17)19-12-10-18(11-13-19)16-27-23(29)20-8-4-5-9-21(20)26-24(27)30/h1-9,18-19H,10-16H2,(H,25,28)(H,26,30)/t18-,19-

 J


 Regards,
 John W May
 john.wilkinson...@gmail.com

 On 18 February 2015 at 04:57, Greg Landrum greg.land...@gmail.com wrote:

 JP,

 Looks like that's a bug in the way ring stereochemistry is handled while
 translating the InChI back into a molecule.

 It's reproducible with a small example:
 In [1]: from rdkit import Chem

 In [2]: mol1 = Chem.MolFromSmiles('C[C@H]1CC[C@H](O)CC1')

 In [3]: Chem.MolToSmiles(mol1,True)
 Out[3]: 'C[C@H]1CC[C@H](O)CC1'

 In [4]: inchi = Chem.MolToInchi(mol1)

 In [5]: mol2 = Chem.MolFromInchi(inchi)

 In [6]: Chem.MolToSmiles(mol2,True)
 Out[6]: 'C[C@H]1CC[C@@H](O)CC1'

 Conversion of InChI to molecules is something that's not in general
 guaranteed to work perfectly, but I will go ahead and create a bug report.

 -greg



 On Tue, Feb 17, 2015 at 2:50 PM, JP jeanpaul.ebe...@inhibox.com wrote:

 Hi there,

 I have a question for the 3D enabled of you (I wish the world looked like
 GTA2 !)

 I am seeing a case of an RDKit mol -> InChI -> RDKit mol round trip that I
 think is changing the stereochemistry of the molecule.  I have 12 example pairs
 where this happens (but all very structurally similar).  I don't care much
 that the last RDKit molecule is a different tautomer than the starting one,
 but even if this is the case the stereochemistry should still be conserved, no?

 I did an ipython notebook (most useful tool of the decade after RDKit?)
 gist here:


 http://nbviewer.ipython.org/urls/gist.githubusercontent.com/anonymous/7c158926a0f3bf9a4978/raw/d91cc808ac91eccc8bf0e45d9eacd2af382e5105/gistfile1.txt

 I would appreciate it if anyone could shed some light.  I'd just like to
 understand.

 Thank you for your time!

 -
 JP




Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-18 Thread John M
Hi Greg,

I believe it's an RDKitMol -> InChI issue rather than InChI -> RDKitMol.
The correct InChI (below) is different from that in the IPython listing.

InChI=1S/C24H27N3O3/c28-22(25-15-14-17-6-2-1-3-7-17)19-12-10-18(11-13-19)16-27-23(29)20-8-4-5-9-21(20)26-24(27)30/h1-9,18-19H,10-16H2,(H,25,28)(H,26,30)/t18-,19-
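
One way to check which direction is at fault is to compare three strings: the InChI RDKit writes for the original molecule, an independently produced reference InChI, and the InChI RDKit writes again after reading its own output back in. The sketch below is only an illustration under those assumptions. The names locate_fault and rdkit_inchis are hypothetical; the RDKit calls are the same ones used in Greg's listing, and RDKit is imported only when rdkit_inchis is called. If the writer is already wrong, this reports only that first step and says nothing about the reader.

```python
from typing import Tuple

def locate_fault(generated: str, reference: str, regenerated: str) -> str:
    """Say which conversion step first diverges.

    generated   -- InChI written from the original molecule
    reference   -- InChI for the same structure from an independent source
    regenerated -- InChI written from the molecule read back from `generated`
    """
    if generated != reference:
        return "mol -> InChI"      # the writer already disagrees with the reference
    if regenerated != generated:
        return "InChI -> mol"      # the writer agrees, so the read-back changed something
    return "no discrepancy"

def rdkit_inchis(smiles: str) -> Tuple[str, str]:
    """Return (generated, regenerated) InChIs for a SMILES, using RDKit."""
    from rdkit import Chem  # imported lazily so locate_fault works without RDKit

    mol1 = Chem.MolFromSmiles(smiles)
    generated = Chem.MolToInchi(mol1)
    mol2 = Chem.MolFromInchi(generated)
    regenerated = Chem.MolToInchi(mol2)
    return generated, regenerated
```

With RDKit built with InChI support, usage would look like: generated, regenerated = rdkit_inchis('C[C@H]1CC[C@H](O)CC1'), followed by locate_fault(generated, reference, regenerated), where reference comes from another tool.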

J

Regards,
John W May
john.wilkinson...@gmail.com

On 18 February 2015 at 04:57, Greg Landrum greg.land...@gmail.com wrote:

 JP,

 Looks like that's a bug in the way ring stereochemistry is handled while
 translating the InChI back into a molecule.

 It's reproducible with a small example:
 In [1]: from rdkit import Chem

 In [2]: mol1 = Chem.MolFromSmiles('C[C@H]1CC[C@H](O)CC1')

 In [3]: Chem.MolToSmiles(mol1,True)
 Out[3]: 'C[C@H]1CC[C@H](O)CC1'

 In [4]: inchi = Chem.MolToInchi(mol1)

 In [5]: mol2 = Chem.MolFromInchi(inchi)

 In [6]: Chem.MolToSmiles(mol2,True)
 Out[6]: 'C[C@H]1CC[C@@H](O)CC1'

 Conversion of InChI to molecules is something that's not in general
 guaranteed to work perfectly, but I will go ahead and create a bug report.

 -greg



 On Tue, Feb 17, 2015 at 2:50 PM, JP jeanpaul.ebe...@inhibox.com wrote:

 Hi there,

 I have a question for the 3D enabled of you (I wish the world looked like
 GTA2 !)

 I am seeing a case of an RDKit mol -> InChI -> RDKit mol round trip that I
 think is changing the stereochemistry of the molecule.  I have 12 example pairs
 where this happens (but all very structurally similar).  I don't care much
 that the last RDKit molecule is a different tautomer than the starting one,
 but even if this is the case the stereochemistry should still be conserved, no?

 I did an ipython notebook (most useful tool of the decade after RDKit?)
 gist here:


 http://nbviewer.ipython.org/urls/gist.githubusercontent.com/anonymous/7c158926a0f3bf9a4978/raw/d91cc808ac91eccc8bf0e45d9eacd2af382e5105/gistfile1.txt

 I would appreciate it if anyone could shed some light.  I'd just like to
 understand.

 Thank you for your time!

 -
 JP

