Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
You can report it to Marc Nicklaus ... who will probably send it to me ... I will take a look. Whether I can fix any misbehavior is another question.

On Tue, Feb 24, 2015 at 8:27 AM, Greg Landrum greg.land...@gmail.com wrote:
> The InChIs have me confused. I'm going to simplify the below by just showing the input SMILES, the current (=master) RDKit InChI and the PubChem InChI
> [...]
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
The InChIs have me confused. I'm going to simplify the below by just showing the input SMILES, the current (=master) RDKit InChI and the PubChem InChI

On Mon, Feb 23, 2015 at 10:54 AM, JP jeanpaul.ebe...@inhibox.com wrote:
> Here is the list (first inchi is the 2014_09_2, second one is the 2015.03.1pre generated one, third inchi is the cactus.nci.nih.gov):

O=C(/N=c1/[nH]ncs1)[C@H]1CC[C@H](Cn2cnc3c3c2=O)CC1
InChI=1S/C18H19N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h1-4,10-13H,5-9H2,(H,21,22,24)/t12-,13- # RDKit 2015.03.1pre
InChI=1S/C18H29N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h12-15,19-20H,1-11H2,(H,21,22,24)/t12-,13-,14?,15? # cactus.nci.nih.gov

O=C(/N=c1\[nH]c(-c2n2)cs1)[C@H]1CC[C@H](Cn2cnc3c3c2=O)CC1
InChI=1S/C24H23N5O2S/c30-22(28-24-27-21(14-32-24)20-7-3-4-12-25-20)17-10-8-16(9-11-17)13-29-15-26-19-6-2-1-5-18(19)23(29)31/h1-7,12,14-17H,8-11,13H2,(H,27,28,30)/t16-,17-
InChI=1S/C24H39N5O2S/c30-22(28-24-27-21(14-32-24)20-7-3-4-12-25-20)17-10-8-16(9-11-17)13-29-15-26-19-6-2-1-5-18(19)23(29)31/h16-21,25-26H,1-15H2,(H,27,28,30)/t16-,17-,18?,19?,20?,21?

CCOC(=O)Cc1cs/c(=N/C(=O)[C@H]2CC[C@H](Cn3cnc4c4c3=O)CC2)[nH]1
InChI=1S/C23H26N4O4S/c1-2-31-20(28)11-17-13-32-23(25-17)26-21(29)16-9-7-15(8-10-16)12-27-14-24-19-6-4-3-5-18(19)22(27)30/h3-6,13-16H,2,7-12H2,1H3,(H,25,26,29)/t15-,16-
InChI=1S/C23H36N4O4S/c1-2-31-20(28)11-17-13-32-23(25-17)26-21(29)16-9-7-15(8-10-16)12-27-14-24-19-6-4-3-5-18(19)22(27)30/h15-19,24H,2-14H2,1H3,(H,25,26,29)/t15-,16-,17?,18?,19?

COCc1n[nH]/c(=N/C(=O)[C@H]2CC[C@H](Cn3cnc4c4c3=O)CC2)s1
InChI=1S/C20H23N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27/h2-5,12-14H,6-11H2,1H3,(H,22,24,26)/t13-,14-
InChI=1S/C20H33N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27/h13-17,21,23H,2-12H2,1H3,(H,22,24,26)/t13-,14-,15?,16?,17?

COC(=O)c1[nH]/c(=N\C(=O)[C@H]2CC[C@H](Cn3cnc4c4c3=O)CC2)sc1C(C)C
InChI=1S/C24H28N4O4S/c1-14(2)20-19(23(31)32-3)26-24(33-20)27-21(29)16-10-8-15(9-11-16)12-28-13-25-18-7-5-4-6-17(18)22(28)30/h4-7,13-16H,8-12H2,1-3H3,(H,26,27,29)/t15-,16-
InChI=1S/C24H38N4O4S/c1-14(2)20-19(23(31)32-3)26-24(33-20)27-21(29)16-10-8-15(9-11-16)12-28-13-25-18-7-5-4-6-17(18)22(28)30/h14-20,25H,4-13H2,1-3H3,(H,26,27,29)/t15-,16-,17?,18?,19?,20?

CC(C)[C@H]1CC[C@H](C(=O)N[C@H](Cc2c2)C(=O)/N=c2\[nH]ncs2)CC1
InChI=1S/C21H28N4O2S/c1-14(2)16-8-10-17(11-9-16)19(26)23-18(12-15-6-4-3-5-7-15)20(27)24-21-25-22-13-28-21/h3-7,13-14,16-18H,8-12H2,1-2H3,(H,23,26)(H,24,25,27)/t16-,17-,18-/m1/s1
InChI=1S/C21H36N4O2S/c1-14(2)16-8-10-17(11-9-16)19(26)23-18(12-15-6-4-3-5-7-15)20(27)24-21-25-22-13-28-21/h14-18,22H,3-13H2,1-2H3,(H,23,26)(H,24,25,27)/t16-,17-,18-/m1/s1

If you look in the formula layer for the InChIs from PubChem, you will see that they all have *way* too many H atoms. I think there's something about the structures that is confusing the pubchem/cactvs conversion code. Compare these two outputs.

Aromatic form:
http://cactus.nci.nih.gov/chemical/structure/O=C(N=c1[nH]ncs1)C1CCC(Cn2cnc3c3c2=O)CC1/stdinchi
produces:
InChI=1S/C18H29N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h12-15,19-20H,1-11H2,(H,21,22,24)

Kekule form:
http://cactus.nci.nih.gov/chemical/structure/O=C(/N=C1/[NH]N=CS1)[C@H]1CC[C@H](CN2C=NC3=CC=CC=C3C2=O)CC1/stdinchi
produces:
InChI=1S/C18H19N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h1-4,10-13H,5-9H2,(H,21,22,24)/t12-,13-

In fact, converting the 5 membered ring to kekule form is enough:
http://cactus.nci.nih.gov/chemical/structure/O=C(N=C1[NH]N=CS1)C1CCC(Cn2cnc3c3c2=O)CC1/stdinchi
produces:
InChI=1S/C18H19N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h1-4,10-13H,5-9H2,(H,21,22,24)

This can't be true.

We can further simplify things to track down the problem:
http://cactus.nci.nih.gov/chemical/structure/N=c1[nH]ncs1/stdinchi
InChI=1S/C2H5N3S/c3-2-5-4-1-6-2/h4H,1H2,(H2,3,5)
vs
http://cactus.nci.nih.gov/chemical/structure/O=c1[nH]ncs1/stdinchi
InChI=1S/C2H2N2OS/c5-2-4-3-1-6-2/h1H,(H,4,5)

It seems to be the exocyclic bond to an atom with Hs. This is ok:
http://cactus.nci.nih.gov/chemical/structure/O=c1occo1/stdinchi
InChI=1S/C3H2O3/c4-3-5-1-2-6-3/h1-2H
but both of these are wrong:
http://cactus.nci.nih.gov/chemical/structure/N=c1occo1/stdinchi
InChI=1S/C3H5NO2/c4-3-5-1-2-6-3/h4H,1-2H2
http://cactus.nci.nih.gov/chemical/structure/C=c1occo1/stdinchi
InChI=1S/C4H6O2/c1-4-5-2-3-6-4/h1-3H2

I'm pretty sure that this is not the RDKit doing the wrong thing.

@Markus: what would be the best way to report this to the NCI CADD guys?

-greg
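Greg's check of the formula layer can be reproduced without any toolkit. The sketch below is only an illustration: the helper names `formula_layer` and `hydrogen_count` are made up for this example, and it assumes a single-component InChI (no `.`-separated parts or leading multipliers). The two strings are the cactus outputs quoted in the message for the aromatic input and for the input with only the 5-membered ring in Kekule form.

```python
import re

def formula_layer(inchi):
    """Return the formula layer: the text between the first and second '/'."""
    return inchi.split("/")[1]

def hydrogen_count(inchi):
    """Number of H atoms listed in the formula layer (0 if H is absent)."""
    # Tokenize into (element, count) pairs so that e.g. 'Hg' is not read as H.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula_layer(inchi)):
        if element == "H":
            return int(count) if count else 1
    return 0

# Connectivity layer shared by both cactus outputs quoted in the message.
CONNECTIVITY = "c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25"

aromatic_input = "InChI=1S/C18H29N5O2S/" + CONNECTIVITY + "/h12-15,19-20H,1-11H2,(H,21,22,24)"
kekule_input = "InChI=1S/C18H19N5O2S/" + CONNECTIVITY + "/h1-4,10-13H,5-9H2,(H,21,22,24)"

# Same heavy-atom skeleton, but the aromatic input comes back with extra hydrogens.
extra_hydrogens = hydrogen_count(aromatic_input) - hydrogen_count(kekule_input)
print(extra_hydrogens)  # 10
```

Comparing only the formula layer is a quick screen: two InChIs for the same input that disagree here cannot be reconciled by any stereo or tautomer argument.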
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
Well, the http://cactus.nci.nih.gov/chemical/structure/ site is my baby which I had to leave behind 1 1/2 years ago (I am not with NIH anymore). Igor who replied in this thread was also involved in some parts of it. Traffic on this cactus service is between 5 and 10 million requests per month - so I think the service survived your attack ;-)

And I am not saying it is perfect, it just provides another implementation to double-check things in question. It has the CACTVS chemoinformatic toolkit as chemistry backend which I think is well-tested.

Markus

On Mon, Feb 23, 2015 at 10:54 AM, JP jeanpaul.ebe...@inhibox.com wrote:
> Ok so I got out my test set of 6,940,083 molecules. First, I generated the inchi using 2014_09_2. I then checked out (and built) the master (with Greg's latest commits) from github and regenerated the inchis for all these molecules.
>
> 3,257 molecules (of 6,940,083) gave me a different inchis between the current production version and the development (github) one.
>
> For these 3,257 molecules I hammered the http://cactus.nci.nih.gov/chemical/structure/%s/stdinchi site and assumed this to be the 'correct' inchi (those great guys will have an interesting spike in their web traffic last Fri evening).
>
> In 6 (out of 3,257) cases we get different Inchis from cactus.nci.nih.gov vs RDKit github development version (2015.03.1pre).
>
> Here is the list (first inchi is the 2014_09_2, second one is the 2015.03.1pre generated one, third inchi is the cactus.nci.nih.gov):
> [...]
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
On 02/23/2015 06:12 AM, Markus Sitzmann wrote:
> And I am not saying it is perfect, it just provides another implementation to double-check things in question. It has the CACTVS chemoinformatic toolkit as chemistry backend which I think is well-tested.
>
> On Mon, Feb 23, 2015 at 10:54 AM, JP jeanpaul.ebe...@inhibox.com wrote:
>> 3,257 molecules (of 6,940,083) gave me a different inchis between the current production version and the development (github) one.

Just as another data point, out of 14 metabolites that have gone through my scripts so far:

1. cis-vaccenic acid, PubChem CID 5282761: InChI produced by RDKit 2014.09.2 differs from that from OpenBabel. InChI from PubChem's SDF agrees with the latter. (RDKit's ends with /b8-7+, OB and PC: 7-.) InChI code does spit out undefined stereochemistry warnings for OB's "we don't need no C.I.P., it's sooo last century" stereo -- and then includes the stereo layer in the output anyway. (Though in this case PubChem's presumably OpenEye stereo seems to agree with OB and not RDKit.)

2. 5,10,15,20-Tetraphenyl-21H,23H-porphine zinc, PubChem CID 3580039: InChI in the SDF ends at the /q, both RDKit and OpenBabel add /b layer. They all agree, though.

14 data points is nowhere near enough for meaningful conclusions, but still... 14% won't match the plain string comparison that most searches do and 7% won't match the clever InChI-aware comparison search, assuming it's implemented anywhere. And then you get .05% between different versions of the same software...

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
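The difference between a plain string comparison and an InChI-aware one can be sketched in a few lines. This is a rough illustration, not an existing search implementation: `strip_stereo` and `classify` are hypothetical names, and the sketch simply drops every layer starting with `b`, `t`, `m` or `s` before comparing. The test strings are the three InChIs JP reported in this thread for his first molecule.

```python
# Layer prefixes treated as stereo: double bond, tetrahedral, and the two stereo-type layers.
STEREO_PREFIXES = ("b", "t", "m", "s")

def strip_stereo(inchi):
    """Return the InChI with all stereo layers removed."""
    version, formula, *layers = inchi.split("/")
    kept = [layer for layer in layers if not layer.startswith(STEREO_PREFIXES)]
    return "/".join([version, formula, *kept])

def classify(a, b):
    """Say whether two InChIs are identical, differ only in stereo, or differ otherwise."""
    if a == b:
        return "identical"
    if strip_stereo(a) == strip_stereo(b):
        return "stereo only"
    return "different"

CONNECTIVITY = "c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25"
BASE = "InChI=1S/C18H19N5O2S/" + CONNECTIVITY + "/h1-4,10-13H,5-9H2,(H,21,22,24)"

rdkit_2014_09_2 = BASE + "/t12-,13+"
rdkit_2015_03_1pre = BASE + "/t12-,13-"
cactus = ("InChI=1S/C18H29N5O2S/" + CONNECTIVITY
          + "/h12-15,19-20H,1-11H2,(H,21,22,24)/t12-,13-,14?,15?")

print(classify(rdkit_2014_09_2, rdkit_2015_03_1pre))  # stereo only
print(classify(rdkit_2015_03_1pre, cactus))           # different
```

A plain `==` would report both pairs as mismatches; the layered check separates a stereo disagreement between two RDKit versions from a disagreement in the formula itself.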
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
Ok so I got out my test set of 6,940,083 molecules. First, I generated the inchi using 2014_09_2. I then checked out (and built) the master (with Greg's latest commits) from github and regenerated the inchis for all these molecules.

3,257 molecules (of 6,940,083) gave me a different inchis between the current production version and the development (github) one.

For these 3,257 molecules I hammered the http://cactus.nci.nih.gov/chemical/structure/%s/stdinchi site and assumed this to be the 'correct' inchi (those great guys will have an interesting spike in their web traffic last Fri evening).

In 6 (out of 3,257) cases we get different Inchis from cactus.nci.nih.gov vs RDKit github development version (2015.03.1pre).

Here is the list (first inchi is the 2014_09_2, second one is the 2015.03.1pre generated one, third inchi is the cactus.nci.nih.gov):

O=C(/N=c1/[nH]ncs1)[C@H]1CC[C@H](Cn2cnc3c3c2=O)CC1
MPQBIWRBISQCLJ-BETUJISGSA-N
MPQBIWRBISQCLJ-JOCQHMNTSA-N
InChI=1S/C18H19N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h1-4,10-13H,5-9H2,(H,21,22,24)/t12-,13+ # RDKit 2014_09_2
InChI=1S/C18H19N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h1-4,10-13H,5-9H2,(H,21,22,24)/t12-,13- # RDKit 2015.03.1pre
InChI=1S/C18H29N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h12-15,19-20H,1-11H2,(H,21,22,24)/t12-,13-,14?,15? # cactus.nci.nih.gov

O=C(/N=c1\[nH]c(-c2n2)cs1)[C@H]1CC[C@H](Cn2cnc3c3c2=O)CC1
CZKXHWCYFFXKGH-CALCHBBNSA-N
CZKXHWCYFFXKGH-QAQDUYKDSA-N
InChI=1S/C24H23N5O2S/c30-22(28-24-27-21(14-32-24)20-7-3-4-12-25-20)17-10-8-16(9-11-17)13-29-15-26-19-6-2-1-5-18(19)23(29)31/h1-7,12,14-17H,8-11,13H2,(H,27,28,30)/t16-,17+
InChI=1S/C24H23N5O2S/c30-22(28-24-27-21(14-32-24)20-7-3-4-12-25-20)17-10-8-16(9-11-17)13-29-15-26-19-6-2-1-5-18(19)23(29)31/h1-7,12,14-17H,8-11,13H2,(H,27,28,30)/t16-,17-
InChI=1S/C24H39N5O2S/c30-22(28-24-27-21(14-32-24)20-7-3-4-12-25-20)17-10-8-16(9-11-17)13-29-15-26-19-6-2-1-5-18(19)23(29)31/h16-21,25-26H,1-15H2,(H,27,28,30)/t16-,17-,18?,19?,20?,21?

CCOC(=O)Cc1cs/c(=N/C(=O)[C@H]2CC[C@H](Cn3cnc4c4c3=O)CC2)[nH]1
GAXCPQSXDNGSQV-IYBDPMFKSA-N
GAXCPQSXDNGSQV-WKILWMFISA-N
InChI=1S/C23H26N4O4S/c1-2-31-20(28)11-17-13-32-23(25-17)26-21(29)16-9-7-15(8-10-16)12-27-14-24-19-6-4-3-5-18(19)22(27)30/h3-6,13-16H,2,7-12H2,1H3,(H,25,26,29)/t15-,16+
InChI=1S/C23H26N4O4S/c1-2-31-20(28)11-17-13-32-23(25-17)26-21(29)16-9-7-15(8-10-16)12-27-14-24-19-6-4-3-5-18(19)22(27)30/h3-6,13-16H,2,7-12H2,1H3,(H,25,26,29)/t15-,16-
InChI=1S/C23H36N4O4S/c1-2-31-20(28)11-17-13-32-23(25-17)26-21(29)16-9-7-15(8-10-16)12-27-14-24-19-6-4-3-5-18(19)22(27)30/h15-19,24H,2-14H2,1H3,(H,25,26,29)/t15-,16-,17?,18?,19?

COCc1n[nH]/c(=N/C(=O)[C@H]2CC[C@H](Cn3cnc4c4c3=O)CC2)s1
YVZJPKUMKXPZTK-OKILXGFUSA-N
YVZJPKUMKXPZTK-HDJSIYSDSA-N
InChI=1S/C20H23N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27/h2-5,12-14H,6-11H2,1H3,(H,22,24,26)/t13-,14+
InChI=1S/C20H23N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27/h2-5,12-14H,6-11H2,1H3,(H,22,24,26)/t13-,14-
InChI=1S/C20H33N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27/h13-17,21,23H,2-12H2,1H3,(H,22,24,26)/t13-,14-,15?,16?,17?

COC(=O)c1[nH]/c(=N\C(=O)[C@H]2CC[C@H](Cn3cnc4c4c3=O)CC2)sc1C(C)C
KNDSLDLCZNAXPK-IYBDPMFKSA-N
KNDSLDLCZNAXPK-WKILWMFISA-N
InChI=1S/C24H28N4O4S/c1-14(2)20-19(23(31)32-3)26-24(33-20)27-21(29)16-10-8-15(9-11-16)12-28-13-25-18-7-5-4-6-17(18)22(28)30/h4-7,13-16H,8-12H2,1-3H3,(H,26,27,29)/t15-,16+
InChI=1S/C24H28N4O4S/c1-14(2)20-19(23(31)32-3)26-24(33-20)27-21(29)16-10-8-15(9-11-16)12-28-13-25-18-7-5-4-6-17(18)22(28)30/h4-7,13-16H,8-12H2,1-3H3,(H,26,27,29)/t15-,16-
InChI=1S/C24H38N4O4S/c1-14(2)20-19(23(31)32-3)26-24(33-20)27-21(29)16-10-8-15(9-11-16)12-28-13-25-18-7-5-4-6-17(18)22(28)30/h14-20,25H,4-13H2,1-3H3,(H,26,27,29)/t15-,16-,17?,18?,19?,20?

CC(C)[C@H]1CC[C@H](C(=O)N[C@H](Cc2c2)C(=O)/N=c2\[nH]ncs2)CC1
OKTRHZCAACPPLC-FGTMMUONSA-N
OKTRHZCAACPPLC-KZNAEPCWSA-N
InChI=1S/C21H28N4O2S/c1-14(2)16-8-10-17(11-9-16)19(26)23-18(12-15-6-4-3-5-7-15)20(27)24-21-25-22-13-28-21/h3-7,13-14,16-18H,8-12H2,1-2H3,(H,23,26)(H,24,25,27)/t16-,17+,18-/m1/s1
InChI=1S/C21H28N4O2S/c1-14(2)16-8-10-17(11-9-16)19(26)23-18(12-15-6-4-3-5-7-15)20(27)24-21-25-22-13-28-21/h3-7,13-14,16-18H,8-12H2,1-2H3,(H,23,26)(H,24,25,27)/t16-,17-,18-/m1/s1
InChI=1S/C21H36N4O2S/c1-14(2)16-8-10-17(11-9-16)19(26)23-18(12-15-6-4-3-5-7-15)20(27)24-21-25-22-13-28-21/h14-18,22H,3-13H2,1-2H3,(H,23,26)(H,24,25,27)/t16-,17-,18-/m1/s1

I have looked at these molecules in MarvinSketch to try to figure out why different inchis are being generated. Perhaps there is a problem in RDKit which is always detecting one of the rings as aromatic (the Inchi doesn't seem to agree on the aromaticity).

I hope this is helpful.

JP

-
Jean-Paul Ebejer
Early Stage Researcher
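For anyone wanting to repeat the cross-check, the lookup URL and the mismatch rates from this message can be put together as below. This is a sketch under assumptions: only the URL template comes from the message; `stdinchi_url` is a made-up helper, and percent-encoding the SMILES is a precaution added here (characters such as `/`, `\` and `#` are not URL-safe), not something the message says was done. No request is sent.

```python
from urllib.parse import quote

# URL template as given in the message; %s is replaced by the SMILES.
CACTUS_TEMPLATE = "http://cactus.nci.nih.gov/chemical/structure/%s/stdinchi"

def stdinchi_url(smiles):
    """Build the lookup URL for one SMILES, percent-encoding every reserved character."""
    return CACTUS_TEMPLATE % quote(smiles, safe="")

# Counts reported in the message.
total_molecules = 6940083
changed_between_versions = 3257
still_differ_from_cactus = 6

changed_percent = round(100 * changed_between_versions / total_molecules, 3)
unresolved_percent = round(100 * still_differ_from_cactus / changed_between_versions, 2)

print(stdinchi_url("O=c1occo1"))
print(changed_percent, unresolved_percent)
```

Encoding with `safe=""` keeps the stereo slashes in a SMILES from being read as path separators; whether the service decodes every escaped character the same way would need to be checked against the service itself.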
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
Well, at least you said something important: conversion of InChI to molecules is something that's not in general guaranteed to work perfectly - and this is by design like this because InChI is an identifier, not a molecule representation. Unfortunately, many people seemed to forget about this :-)

On Thu, Feb 19, 2015 at 6:59 AM, Greg Landrum greg.land...@gmail.com wrote:
> On Wed, Feb 18, 2015 at 7:01 PM, Igor Filippov igor.v.filip...@gmail.com wrote:
>>> update the bug report and work on tracking down the wrong problem
>> That's how I sometimes do it too... ;)
>
> I'll leave it as an exercise to the reader to decide if that was intentional, the fault of auto-correct, or just because it had been a long day. ;-)
>
> -greg
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
On 02/19/2015 01:24 PM, Igor Filippov wrote:
> Markus also spelled out for you different variations for context in the same exchange. Do different tautomers represent different chemicals or the same one?

Read the thread.

> Do face recognition identifiers even approach the accuracy of InChI identifiers?

At this point it's the other way around: facial recognition success rates are anywhere between 25% and 95%. How many of the existing molecules can be represented by InChI? (I'll give you a hint: none longer than MAX_ATOMS defined in ichisize.h.)

> If you still insist that there could be only one singular definition of unique in the universe then I am afraid this definition has no meaning and you are alone...ehm... unique.. in using it.

When you have a different analytical engine built on different logic, then you can have your different definition of unique in the context of a computer system. As long as you're using a digital computer you're using the same simplistic integers, boolean algebra, and discrete math. That's the objective reality, it won't change no matter how much you can argue faces in universe.

Similarly, when you get yourself a different English language, then you can have a different definition of unique as an English word. In this version of English, go buy a dictionary.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
Dimitri,

As others before me I tried to explain to you that the simplistic definition of unique molecule is rather naive and is neither useful nor reflects reality. Perhaps my explanation is woefully inadequate to convey the meaning I would like to convey but that is no excuse to reply with rudeness and condescension. If you are unable to present your arguments in a civilized manner then please cease this discussion.

Best regards,
Igor

On Thu, Feb 19, 2015 at 2:53 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
> [...]
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
On Thu, Feb 19, 2015 at 10:11 AM, Markus Sitzmann markus.sitzm...@gmail.com wrote:
> Well, at least you said something important: conversion of InChI to molecules is something that's not in general guaranteed to work perfectly - and this is by design like this because InChI is an identifier, not a molecule representation. Unfortunately, many people seemed to forget about this :-)

Yes, yes they do.

-greg
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
On 2015-02-19 05:58, Greg Landrum wrote:
> On Thu, Feb 19, 2015 at 10:11 AM, Markus Sitzmann markus.sitzm...@gmail.com wrote:
>> Well, at least you said something important: conversion of InChI to molecules is something that's not in general guaranteed to work perfectly - and this is by design like this because InChI is an identifier, not a molecule representation. Unfortunately, many people seemed to forget about this :-)
>
> Yes, yes they do.

Well unfortunately inchi states they're a 'unique identifier' which means there must be 1 inchi for 1 molecule and it *should* work perfectly. And then they say the only required 'layer' is the formula which means a) it's not unique and b) how is InChi=formula better than just formula? D'uh.

Dimitri
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
No, a chemical structure must calculate a unique InChI, but an InChI might cover more than one chemical structure (because there are molecules that can be described by more than one chemical structure). And a chemical formula might be the most accurate (unique) description you have for a molecule (admittedly unlikely today); that is why the InChI is layered. By adding and removing layers, InChI lets you choose how precisely you want to define uniqueness. That is important with molecules because there is no precise, universally valid definition of a unique molecule.

On Thu, Feb 19, 2015 at 2:06 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
> On 2015-02-19 05:58, Greg Landrum wrote:
>> On Thu, Feb 19, 2015 at 10:11 AM, Markus Sitzmann markus.sitzm...@gmail.com wrote:
>>> Well, at least you said something important: conversion of InChI to molecules is something that's not in general guaranteed to work perfectly - and this is by design, because InChI is an identifier, not a molecule representation. Unfortunately, many people seem to forget about this :-)
>>
>> Yes, yes they do.
>
> Well unfortunately inchi states they're a 'unique identifier' which means there must be 1 inchi for 1 molecule and it *should* work perfectly. And then they say the only required 'layer' is the formula which means a) it's not unique and b) how is InChi=formula better than just formula?
>
> D'uh.
> Dimitri
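The layering described above can be made concrete with a few lines of plain Python. This is only a sketch under stated assumptions: the helper names `split_layers` and `drop_layers` are made up for this example, the code treats an InChI purely as a "/"-separated string, and the shortened strings it returns are meant as comparison keys, not necessarily as valid InChIs. The sample string is the InChI quoted elsewhere in this thread.

```python
# Sketch only: treat an InChI as a "/"-separated string of layers.
# split_layers and drop_layers are hypothetical helpers, not RDKit or InChI API.

def split_layers(inchi):
    """Return (version, formula, [(prefix, body), ...]) for an InChI string."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string: %r" % inchi)
    parts = inchi[len("InChI="):].split("/")
    if len(parts) < 2:
        raise ValueError("InChI has no formula layer: %r" % inchi)
    version, formula = parts[0], parts[1]
    # Every later layer starts with a one-letter prefix (c, h, t, m, s, ...).
    layers = [(p[0], p[1:]) for p in parts[2:] if p]
    return version, formula, layers


def drop_layers(inchi, prefixes):
    """Rebuild the string without the layers whose prefix is in `prefixes`."""
    version, formula, layers = split_layers(inchi)
    kept = [prefix + body for prefix, body in layers if prefix not in prefixes]
    return "InChI=" + "/".join([version, formula] + kept)


full = ("InChI=1S/C24H27N3O3/c28-22(25-15-14-17-6-2-1-3-7-17)19-12-10-18(11-13-19)"
        "16-27-23(29)20-8-4-5-9-21(20)26-24(27)30"
        "/h1-9,18-19H,10-16H2,(H,25,28)(H,26,30)/t18-,19-")

# Two looser notions of "the same": ignore stereo, or compare formulas only.
no_stereo = drop_layers(full, {"t", "m", "s", "b"})
formula_only = drop_layers(full, {"c", "h", "t", "m", "s", "b"})

print(no_stereo)
print(formula_only)
```

Two records whose `no_stereo` keys match but whose full strings differ are, under this reading, the same connectivity with different stereochemistry; which key counts as "unique" is the application's choice.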
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
On 02/19/2015 12:29 PM, Igor Filippov wrote:
>> No. there's only one definition of unique
>
> This is way too simplistic. The definition of unique depends on the application.

There is a context to this thread. The application was spelled out in Markus and Greg's exchange:

> Well, at least you said something important: conversion of InChI to molecules is something that's not in general guaranteed to work perfectly - and this is by design, because InChI is an identifier, not a molecule representation. Unfortunately, many people seem to forget about this :-)

> Yes, yes they do.

Persons' faces and other disciplines have nothing to do with it. (Note, however, that the face recognition people actually get simplistic integer numbers, so their unique keys tend to be based on well-defined metrics and factor in isometries and other fun math stuff. Unlike IUPAC and InChI.)

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
> No. there's only one definition of unique

This is way too simplistic. The definition of unique depends on the application, not only in chemistry but in other fields as well. The way you just defined unique is appropriate for integer numbers, but not everything is quite so trivial. Is a human face unique? What about pictures of the same person taken at 5, 15, 25, and 45 years of age? Are they the same picture or completely different? Faces of identical twins? Uniqueness is defined by what you need to accomplish, not by some god-given attribute of the object; otherwise no two things are the same and unique loses all meaning.

Best,
Igor

On Thu, Feb 19, 2015 at 12:54 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
> On 02/19/2015 08:54 AM, Markus Sitzmann wrote:
>> A database can have several definitions of unique for anything - a structure database can have this, too. If you have a chemical compound which can form 10 different tautomers, you can represent the compound by 10 chemical structures (it is still the same compound, though). So, if you define uniqueness on the basis of chemical compound, you have one db entry and this one entry has a single (tautomer-sensitive) InChI covering 10 chemical structures; if you define uniqueness on the basis of tautomers/chemical structures (because all are relevant, for instance, in NMR spectroscopy) you have (and want) 10 database entries, each with a single (tautomer-sensitive) InChI. Two definitions of unique.
>
> No. there's only one definition of unique: unique key is a set of attributes that is guaranteed to be unique for each entity. The relationship between the key and the entity is symmetric: if x is the inchi string for compound y then y is the compound for inchi string x. It follows that if y is the compound for inchi string x, and z is also the compound for inchi string x, then x is not unique.
>
> What you have is two definitions of chemical compound. You can, in your database, define 10 different tautomers as ein compound, ein unique key. Your database will be useless for any number of applications. You can define 10 different tautomers as 10 different compounds with 10 different unique keys. Your database will be too heavy for any number of applications. It's your database.
>
> What you can't do is redefine unique to mean two things at once: it's not your discrete math. Sorry.
>
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
Ok, I just checked in a set of changes to address the problems with InChI generation and InChI -> mol conversion for 3-coordinate atoms. There are some notes and pointers to the code changes here:
https://github.com/rdkit/rdkit/issues/437

Please note the final comment there: JP's original example now gives a correct InChI, and it looks like that InChI is correctly converted into a SMILES, but the resulting SMILES is not properly canonicalized. G. I will continue to look at that.

In the meantime, I would be extremely grateful if someone else could take a look at the testing code I added:
https://github.com/rdkit/rdkit/blob/ca0c4956765952e8c0556885c6ac8e62bac197e1/External/INCHI-API/test.cpp#L256
and let me know if they see any problems with the expected InChIs I'm using.

Best,
-greg

On Wed, Feb 18, 2015 at 4:58 PM, Markus Sitzmann markus.sitzm...@gmail.com wrote:
> I agree with John, the InChI for mol1 and mol2 should be
> http://cactus.nci.nih.gov/chemical/structure/O=C(NCCc1c1)[C@H]1CC[C@H](Cn2c(O)nc3c3c2=O)CC1/stdinchi
> InChI=1S/C24H27N3O3/c28-22(25-15-14-17-6-2-1-3-7-17)19-12-10-18(11-13-19)16-27-23(29)20-8-4-5-9-21(20)26-24(27)30/h1-9,18-19H,10-16H2,(H,25,28)(H,26,30)/t18-,19-
> So the + at the end should be a -
> Markus
>
> On Wed, Feb 18, 2015 at 2:53 PM, John M john.wilkinson...@gmail.com wrote:
>> Hi Greg,
>> I believe it's an RDKitMol -> InChI issue rather than InChI -> RDKitMol. The correct InChI (below) is different from that in the iPython listing.
>> InChI=1S/C24H27N3O3/c28-22(25-15-14-17-6-2-1-3-7-17)19-12-10-18(11-13-19)16-27-23(29)20-8-4-5-9-21(20)26-24(27)30/h1-9,18-19H,10-16H2,(H,25,28)(H,26,30)/t18-,19-
>> J
>> Regards,
>> John W May
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
A general comment/request:

One of the great things about the RDKit community, including this mailing list, is how supportive and helpful people are. In contrast to many online communities it's a friendly place and I think that's great. This thread is starting to get a bit aggressive in tone... please keep an eye on that.

-greg
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
On Thu, Feb 19, 2015 at 6:54 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
> On 02/19/2015 08:54 AM, Markus Sitzmann wrote:
>> A database can have several definitions of unique for anything - a structure database can have this, too. If you have a chemical compound which can form 10 different tautomers, you can represent the compound by 10 chemical structures (it is still the same compound, though). So, if you define uniqueness on the basis of chemical compound, you have one db entry and this one entry has a single (tautomer-sensitive) InChI covering 10 chemical structures; if you define uniqueness on the basis of tautomers/chemical structures (because all are relevant, for instance, in NMR spectroscopy) you have (and want) 10 database entries, each with a single (tautomer-sensitive) InChI. Two definitions of unique.
>
> No. there's only one definition of unique: unique key is a set of attributes that is guaranteed to be unique for each entity. The relationship between the key and the entity is symmetric: if x is the inchi string for compound y then y is the compound for inchi string x.

This could be true if the InChI algorithm just took the input structure you provided and returned a string for it. That's not, however, what it does. The algorithm first standardizes the molecule (modifying the tautomeric state, moving charges around, etc.) and then generates the InChI string.

Assuming you use the full standard InChI string, what you end up with is a function that maps every molecule to a unique InChI. You can also map every InChI back to a unique molecule (though, as discussed above and elsewhere, this can be somewhat fraught with danger), but the unique molecule that the InChI corresponds to is not necessarily the same as the input molecule.

-greg
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
A database can have several definitions of unique for anything - a structure database can have this, too. If you have a chemical compound which can form 10 different tautomers, you can represent the compound by 10 chemical structures (it is still the same compound, though). So, if you define uniqueness on the basis of chemical compound, you have one db entry and this one entry has a single (tautomer-sensitive) InChI covering 10 chemical structures; if you define uniqueness on the basis of tautomers/chemical structures (because all are relevant, for instance, in NMR spectroscopy) you have (and want) 10 database entries, each with a single (tautomer-sensitive) InChI. Two definitions of unique.

So my sentence still stands: a chemical structure must calculate a unique InChI, but an InChI might cover more than one chemical structure.

On Thu, Feb 19, 2015 at 3:37 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
> On 2015-02-19 07:27, Markus Sitzmann wrote:
>> No, a chemical structure must calculate a unique InChI, but an InChI might cover more than one chemical structure
>
> Heh. I could swear last time I read the description it specifically mentioned databases. In the database context 'unique' has a specific well-defined meaning and that is *not* 'more than one'. Now I don't see it in the official blurbs, only pikiwedia mentions databases.
>
>> ... there is no precise, universally valid definition for unique molecule.
>
> On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
>
> Works for 'undefined figures', too.
>
> Dimitri
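The "two definitions of unique" can be sketched as two grouping keys over the same records. This is only an illustration under stated assumptions: the three records and the helper `group_by` are invented for the example, the as-drawn SMILES merely stand in for a structure-level key, and the SMILES and Standard InChI strings for the 2-hydroxypyridine / 2-pyridone pair were entered by hand, so regenerate them with your own toolkit before relying on them.

```python
from collections import defaultdict

# Illustrative records (an assumption, not data from this thread): two drawn
# tautomers of one compound plus one unrelated compound.
records = [
    {"name": "2-hydroxypyridine", "smiles": "Oc1ccccn1",
     "inchi": "InChI=1S/C5H5NO/c7-5-3-1-2-4-6-5/h1-4H,(H,6,7)"},
    {"name": "2-pyridone", "smiles": "O=c1cccc[nH]1",
     "inchi": "InChI=1S/C5H5NO/c7-5-3-1-2-4-6-5/h1-4H,(H,6,7)"},
    {"name": "pyridine", "smiles": "c1ccncc1",
     "inchi": "InChI=1S/C5H5N/c1-2-4-6-5-3-1/h1-5H"},
]


def group_by(rows, key):
    """Group record names by the value stored under `key` (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["name"])
    return dict(groups)


# Compound-level uniqueness: one entry per Standard InChI.
by_compound = group_by(records, "inchi")
# Structure-level uniqueness: one entry per drawn structure.
by_structure = group_by(records, "smiles")

print(len(by_compound), "entries keyed by InChI")
print(len(by_structure), "entries keyed by drawn structure")
```

Both tables are internally consistent; they differ only in which entity the key is meant to identify, which is the point both sides of the argument are circling.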
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
On 2015-02-19 07:27, Markus Sitzmann wrote:
> No, a chemical structure must calculate a unique InChI, but an InChI might cover more than one chemical structure

Heh. I could swear last time I read the description it specifically mentioned databases. In the database context 'unique' has a specific well-defined meaning and that is *not* 'more than one'. Now I don't see it in the official blurbs, only pikiwedia mentions databases.

> ... there is no precise, universally valid definition for unique molecule.

On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

Works for 'undefined figures', too.

Dimitri
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
On 02/19/2015 08:54 AM, Markus Sitzmann wrote:
> A database can have several definitions of unique for anything - a structure database can have this, too. If you have a chemical compound which can form 10 different tautomers, you can represent the compound by 10 chemical structures (it is still the same compound, though). So, if you define uniqueness on the basis of chemical compound, you have one db entry and this one entry has a single (tautomer-sensitive) InChI covering 10 chemical structures; if you define uniqueness on the basis of tautomers/chemical structures (because all are relevant, for instance, in NMR spectroscopy) you have (and want) 10 database entries, each with a single (tautomer-sensitive) InChI. Two definitions of unique.

No. there's only one definition of unique: unique key is a set of attributes that is guaranteed to be unique for each entity. The relationship between the key and the entity is symmetric: if x is the inchi string for compound y then y is the compound for inchi string x. It follows that if y is the compound for inchi string x, and z is also the compound for inchi string x, then x is not unique.

What you have is two definitions of chemical compound. You can, in your database, define 10 different tautomers as ein compound, ein unique key. Your database will be useless for any number of applications. You can define 10 different tautomers as 10 different compounds with 10 different unique keys. Your database will be too heavy for any number of applications. It's your database.

What you can't do is redefine unique to mean two things at once: it's not your discrete math. Sorry.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
> update the bug report and work on tracking down the wrong problem

That's how I sometimes do it too... ;)

Igor

On Wed, Feb 18, 2015 at 12:35 PM, Greg Landrum greg.land...@gmail.com wrote:
> Yep, you guys are right. I diagnosed that too quickly. Thanks for pointing out the mistake. I'll update the bug report and work on tracking down the wrong problem
>
> -greg
>
> On Wed, Feb 18, 2015 at 4:58 PM, Markus Sitzmann markus.sitzm...@gmail.com wrote:
>> I agree with John, the InChI for mol1 and mol2 should be
>> http://cactus.nci.nih.gov/chemical/structure/O=C(NCCc1c1)[C@H]1CC[C@H](Cn2c(O)nc3c3c2=O)CC1/stdinchi
>> InChI=1S/C24H27N3O3/c28-22(25-15-14-17-6-2-1-3-7-17)19-12-10-18(11-13-19)16-27-23(29)20-8-4-5-9-21(20)26-24(27)30/h1-9,18-19H,10-16H2,(H,25,28)(H,26,30)/t18-,19-
>> So the + at the end should be a -
>> Markus
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
Yep, you guys are right. I diagnosed that too quickly. Thanks for pointing out the mistake. I'll update the bug report and work on tracking down the wrong problem

-greg

On Wed, Feb 18, 2015 at 4:58 PM, Markus Sitzmann markus.sitzm...@gmail.com wrote:
> I agree with John, the InChI for mol1 and mol2 should be
> http://cactus.nci.nih.gov/chemical/structure/O=C(NCCc1c1)[C@H]1CC[C@H](Cn2c(O)nc3c3c2=O)CC1/stdinchi
> InChI=1S/C24H27N3O3/c28-22(25-15-14-17-6-2-1-3-7-17)19-12-10-18(11-13-19)16-27-23(29)20-8-4-5-9-21(20)26-24(27)30/h1-9,18-19H,10-16H2,(H,25,28)(H,26,30)/t18-,19-
> So the + at the end should be a -
> Markus
>
> On Wed, Feb 18, 2015 at 2:53 PM, John M john.wilkinson...@gmail.com wrote:
>> Hi Greg,
>> I believe it's an RDKitMol -> InChI issue rather than InChI -> RDKitMol. The correct InChI (below) is different from that in the iPython listing.
>> InChI=1S/C24H27N3O3/c28-22(25-15-14-17-6-2-1-3-7-17)19-12-10-18(11-13-19)16-27-23(29)20-8-4-5-9-21(20)26-24(27)30/h1-9,18-19H,10-16H2,(H,25,28)(H,26,30)/t18-,19-
>> J
>> Regards,
>> John W May
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
On Wed, Feb 18, 2015 at 7:01 PM, Igor Filippov igor.v.filip...@gmail.com wrote:
>> update the bug report and work on tracking down the wrong problem
>
> That's how I sometimes do it too... ;)

I'll leave it as an exercise to the reader to decide if that was intentional, the fault of auto-correct, or just because it had been a long day. ;-)

-greg
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
I agree with John, the InChI for mol1 and mol2 should be

http://cactus.nci.nih.gov/chemical/structure/O=C(NCCc1c1)[C@H]1CC[C@H](Cn2c(O)nc3c3c2=O)CC1/stdinchi

InChI=1S/C24H27N3O3/c28-22(25-15-14-17-6-2-1-3-7-17)19-12-10-18(11-13-19)16-27-23(29)20-8-4-5-9-21(20)26-24(27)30/h1-9,18-19H,10-16H2,(H,25,28)(H,26,30)/t18-,19-

So the + at the end should be a -

Markus

On Wed, Feb 18, 2015 at 2:53 PM, John M john.wilkinson...@gmail.com wrote:
> Hi Greg,
> I believe it's an RDKitMol -> InChI issue rather than InChI -> RDKitMol. The correct InChI (below) is different from that in the iPython listing.
> InChI=1S/C24H27N3O3/c28-22(25-15-14-17-6-2-1-3-7-17)19-12-10-18(11-13-19)16-27-23(29)20-8-4-5-9-21(20)26-24(27)30/h1-9,18-19H,10-16H2,(H,25,28)(H,26,30)/t18-,19-
> J
> Regards,
> John W May
>
> On 18 February 2015 at 04:57, Greg Landrum greg.land...@gmail.com wrote:
>> JP,
>> Looks like that's a bug in the way ring stereochemistry is handled while translating the InChI back into a molecule. It's reproducible with a small example:
>>
>> In [1]: from rdkit import Chem
>> In [2]: mol1 = Chem.MolFromSmiles('C[C@H]1CC[C@H](O)CC1')
>> In [3]: Chem.MolToSmiles(mol1,True)
>> Out[3]: 'C[C@H]1CC[C@H](O)CC1'
>> In [4]: inchi = Chem.MolToInchi(mol1)
>> In [5]: mol2 = Chem.MolFromInchi(inchi)
>> In [6]: Chem.MolToSmiles(mol2,True)
>> Out[6]: 'C[C@H]1CC[C@@H](O)CC1'
>>
>> Conversion of InChI to molecules is something that's not in general guaranteed to work perfectly, but I will go ahead and create a bug report.
>> -greg
Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !
Hi Greg,

I believe it's an RDKitMol -> InChI issue rather than InChI -> RDKitMol. The correct InChI (below) is different from that in the IPython listing.

InChI=1S/C24H27N3O3/c28-22(25-15-14-17-6-2-1-3-7-17)19-12-10-18(11-13-19)16-27-23(29)20-8-4-5-9-21(20)26-24(27)30/h1-9,18-19H,10-16H2,(H,25,28)(H,26,30)/t18-,19-

Regards,
John W May
john.wilkinson...@gmail.com

On 18 February 2015 at 04:57, Greg Landrum greg.land...@gmail.com wrote:

JP,

Looks like that's a bug in the way ring stereochemistry is handled while translating the InChI back into a molecule. It's reproducible with a small example:

    In [1]: from rdkit import Chem
    In [2]: mol1 = Chem.MolFromSmiles('C[C@H]1CC[C@H](O)CC1')
    In [3]: Chem.MolToSmiles(mol1,True)
    Out[3]: 'C[C@H]1CC[C@H](O)CC1'
    In [4]: inchi = Chem.MolToInchi(mol1)
    In [5]: mol2 = Chem.MolFromInchi(inchi)
    In [6]: Chem.MolToSmiles(mol2,True)
    Out[6]: 'C[C@H]1CC[C@@H](O)CC1'

Conversion of InChI to molecules is something that's not in general guaranteed to work perfectly, but I will go ahead and create a bug report.

-greg

On Tue, Feb 17, 2015 at 2:50 PM, JP jeanpaul.ebe...@inhibox.com wrote:

Hi there,

I have a question for the 3D enabled of you (I wish the world looked like GTA2 !) I am seeing a case of an RDKit mol -> InChI -> RDKit mol round trip that I think is changing the stereochemistry of the molecule. I have 12 example pairs where this happens (but all very structurally similar). I don't care much that the last rdkit molecule is a different tautomer than the starting one - but if this is the case the stereochemistry should still be conserved, no?

I did an ipython notebook (most useful tool of the decade after RDKit?) gist here: http://nbviewer.ipython.org/urls/gist.githubusercontent.com/anonymous/7c158926a0f3bf9a4978/raw/d91cc808ac91eccc8bf0e45d9eacd2af382e5105/gistfile1.txt

I appreciate if anyone could shed some light. I'd just like to understand. Thank you for your time!
- JP
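
When comparing InChIs like the ones in this thread, it can help to split each string into its layers and look at the formula and the /t (tetrahedral stereo) layer separately. Below is a minimal sketch of that, using only the Python standard library. The function names split_inchi_layers and differing_layers are made up for this illustration and are not part of RDKit or the InChI software, and the parser is a simplification: it keeps only the first occurrence of each layer prefix, so it is not a full InChI parser.

```python
def split_inchi_layers(inchi):
    """Split an InChI string into a dict of layers (hypothetical helper).

    Keys are 'version', 'formula', and the one-letter layer prefixes
    ('c', 'h', 't', ...). Simplification: if a prefix occurs more than
    once, only its first occurrence is kept.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("expected a string starting with 'InChI='")
    version, *parts = inchi[len("InChI="):].split("/")
    layers = {"version": version}
    for part in parts:
        if not part:
            continue
        if part[0].islower():
            # Prefixed layer, e.g. 't18-,19-' -> key 't', body '18-,19-'
            layers.setdefault(part[0], part[1:])
        else:
            # The formula layer is the only one without a lowercase prefix
            layers.setdefault("formula", part)
    return layers


def differing_layers(inchi_a, inchi_b):
    """Return the sorted layer names on which two InChIs disagree."""
    a = split_inchi_layers(inchi_a)
    b = split_inchi_layers(inchi_b)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


# The InChI quoted in John's message above
correct = (
    "InChI=1S/C24H27N3O3/c28-22(25-15-14-17-6-2-1-3-7-17)19-12-10-18"
    "(11-13-19)16-27-23(29)20-8-4-5-9-21(20)26-24(27)30"
    "/h1-9,18-19H,10-16H2,(H,25,28)(H,26,30)/t18-,19-"
)
layers = split_inchi_layers(correct)
print(layers["formula"])  # C24H27N3O3
print(layers["t"])        # 18-,19-
```

Passing two InChI strings for the same input to differing_layers shows at a glance whether they disagree in the formula, the hydrogen layer, or only in the stereo layer.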