Re: [Rdkit-devel] SDF String Generation Include Stereo information
Hi Greg, Hi Paolo,

Thank you for that amazingly fast response! A bug in NAOMI seems to be the most likely explanation here.

Thank you for the thorough explanation and your time. I'll post new information on this if anything unexpected comes up.

Kind regards,
Emanuel

On Wed, 29 Jul 2020 at 13:56, Greg Landrum <greg.land...@gmail.com> wrote:
> In this particular case I believe the bug may be in the NAOMI code.

___
Rdkit-devel mailing list
Rdkit-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-devel
Re: [Rdkit-devel] SDF String Generation Include Stereo information
Hi Emanuel,

The chirality bit doesn't have anything to do with double bond stereochemistry.[1] So that's not what's going on here.

The RDKit has the ability to pass the mol block provided directly to the InChI code without interpreting it. I believe that the ChEMBL team is using that to generate InChIs. In any case, when I use that to pass the molblock downloaded from ChEMBL (https://www.ebi.ac.uk/chembl/api/data/molecule/CHEMBL6223.sdf) to the InChI code, I get the same InChI that is found in ChEMBL.

In this particular case I believe the bug may be in the NAOMI code.

-greg

[1] According to the documentation, it tells you whether a molfile with specified atomic stereochemistry represents a single stereoisomer (the one drawn), or whether only the relative configurations of the specified stereocenters are known and the structure is either a single diastereomer or a mixture of the two stereoisomers.

On Wed, Jul 29, 2020 at 11:17 AM Emanuel Ehmki wrote:
> While researching why that happens I realized that the SDF string doesn't
> make use of the chirality bit that can be set in the counts line.
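To make the "chirality bit" in the counts line concrete, here is a minimal sketch of reading it from a V2000 counts line. It assumes the usual fixed-width layout (3-character fields, chiral flag in the fifth field, version tag after the eleventh); the helper names `parse_counts_line` and `make_counts_line` and the atom and bond counts are made up for illustration and are not taken from the ChEMBL record or from RDKit.

```python
def parse_counts_line(line: str) -> dict:
    """Extract a few fixed-width fields from a V2000 counts line."""
    def int_field(start: int) -> int:
        # Each numeric field is 3 characters wide; blank means 0.
        text = line[start:start + 3].strip()
        return int(text) if text else 0

    return {
        "n_atoms": int_field(0),
        "n_bonds": int_field(3),
        # Fifth field: the chiral flag discussed in this thread.
        # It says nothing about double bond (cis/trans) stereo.
        "chiral_flag": int_field(12),
        "version": line[33:39].strip(),
    }


def make_counts_line(n_atoms: int, n_bonds: int, chiral_flag: int) -> str:
    """Build an illustrative counts line (hypothetical values)."""
    fields = (n_atoms, n_bonds, 0, 0, chiral_flag, 0, 0, 0, 0, 0, 999)
    return "".join(f"{value:3d}" for value in fields) + " V2000"


if __name__ == "__main__":
    example = make_counts_line(19, 21, 1)
    print(repr(example))
    print(parse_counts_line(example))
```

A flag of 0 here would only affect how the atomic stereocenters are interpreted, which is why it cannot explain a difference in the double bond layer of the InChI.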
Re: [Rdkit-devel] SDF String Generation Include Stereo information
Hi Emanuel,

The RDKit perceives double bond stereochemistry on read, and encodes it on write, based on the 2D coordinates in the molblock. I put together an example gist here:

https://gist.github.com/ptosco/9ffae8814e84bcf189da7663775748e5

I hope that addresses your question; if not, feel free to get back to me.

Cheers,
p.

On Wed, Jul 29, 2020 at 11:17 AM Emanuel Ehmki wrote:
> Do I understand correctly that RDKit does not store any information about
> chirality in V2000 and includes chiral information only in V3000 SDF format?
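As a rough illustration of how double bond stereo can be read off 2D coordinates, the sketch below checks whether two chosen substituents lie on the same side of the double bond axis. This is a generic geometric test written for this thread, not RDKit's actual implementation; the function names and the coordinates are hypothetical, and turning cis/trans into E/Z would additionally require ranking the substituents.

```python
from typing import Tuple

Point = Tuple[float, float]


def side_of_axis(a: Point, b: Point, p: Point) -> float:
    """z-component of (b - a) x (p - a).

    The sign tells on which side of the line a->b the point p lies.
    """
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def cis_or_trans(sub_a: Point, a: Point, b: Point, sub_b: Point,
                 tol: float = 1e-6) -> str:
    """Classify substituents sub_a (on atom a) and sub_b (on atom b)
    of the double bond a=b from their 2D positions."""
    s1 = side_of_axis(a, b, sub_a)
    s2 = side_of_axis(a, b, sub_b)
    if abs(s1) < tol or abs(s2) < tol:
        # A substituent drawn on the bond axis carries no stereo information.
        return "undefined"
    return "cis" if s1 * s2 > 0 else "trans"


if __name__ == "__main__":
    a, b = (0.0, 0.0), (1.0, 0.0)
    print(cis_or_trans((-0.5, 0.87), a, b, (1.5, 0.87)))
    print(cis_or_trans((-0.5, 0.87), a, b, (1.5, -0.87)))
```

A toolkit that gets the sign convention or the choice of reference substituents wrong at this step would produce exactly the kind of flipped `+`/`-` seen in the two InChI strings.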
[Rdkit-devel] SDF String Generation Include Stereo information
Dear All,

I am currently working with the RDKit generated SDF String that is stored in the ChEMBL COMPOUND_STRUCTURES table in the ChEMBL database release 26. My workflow is:

- pull SDF (V2000) from SQL table
- generate internal molecule representation (NAOMI ChemBio tool-kit if that means anything to you)
- generate InChI string and key from molecule
- compare with InChI string and key that are stored in the ChEMBL database

When comparing the InChI string for the molecule with the id CHEMBL6223, I get two differing strings due to different stereochemistry (last characters):

ChEMBL InChI:
InChI=1S/C16H13IO2/c17-10-12-8-9-15(16(18)19-12)14-7-3-5-11-4-1-2-6-13(11)14/h1-7,10,15H,8-9H2/b12-10+

NAOMI InChI:
InChI=1S/C16H13IO2/c17-10-12-8-9-15(16(18)19-12)14-7-3-5-11-4-1-2-6-13(11)14/h1-7,10,15H,8-9H2/b12-10-

While researching why that happens I realized that the SDF string doesn't make use of the chirality bit that can be set in the counts line. When digging deeper I found the disabled block in the MolFileWriter.cpp -> MolToMolBlock function:

https://github.com/rdkit/rdkit/blob/f14f8a60de0ecf4bf5294d73b177d19055e0096d/Code/GraphMol/FileParsers/MolFileWriter.cpp#L1395

Do I understand correctly that RDKit does not store any information about chirality in V2000 and includes chiral information only in V3000 SDF format?

Does anyone know when ChEMBL might switch to that version?

Kind regards,
Emanuel
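The comparison step in the workflow above can be sketched as a layer-by-layer diff of the two InChI strings, which shows directly which layer disagrees. This is a simplified illustration: the helper names `inchi_layers` and `differing_layers` are made up here, and the splitting assumes each layer prefix occurs only once, which holds for these two strings but not for every InChI.

```python
def inchi_layers(inchi: str) -> dict:
    """Split an InChI into its version, formula and prefixed layers.

    Simplification: if a prefix occurred twice, the later layer
    would overwrite the earlier one.
    """
    parts = inchi.split("/")
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        layers[part[0]] = part[1:]
    return layers


def differing_layers(first: str, second: str) -> dict:
    """Return {layer: (value_in_first, value_in_second)} for layers that differ."""
    a, b = inchi_layers(first), inchi_layers(second)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


CHEMBL_INCHI = (
    "InChI=1S/C16H13IO2/c17-10-12-8-9-15(16(18)19-12)14-7-3-5-11-4-1-2-6-13(11)14"
    "/h1-7,10,15H,8-9H2/b12-10+"
)
NAOMI_INCHI = (
    "InChI=1S/C16H13IO2/c17-10-12-8-9-15(16(18)19-12)14-7-3-5-11-4-1-2-6-13(11)14"
    "/h1-7,10,15H,8-9H2/b12-10-"
)

if __name__ == "__main__":
    print(differing_layers(CHEMBL_INCHI, NAOMI_INCHI))
```

For the two strings quoted above, only the `b` layer differs, i.e. the double bond between atoms 12 and 10 is reported with opposite configuration, while formula, connectivity and hydrogen layers agree.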