Dear All,

I am currently working with the RDKit-generated SDF string that is stored
in the COMPOUND_STRUCTURES table of ChEMBL release 26.
My workflow is:

   - pull the SDF (V2000) from the SQL table
   - generate an internal molecule representation (NAOMI ChemBio toolkit,
   if that means anything to you)
   - generate the InChI string and key from the molecule
   - compare them with the InChI string and key stored in the ChEMBL
   database

When comparing the InChI strings for the molecule with the ID CHEMBL6223, I
get two strings that differ in their stereochemistry (the last characters):

ChEMBL InChI:
InChI=1S/C16H13IO2/c17-10-12-8-9-15(16(18)19-12)14-7-3-5-11-4-1-2-6-13(11)14/h1-7,10,15H,8-9H2/b12-10+

NAOMI InChI:
InChI=1S/C16H13IO2/c17-10-12-8-9-15(16(18)19-12)14-7-3-5-11-4-1-2-6-13(11)14/h1-7,10,15H,8-9H2/b12-10-
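
To make the difference explicit, here is a minimal sketch in plain Python (no toolkit involved) that splits the two strings into layers and reports which ones disagree. The helper names split_inchi_layers and diff_inchi_layers are my own, and the single-letter layer prefix is a simplification that I only assume to hold for standard InChI strings like these. The only layer that differs is /b, which is the double-bond stereo layer.

```python
def split_inchi_layers(inchi):
    """Split an InChI string into a dict keyed by layer prefix.

    Simplification: every layer after the formula is assumed to start
    with a single-letter prefix (c, h, b, t, m, s, ...).
    """
    parts = inchi.strip().split("/")
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        layers[part[0]] = part[1:]
    return layers


def diff_inchi_layers(first, second):
    """Return {layer: (value_in_first, value_in_second)} for differing layers."""
    a = split_inchi_layers(first)
    b = split_inchi_layers(second)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# The common part of both strings, as quoted in this mail.
COMMON = (
    "InChI=1S/C16H13IO2/c17-10-12-8-9-15(16(18)19-12)14-7-3-5-11-4-1-2-6-13(11)14"
    "/h1-7,10,15H,8-9H2"
)
chembl_inchi = COMMON + "/b12-10+"
naomi_inchi = COMMON + "/b12-10-"

difference = diff_inchi_layers(chembl_inchi, naomi_inchi)
print(difference)
```

So the two strings agree on formula, connectivity and hydrogens, and disagree only on the geometry assigned to the bond between atoms 12 and 10.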

While researching why that happens, I realized that the SDF string doesn't
make use of the chirality bit (the chiral flag) that can be set in the
counts line. Digging deeper, I found the disabled block in the
MolToMolBlock function in MolFileWriter.cpp:
https://github.com/rdkit/rdkit/blob/f14f8a60de0ecf4bf5294d73b177d19055e0096d/Code/GraphMol/FileParsers/MolFileWriter.cpp#L1395
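
For reference, this is roughly how I check the flag: a minimal sketch that reads the chiral flag from the counts line of a V2000 mol block. It assumes the standard fixed-width layout, where the counts line is the fourth line and the flag is the fifth three-character field. The function name read_chiral_flag and the one-atom example blocks are made up for illustration and are not ChEMBL data.

```python
def read_chiral_flag(mol_block):
    """Return the chiral flag (0 or 1) from a V2000 mol block.

    The counts line is the fourth line of the block; its fields are
    three characters wide, and the chiral flag is the fifth field
    (columns 13-15, i.e. the slice [12:15]).
    """
    lines = mol_block.splitlines()
    if len(lines) < 4:
        raise ValueError("mol block is too short to contain a counts line")
    counts_line = lines[3]
    if "V2000" not in counts_line:
        raise ValueError("only V2000 counts lines are handled here")
    field = counts_line[12:15].strip()
    return int(field) if field else 0


def make_example_block(counts_line):
    """Build a hypothetical one-atom mol block around the given counts line."""
    return "\n".join([
        "example",           # header line 1: molecule name
        "  hypothetical",    # header line 2: program / timestamp
        "",                  # header line 3: comment
        counts_line,
        "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
        "M  END",
    ])


# Same block twice, differing only in the fifth counts-line field.
UNFLAGGED = make_example_block("  1  0  0  0  0  0  0  0  0  0999 V2000")
FLAGGED = make_example_block("  1  0  0  0  1  0  0  0  0  0999 V2000")

print(read_chiral_flag(UNFLAGGED), read_chiral_flag(FLAGGED))
```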

Do I understand correctly that RDKit does not store any information about
chirality in V2000 and includes chiral information only in V3000 SDF format?

Does anyone know when ChEMBL might switch to that version?

Kind regards,
Emanuel
_______________________________________________
Rdkit-devel mailing list
Rdkit-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-devel
