Hi Greg, Hi Paolo,

Thank you for that amazingly fast response! It seems that a bug in NAOMI is
the most likely explanation here.
Thank you for the thorough explanation and your time. I'll post an update
here if anything unexpected comes up.

Kind regards,
Emanuel

On Wed, 29 July 2020 at 13:56, Greg Landrum <
greg.land...@gmail.com> wrote:

> Hi Emanuel,
>
> The chirality bit doesn't have anything to do with double bond
> stereochemistry.[1] So that's not what's going on here.
>
> The RDKit has the ability to pass the mol block provided directly to the
> InChI code without interpreting it. I believe that the ChEMBL team is using
> that to generate InChIs. In any case, when I use that to pass the molblock
> downloaded from ChEMBL (
> https://www.ebi.ac.uk/chembl/api/data/molecule/CHEMBL6223.sdf) to the
> InChI code I get the same InChI that is found in ChEMBL.
>
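For anyone finding this thread later, here is a minimal sketch of the direct mol block route described above. It assumes an RDKit build with InChI support; `MolBlockToInchi` and `InchiToInchiKey` come from `rdkit.Chem.inchi`, while the wrapper name `inchi_from_molblock` is my own. The input is assumed to be the mol block text of the record, as downloaded from ChEMBL or pulled from the database.

```python
from __future__ import annotations


def inchi_from_molblock(molblock: str) -> tuple[str, str]:
    """Return (InChI, InChIKey) computed directly from a mol block string."""
    # Imported lazily so this sketch can be loaded without RDKit installed.
    from rdkit.Chem import inchi

    # The mol block text is handed to the InChI code as is;
    # RDKit does not build its own molecule object from it first.
    inchi_string = inchi.MolBlockToInchi(molblock)
    return inchi_string, inchi.InchiToInchiKey(inchi_string)
```

Comparing the returned string with the one stored in ChEMBL should show whether a mismatch comes from the mol block itself or from the toolkit that interpreted it.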
> In this particular case I believe the bug may be in the NAOMI code.
>
> -greg
> [1] According to the documentation, it tells you whether a molfile with
> specified atomic stereochemistry represents a single stereoisomer (the one
> drawn) or whether only the relative configurations of the specified
> stereocenters are known, so that the structure is either a single
> diastereomer or a mixture of the two stereoisomers.
>
> On Wed, Jul 29, 2020 at 11:17 AM Emanuel Ehmki <emanuel.eh...@gmail.com>
> wrote:
>
>> Dear All,
>>
>> I am currently working with the RDKit-generated SDF string that is stored
>> in the COMPOUND_STRUCTURES table of ChEMBL database release 26.
>> My workflow is:
>>
>>    - pull SDF (V2000) from SQL table
>>    - generate internal molecule representation (NAOMI ChemBio tool-kit
>>    if that means anything to you)
>>    - generate InChI string and key from molecule
>>    - compare with InChI string and key that are stored in the ChEMBL
>>    database
>>
>> When comparing the InChI string for the molecule with the id CHEMBL6223,
>> I get two differing strings due to different stereochemistry (last
>> characters)
>>
>> ChEMBL InChI:
>> InChI=1S/C16H13IO2/c17-10-12-8-9-15(16(18)19-12)14-7-3-5-11-4-1-2-6-13(11)14/h1-7,10,15H,8-9H2/b12-10+
>>
>> NAOMI InChI:
>> InChI=1S/C16H13IO2/c17-10-12-8-9-15(16(18)19-12)14-7-3-5-11-4-1-2-6-13(11)14/h1-7,10,15H,8-9H2/b12-10-
>>
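To pin down where two InChI strings diverge, a small standard-library sketch can split them into layers and report the ones that differ. The helper names are my own, and it assumes simple strings in which no layer prefix occurs twice (isotopic sublayers would need more care).

```python
from __future__ import annotations


def split_inchi_layers(inchi: str) -> dict[str, str]:
    """Map each layer prefix ('c', 'h', 'b', ...) to its content."""
    parts = inchi.split("/")
    # The first two parts carry no prefix letter: version and formula.
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        layers[part[0]] = part[1:]
    return layers


def diff_inchi_layers(first: str, second: str) -> dict[str, tuple[str | None, str | None]]:
    """Return only the layers whose content differs between the two strings."""
    a, b = split_inchi_layers(first), split_inchi_layers(second)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }
```

For the two strings quoted above this reports only the `b` layer, i.e. the double bond stereo layer, with `12-10+` on the ChEMBL side and `12-10-` on the NAOMI side.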
>> While researching why that happens I realized that the SDF string doesn't
>> make use of the chirality bit that can be set in the counts line.
>> When digging deeper I found the disabled block in the MolToMolBlock
>> function of MolFileWriter.cpp:
>>
>> https://github.com/rdkit/rdkit/blob/f14f8a60de0ecf4bf5294d73b177d19055e0096d/Code/GraphMol/FileParsers/MolFileWriter.cpp#L1395
>>
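For reference, the chiral flag can be checked without any toolkit. This is a minimal sketch assuming a V2000 mol block whose three header lines are intact, so that the counts line is the fourth line; the flag is the fifth three-character field of that line. The function name is my own.

```python
def read_chiral_flag(molblock: str) -> int:
    """Return the chiral flag (0 or 1) from a V2000 counts line."""
    lines = molblock.splitlines()
    if len(lines) < 4:
        raise ValueError("mol block is too short to contain a counts line")
    counts = lines[3]  # three header lines come before the counts line
    if "V2000" not in counts:
        raise ValueError("expected a V2000 counts line")
    # Fields are three characters wide: atoms, bonds, atom lists,
    # an obsolete field, then the chiral flag.
    field = counts[12:15].strip()
    return int(field) if field else 0
```

As Greg explains, this flag only concerns atom stereocenters, so a value of 0 here does not account for the differing double bond layer.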
>> Do I understand correctly that RDKit does not store any information about
>> chirality in V2000 and includes chiral information only in V3000 SDF format?
>>
>> Does anyone know when ChEMBL might switch to that version?
>>
>> Kind regards,
>> Emanuel
>> _______________________________________________
>> Rdkit-devel mailing list
>> Rdkit-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-devel
>>
>
