Re: [Rdkit-devel] SDF String Generation Include Stereo information

2020-07-31 Thread Emanuel Ehmki
Hi Greg, Hi Paolo,

Thank you for that amazingly fast response! It seems that a bug in NAOMI is
the most likely explanation here.
Thank you for the thorough explanation and your time. I'll post new
information on this if anything unexpected comes up.

Kind regards,
Emanuel

___
Rdkit-devel mailing list
Rdkit-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-devel


Re: [Rdkit-devel] SDF String Generation Include Stereo information

2020-07-29 Thread Greg Landrum
Hi Emanuel,

The chirality bit doesn't have anything to do with double bond
stereochemistry.[1] So that's not what's going on here.

The RDKit has the ability to pass the mol block provided directly to the
InChI code without interpreting it. I believe that the ChEMBL team is using
that to generate InChIs. In any case, when I use that to pass the molblock
downloaded from ChEMBL (
https://www.ebi.ac.uk/chembl/api/data/molecule/CHEMBL6223.sdf) to the InChI
code I get the same InChI that is found in ChEMBL.

In this particular case I believe the bug may be in the NAOMI code.

-greg
[1] According to the documentation, it tells you whether a molfile with
specified atomic stereochemistry represents a single stereoisomer (the one
drawn), or whether only the relative configuration of the specified
stereocenters is known, in which case the structure is either a single
diastereomer or a mixture of the two stereoisomers.
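
For readers who want to check the flag themselves, here is a minimal sketch
that reads the chiral flag from the counts line of a V2000 mol block. It
assumes the fixed-width counts-line layout of the CTfile format (three
characters per field, chiral flag in the fifth field). The function name
read_chiral_flag and the one-atom example block are made up for illustration
and are not part of the RDKit.

```python
def read_chiral_flag(molblock: str) -> int:
    """Return the chiral flag (0 or 1) from the counts line of a V2000 mol block."""
    lines = molblock.splitlines()
    if len(lines) < 4:
        raise ValueError("mol block is too short to contain a counts line")
    counts = lines[3]  # three header lines come first, then the counts line
    if not counts.rstrip().endswith("V2000"):
        raise ValueError("not a V2000 counts line")
    # Fixed-width layout aaabbblllfffccc...: the chiral flag is the fifth
    # three-character field, i.e. characters 12..14 (0-based).
    field = counts[12:15].strip()
    return int(field) if field else 0


# Hypothetical one-atom mol block, used only to exercise the parser.
example = "\n".join([
    "example",
    "  hypothetical",
    "",
    "  1  0  0  0  1  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
])
print(read_chiral_flag(example))  # prints 1
```

As noted above, this flag only concerns atomic stereocenters, so its value
does not affect the /b (double bond) layer of an InChI.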



Re: [Rdkit-devel] SDF String Generation Include Stereo information

2020-07-29 Thread Paolo Tosco
Hi Emanuel,

The RDKit perceives double bond stereochemistry on read, and encodes it on
write, based on the 2D coordinates in the molblock.

I put together an example gist here:

https://gist.github.com/ptosco/9ffae8814e84bcf189da7663775748e5
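
As a rough, toolkit-independent illustration of how double bond geometry can
be derived from 2D coordinates alone, the sketch below checks whether two
substituents lie on the same side of a double bond. It is not the RDKit
implementation and not the content of the gist. The function names and the
coordinates are assumptions chosen for the example.

```python
from typing import Tuple

Point = Tuple[float, float]


def side_of_bond(p1: Point, p2: Point, q: Point) -> float:
    """Signed area test: > 0 if q is left of the directed line p1 -> p2, < 0 if right."""
    return (p2[0] - p1[0]) * (q[1] - p1[1]) - (p2[1] - p1[1]) * (q[0] - p1[0])


def relative_geometry(sub1: Point, c1: Point, c2: Point, sub2: Point,
                      tol: float = 1e-6) -> str:
    """Classify one substituent on c1 and one on c2 relative to the c1=c2 bond."""
    s1 = side_of_bond(c1, c2, sub1)
    s2 = side_of_bond(c1, c2, sub2)
    if abs(s1) < tol or abs(s2) < tol:
        # A substituent is collinear with the bond: no geometry can be read.
        return "undefined"
    return "same side" if s1 * s2 > 0 else "opposite sides"


# Double bond drawn along the x axis, substituents above or below it.
c1, c2 = (0.0, 0.0), (1.0, 0.0)
print(relative_geometry((-0.5, 0.87), c1, c2, (1.5, 0.87)))   # same side
print(relative_geometry((-0.5, 0.87), c1, c2, (1.5, -0.87)))  # opposite sides
```

Turning "same side" or "opposite sides" into an E/Z label or an InChI +/-
sign additionally requires choosing the reference substituent on each end
(CIP priorities for E/Z, canonical numbering for InChI), which this sketch
leaves out.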

I hope that addresses your question; if not, feel free to get back to me.

Cheers,
p.



[Rdkit-devel] SDF String Generation Include Stereo information

2020-07-29 Thread Emanuel Ehmki
Dear All,

I am currently working with the RDKit-generated SDF string that is stored
in the COMPOUND_STRUCTURES table of the ChEMBL database, release 26.
My workflow is:

   - pull SDF (V2000) from SQL table
   - generate internal molecule representation (NAOMI ChemBio tool-kit if
   that means anything to you)
   - generate InChI string and key from molecule
   - compare with InChI string and key that are stored in the ChEMBL
   database

When comparing the InChI string for the molecule with the id CHEMBL6223, I
get two differing strings due to different stereochemistry (last characters)

ChEMBL InChI:
InChI=1S/C16H13IO2/c17-10-12-8-9-15(16(18)19-12)14-7-3-5-11-4-1-2-6-13(11)14/h1-7,10,15H,8-9H2/b12-10+

NAOMI InChI:
InChI=1S/C16H13IO2/c17-10-12-8-9-15(16(18)19-12)14-7-3-5-11-4-1-2-6-13(11)14/h1-7,10,15H,8-9H2/b12-10-
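
To make the difference explicit, the following minimal sketch splits both
strings into their layers and reports only the layers that differ. The helper
names split_layers and differing_layers are made up for this example. The
sketch assumes each layer prefix occurs at most once, which holds for the two
strings above but not for every InChI (isotopic or fixed-H layers can repeat
prefixes).

```python
def split_layers(inchi: str) -> dict:
    """Map each layer prefix to its content; assumes every prefix occurs once."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    for part in parts[2:]:
        if part:  # skip empty segments
            layers[part[0]] = part[1:]
    return layers


def differing_layers(a: str, b: str) -> dict:
    """Return {prefix: (value_in_a, value_in_b)} for layers that are not equal."""
    la, lb = split_layers(a), split_layers(b)
    keys = sorted(set(la) | set(lb))
    return {k: (la.get(k), lb.get(k)) for k in keys if la.get(k) != lb.get(k)}


common = ("InChI=1S/C16H13IO2/c17-10-12-8-9-15(16(18)19-12)14-7-3-5-11-4-1-2-6-13(11)14"
          "/h1-7,10,15H,8-9H2")
chembl = common + "/b12-10+"
naomi = common + "/b12-10-"

print(differing_layers(chembl, naomi))  # {'b': ('12-10+', '12-10-')}
```

Only the /b layer differs, and that layer describes double bond
stereochemistry (here the bond between atoms 12 and 10 in InChI numbering).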

While researching why that happens I realized that the SDF string doesn't
make use of the chirality bit that can be set in the counts line.
When digging deeper, I found the disabled block in the MolToMolBlock
function of MolFileWriter.cpp:
https://github.com/rdkit/rdkit/blob/f14f8a60de0ecf4bf5294d73b177d19055e0096d/Code/GraphMol/FileParsers/MolFileWriter.cpp#L1395

Do I understand correctly that RDKit does not store any information about
chirality in V2000 and includes chiral information only in V3000 SDF format?

Does anyone know when ChEMBL might switch to that version?

Kind regards,
Emanuel