Hi Gustavo,

You can pass InChI options to the underlying InChI API through the options
parameter of Chem.inchi.MolToInchi() and Chem.inchi.MolToInchiKey(), e.g.:

inchi.MolToInchi(mol, options="/FixedH")

Source:
https://www.rdkit.org/docs/source/rdkit.Chem.inchi.html?highlight=inchi#rdkit.Chem.inchi.MolBlockToInchi
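
A slightly fuller sketch of the same call, assuming an RDKit build with InChI
support. The two SMILES are the ones from the original post; the variable
names and the results dict are only for illustration. It prints the Standard
and the /FixedH strings side by side so you can compare them on your own
install.

```python
from rdkit import Chem
from rdkit.Chem import inchi

# The two drawings from the original question (cis/trans about the C=N bond).
smiles_by_label = {
    "cis": "C/N=C(/NC#N)NCCSCc1nc[nH]c1C",
    "trans": "C/N=C(\\NC#N)NCCSCc1nc[nH]c1C",
}

results = {}
for label, smi in smiles_by_label.items():
    mol = Chem.MolFromSmiles(smi)
    # Default call: Standard InChI (prefix "InChI=1S/").
    std = inchi.MolToInchi(mol)
    # Same call with the /FixedH option: non-Standard InChI (prefix "InChI=1/").
    fixed = inchi.MolToInchi(mol, options="/FixedH")
    # Key generated with the same option, via the options parameter.
    fixed_key = inchi.MolToInchiKey(mol, options="/FixedH")
    results[label] = (std, fixed, fixed_key)
    print(label, std, fixed, fixed_key, sep="\n  ")
```

If you use the key for deduplication, compare the /FixedH keys of the two
entries yourself before relying on them.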

Cheers,
p.

On Thu, Oct 29, 2020 at 9:42 PM Gustavo Seabra <gustavo.sea...@gmail.com>
wrote:

> Ok, thanks!
> --
> Gustavo Seabra.
>
>
> On Thu, Oct 29, 2020 at 4:33 PM Igor Pletnev <igor.plet...@gmail.com>
> wrote:
>
>> >  Is this "/FixedH" an option in RDKit? How to use that? (I don't see
>> it in the docs).
>>
>> Sorry, I am not very proficient in RDKit and cannot answer exactly. Anyway,
>> this option is available in InChI API calls, and I am pretty sure that it
>> is also available in RDKit.
>>
>> I recall that a couple of years ago, at an InChI event, Greg Landrum
>> somewhat surprised me by saying that he himself often uses non-Standard
>> InChI instead of Standard InChI, precisely to distinguish tautomers.
>> So I guess Greg can explain how this is arranged in RDKit.
>>
>> Regards,
>> Igor
>>
>>
>>
>>
>>
>> On Thu, 29 Oct 2020 at 23:03, Gustavo Seabra <gustavo.sea...@gmail.com>
>> wrote:
>>
>>> That does make sense, I understand it now, thanks!
>>>
>>> Is this "/FixedH" an option in RDKit? How to use that? (I don't see it
>>> in the docs).
>>>
>>> Thanks,
>>> --
>>> Gustavo Seabra.
>>>
>>>
>>> On Wed, Oct 28, 2020 at 6:10 PM Igor Pletnev <igor.plet...@gmail.com>
>>> wrote:
>>>
>>>> Hi Gustavo,
>>>>
>>>> >  ... I was generating the InChI Keys to get a unique hash for each
>>>> compound, thinking it would be better than SMILES (guaranteed to be
>>>> unique), but is clearly not the case. On the bright side, I won't lose time
>>>> generating InChIs...
>>>>
>>>> Though InChI is not perfect, in this case it behaves as intended.
>>>> Please see below.
>>>>
>>>> The molecules under discussion contain a substituted guanidine fragment,
>>>> (RHN)C(=NMe)(NHR').
>>>>
>>>> It is subject to tautomerism, and in different tautomers a different
>>>> C-N bond is the double bond:
>>>> (RHN)C(=NMe)(NHR')
>>>> (RHN)C(NHMe)(=NR')
>>>> (RN=)C(NHMe)(NHR')
>>>>
>>>> You generated Standard InChI, as shown by the "InChI=1S/" prefix
>>>> in the examples.
>>>> Standard InChI is specifically designed to produce the same identifier
>>>> for all tautomers (by indicating that two hydrogens are shared by three
>>>> nitrogen atoms, for any tautomer; bond orders are not indicated in InChI).
>>>>
>>>> Because the tautomer-invariant Standard InChI does not know which C-N
>>>> bond is actually double, the only option for treating the stereo is to
>>>> ignore it completely as a drawing artifact.
>>>>
>>>> All in all:
>>>> Standard InChI means that the exact tautomeric form is unknown ==> all
>>>> tautomers are mapped to the same generic representation ==> the exact C-N
>>>> double bond placement in this generic representation is unspecified ==>
>>>> C-N double bond stereo is ignored ==> the generated Standard InChI and
>>>> Standard InChIKey are the same for cis/trans forms that look different
>>>> as drawn.
>>>>
>>>> Once again, this behavior is by design; it is intended for maximal
>>>> interoperability while comparing different drawings of the "same" compound.
>>>>
>>>> If, for any reason, you would like to treat your examples as
>>>> definite, resolvable structures, each with its own identifier, just
>>>> use non-Standard InChI.
>>>> An InChI that preserves the exact positions of the tautomeric H's and
>>>> the double bond ("as drawn") is produced by specifying the /FixedH option
>>>> upon generation.
>>>>
>>>> More on this may be found in InChI FAQ:
>>>> https://www.inchi-trust.org/technical-faq-2/
>>>>
>>>> Hope this helps.
>>>>
>>>> Regards,
>>>> Igor
>>>>
>>>>
>>>>
>>>> On Mon, Oct 26, 2020 at 6:56 PM Gustavo Seabra <
>>>> gustavo.sea...@gmail.com> wrote:
>>>>
>>>>> Thanks a lot Peter and Adelene,
>>>>>
>>>>> Yes, it looks like canonical SMILES is the way to go, and I have no
>>>>> problem sticking with RDKit. I was generating the InChI Keys to get a
>>>>> unique hash for each compound, thinking it would be better than SMILES
>>>>> (guaranteed to be unique), but that is clearly not the case. On the
>>>>> bright side, I won't lose time generating InChIs...
>>>>>
>>>>> Can I trust that the same molecule will always get the same canonical
>>>>> SMILES from RDKit, independent of how it is read? (Different SDF files,
>>>>> geometries, atom orders, etc.?)
>>>>>
>>>>> All the best,
>>>>> Gustavo.
>>>>>
>>>>>
>>>>> --
>>>>> Gustavo Seabra.
>>>>>
>>>>>
>>>>> On Sun, Oct 25, 2020 at 8:27 PM Peter S. Shenkin <shen...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Canonical SMILES is probably the way to go, but you might also be
>>>>>> able to use the InchiKey and the Inchi auxiliary information together
>>>>>> as a compound hash key.
>>>>>>
>>>>>> -P.
>>>>>>
>>>>>> On Sun, Oct 25, 2020 at 10:53 AM Adelene LAI <adelene....@uni.lu>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Gustavo,
>>>>>>>
>>>>>>>
>>>>>>> (Sorry, forgot to reply all before...)
>>>>>>>
>>>>>>>
>>>>>>> Your deduplication task is quite familiar to me and something I do
>>>>>>> quite a lot of in my own work ;)
>>>>>>>
>>>>>>>
>>>>>>> Can I suggest deduplicating using Canonical SMILES?
>>>>>>>
>>>>>>>
>>>>>>> It doesn't solve your InChIKey issue, but it is a solution for now.
>>>>>>>
>>>>>>>
>>>>>>> I updated my gist to show that it is feasible:
>>>>>>>
>>>>>>>
>>>>>>> https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f
>>>>>>>
>>>>>>> Adelene
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Doctoral Researcher
>>>>>>>
>>>>>>> Environmental Cheminformatics
>>>>>>>
>>>>>>> UNIVERSITÉ DU LUXEMBOURG
>>>>>>>
>>>>>>>
>>>>>>> LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
>>>>>>>
>>>>>>> 6, avenue du Swing, L-4367 Belvaux
>>>>>>>
>>>>>>> T +356 46 66 44 67 18
>>>>>>>
>>>>>>> GitHub: adelenelai
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From:* Gustavo Seabra <gustavo.sea...@gmail.com>
>>>>>>> *Sent:* Sunday, October 25, 2020 2:27:15 PM
>>>>>>> *To:* Adelene LAI
>>>>>>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same
>>>>>>> InChI Key
>>>>>>>
>>>>>>> Actually, I was trying to generate all stereoisomers for molecules
>>>>>>> in a database, and filter duplicate molecules by using the InChI Key
>>>>>>> to detect duplicates. But it gives cis/trans isomers on sp2-N the
>>>>>>> same Key.
>>>>>>>
>>>>>>> Gustavo.
>>>>>>>
>>>>>>> --
>>>>>>> Gustavo Seabra
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From:* Adelene LAI <adelene....@uni.lu>
>>>>>>> *Sent:* Sunday, October 25, 2020 1:44:01 AM
>>>>>>> *To:* Gustavo Seabra <gustavo.sea...@gmail.com>
>>>>>>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same
>>>>>>> InChI Key
>>>>>>>
>>>>>>>
>>>>>>> Hi Gustavo,
>>>>>>>
>>>>>>>
>>>>>>> It occurred to me while swimming yesterday - was there a reason you
>>>>>>> pointed out the hybridisation state of N in your original subject text?
>>>>>>>
>>>>>>>
>>>>>>> Was it just to specify which N to focus on, or did you expect
>>>>>>> something special about sp2 hybridisation wrt InChIKey?
>>>>>>>
>>>>>>>
>>>>>>> Adelene
>>>>>>>
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From:* Gustavo Seabra <gustavo.sea...@gmail.com>
>>>>>>> *Sent:* Saturday, October 24, 2020 5:37:09 AM
>>>>>>> *To:* RDKit Discuss; Adelene LAI
>>>>>>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same
>>>>>>> InChI Key
>>>>>>>
>>>>>>> Thanks for looking into it. I'm happy to see it wasn't just a
>>>>>>> mistake by me ;-)
>>>>>>>
>>>>>>> I hope we can find what's wrong there.
>>>>>>>
>>>>>>> Best,
>>>>>>> Gustavo.
>>>>>>>
>>>>>>> --
>>>>>>> Gustavo Seabra
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From:* Adelene LAI <adelene....@uni.lu>
>>>>>>> *Sent:* Friday, October 23, 2020 11:28:55 PM
>>>>>>> *To:* Gustavo Seabra <gustavo.sea...@gmail.com>; RDKit Discuss <
>>>>>>> rdkit-discuss@lists.sourceforge.net>
>>>>>>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same
>>>>>>> InChI Key
>>>>>>>
>>>>>>>
>>>>>>> Hi Gustavo,
>>>>>>>
>>>>>>>
>>>>>>> https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f
>>>>>>>
>>>>>>>
>>>>>>> In the gist above, I tried doing some further investigating.
>>>>>>>
>>>>>>>
>>>>>>> It seems for the example you gave, the rdkit functions indeed give
>>>>>>> the same inchikey and inchi, but different aux info.
>>>>>>>
>>>>>>>
>>>>>>> Why this different aux info doesn't translate into different
>>>>>>> inchikeys/inchis, I'm not sure.
>>>>>>>
>>>>>>>
>>>>>>> Adelene
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From:* Gustavo Seabra <gustavo.sea...@gmail.com>
>>>>>>> *Sent:* Friday, October 23, 2020 6:43:07 PM
>>>>>>> *To:* RDKit Discuss
>>>>>>> *Subject:* [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI
>>>>>>> Key
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I ran into an issue here, and I'd appreciate your input. I noticed
>>>>>>> that compounds that differ only in the cis-trans isomerization around an
>>>>>>> sp2 nitrogen get the same InChI Key from RDKit. For example:
>>>>>>>
>>>>>>> > inchi_cis = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C"))
>>>>>>> > inchi_cis
>>>>>>> 'AQIXAKUUQRKLND-UHFFFAOYSA-N'
>>>>>>>
>>>>>>> > inchi_trans = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C"))
>>>>>>> > inchi_trans
>>>>>>> 'AQIXAKUUQRKLND-UHFFFAOYSA-N'
>>>>>>>
>>>>>>> > inchi_cis == inchi_trans
>>>>>>> True
>>>>>>>
>>>>>>> I wonder if this is a limitation of the InChI Key definition, or an
>>>>>>> implementation issue.
>>>>>>>
>>>>>>> Thanks a lot,
>>>>>>>
>>>>>>> --
>>>>>>> Gustavo Seabra.
>>>>>>> _______________________________________________
>>>>>>> Rdkit-discuss mailing list
>>>>>>> Rdkit-discuss@lists.sourceforge.net
>>>>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss