Hi Gustavo, you can pass InChI options to the underlying InChI API through the options parameter of Chem.inchi.MolToInchi() and Chem.inchi.MolToInchiKey(); e.g.:
    inchi.MolToInchi(mol, options="/FixedH")

Source: https://www.rdkit.org/docs/source/rdkit.Chem.inchi.html?highlight=inchi#rdkit.Chem.inchi.MolBlockToInchi

Cheers,
p.

On Thu, Oct 29, 2020 at 9:42 PM Gustavo Seabra <gustavo.sea...@gmail.com> wrote:

> Ok, thanks!
> --
> Gustavo Seabra.
>
> On Thu, Oct 29, 2020 at 4:33 PM Igor Pletnev <igor.plet...@gmail.com> wrote:
>
>> > Is this "/FixedH" an option in RDKit? How to use that? (I don't see it in the docs).
>>
>> Sorry, I am not so proficient in RDKit and can not answer exactly. Anyway,
>> this option is available in InChI API calls, and I am pretty sure that it
>> is also available in RDKit.
>>
>> I recall that couple of years ago, on some InChI event, Greg Landrum
>> somewhat surprised me by saying that he himself often uses non-Standard
>> InChI instead of Standard one - exactly to distinguish tautomers.
>> So I guess Greg can answer on how it is arranged in RDKit.
>>
>> Regards,
>> Igor
>>
>> On Thu, 29 Oct 2020 at 23:03, Gustavo Seabra <gustavo.sea...@gmail.com> wrote:
>>
>>> That does make sense, I understand it now, thanks!
>>>
>>> Is this "/FixedH" an option in RDKit? How to use that? (I don't see it
>>> in the docs).
>>>
>>> Thanks,
>>> --
>>> Gustavo Seabra.
>>>
>>> On Wed, Oct 28, 2020 at 6:10 PM Igor Pletnev <igor.plet...@gmail.com> wrote:
>>>
>>>> Hi Gustavo,
>>>>
>>>> > ... I was generating the InChI Keys to get a unique hash for each
>>>> > compound, thinking it would be better than SMILES (guaranteed to be
>>>> > unique), but is clearly not the case. On the bright side, I won't lose
>>>> > time generating InChIs...
>>>>
>>>> though InChI is not perfect, in this case it behaves as intended.
>>>> Please see below.
>>>>
>>>> The discussed molecules contain substituted guanidine fragment
>>>> (RHN)C(=NMe)(NHR')
>>>>
>>>> It is subjected to tautomerism, and in different tautomers different
>>>> C-N bonds have double order:
>>>> (RHN)C(=NMe)(NHR')
>>>> (RHN)C(NHMe)(=NR')
>>>> (RN=)C(NHMe)(NHR')
>>>>
>>>> You generated Standard InChI, which is evidenced by "InChI=1S/" prefix
>>>> in the examples.
>>>> Standard InChI is specifically designed to produce the same identifier
>>>> for all tautomers (by indicating that two hydrogens are shared by three
>>>> nitrogen atoms, for any tautomer; bond orders are not indicated in InChI).
>>>>
>>>> As the tautomer-invariant Std InChI does not know which C-N bond is
>>>> actually a double, there is the only option for treating stereo -- to
>>>> completely ignore it as a drawing artifact.
>>>>
>>>> All in all:
>>>> Standard InChI means that the exact tautomeric form is unknown ==> all
>>>> tautomers are mapped to the same generic representation ==> the exact C-N
>>>> double bond placement in this generic is unspecified ==> C-N double bond
>>>> stereo is ignored ==> generated StdInChI and Std InChIKey are the same for
>>>> seemingly different, by initial drawing, cis/trans forms.
>>>>
>>>> Once again, this behavior is by design; it is intended for maximal
>>>> interoperability while comparing different drawings of the "same" compound.
>>>>
>>>> If, for any reason, you would like to consider your examples as the
>>>> definite and resolvable structures, each having its own identifier, just
>>>> use non-Standard InChI.
>>>> The InChI which preserves the exact positions of tautomeric H's and
>>>> double bond ("as drawn") is produced by just specifying option /FixedH upon
>>>> generation.
>>>>
>>>> More on this may be found in InChI FAQ:
>>>> https://www.inchi-trust.org/technical-faq-2/
>>>>
>>>> Hope this helps.
>>>>
>>>> Regards,
>>>> Igor
>>>>
>>>> On Mon, Oct 26, 2020 at 6:56 PM Gustavo Seabra <gustavo.sea...@gmail.com> wrote:
>>>>
>>>>> Thanks a lot Peter and Adelene,
>>>>>
>>>>> Yes, it looks like canonical SMILES is the way to go, and I have no
>>>>> problem sticking with RDKit. I was generating the InChI Keys to get a
>>>>> unique hash for each compound, thinking it would be better than SMILES
>>>>> (guaranteed to be unique), but is clearly not the case. On the bright
>>>>> side, I won't lose time generating InChIs...
>>>>>
>>>>> Can I trust that the same molecule will always get the same canonical
>>>>> SMILES from RDKit, independent of how it is read? (Different SDF files,
>>>>> geometries, atom orders, etc.?)
>>>>>
>>>>> All the best,
>>>>> Gustavo.
>>>>>
>>>>> --
>>>>> Gustavo Seabra.
>>>>>
>>>>> On Sun, Oct 25, 2020 at 8:27 PM Peter S. Shenkin <shen...@gmail.com> wrote:
>>>>>
>>>>>> Canonical SMILES is probably the way to go, but you might also be
>>>>>> able to use the InchiKey and the Inchi auxiliary information together
>>>>>> as a compound hash key.
>>>>>>
>>>>>> -P.
>>>>>>
>>>>>> On Sun, Oct 25, 2020 at 10:53 AM Adelene LAI <adelene....@uni.lu> wrote:
>>>>>>
>>>>>>> Hi Gustavo,
>>>>>>>
>>>>>>> (Sorry, forgot to reply all before...)
>>>>>>>
>>>>>>> Your deduplication task is quite familiar to me and something I do
>>>>>>> quite a lot of in my own work ;)
>>>>>>>
>>>>>>> Can I suggest deduplicating using Canonical SMILES?
>>>>>>>
>>>>>>> It doesn't solve your InChIKey issue, but it is a solution for now.
>>>>>>>
>>>>>>> I updated my gist to show that it is feasible:
>>>>>>>
>>>>>>> https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f
>>>>>>>
>>>>>>> Adelene
>>>>>>>
>>>>>>> Doctoral Researcher
>>>>>>> Environmental Cheminformatics
>>>>>>> UNIVERSITÉ DU LUXEMBOURG
>>>>>>> LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
>>>>>>> 6, avenue du Swing, L-4367 Belvaux
>>>>>>> T +356 46 66 44 67 18
>>>>>>> GitHub: adelenelai
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From:* Gustavo Seabra <gustavo.sea...@gmail.com>
>>>>>>> *Sent:* Sunday, October 25, 2020 2:27:15 PM
>>>>>>> *To:* Adelene LAI
>>>>>>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>>>>>>>
>>>>>>> Actually, I was trying to generate all stereoisomers for molecules
>>>>>>> in a database, and filter duplicate molecules by using the InChI Key to
>>>>>>> detect duplicates. But it gives cis/trans isomers on sp2-N the same Key.
>>>>>>>
>>>>>>> Gustavo.
>>>>>>>
>>>>>>> --
>>>>>>> Gustavo Seabra
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From:* Adelene LAI <adelene....@uni.lu>
>>>>>>> *Sent:* Sunday, October 25, 2020 1:44:01 AM
>>>>>>> *To:* Gustavo Seabra <gustavo.sea...@gmail.com>
>>>>>>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>>>>>>>
>>>>>>> Hi Gustavo,
>>>>>>>
>>>>>>> It occurred to me while swimming yesterday - was there a reason you
>>>>>>> pointed out the hybridisation state of N in your original subject text?
>>>>>>>
>>>>>>> Was it just to specify which N to focus on, or did you expect
>>>>>>> something special about sp2 hybridisation wrt InChIKey?
>>>>>>>
>>>>>>> Adelene
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From:* Gustavo Seabra <gustavo.sea...@gmail.com>
>>>>>>> *Sent:* Saturday, October 24, 2020 5:37:09 AM
>>>>>>> *To:* RDKit Discuss; Adelene LAI
>>>>>>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>>>>>>>
>>>>>>> Thanks for looking into it. I'm happy to see it wasn't just a
>>>>>>> mistake by me ;-)
>>>>>>>
>>>>>>> I hope we can find what's wrong there.
>>>>>>>
>>>>>>> Best,
>>>>>>> Gustavo.
>>>>>>>
>>>>>>> --
>>>>>>> Gustavo Seabra
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From:* Adelene LAI <adelene....@uni.lu>
>>>>>>> *Sent:* Friday, October 23, 2020 11:28:55 PM
>>>>>>> *To:* Gustavo Seabra <gustavo.sea...@gmail.com>; RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
>>>>>>> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>>>>>>>
>>>>>>> Hi Gustavo,
>>>>>>>
>>>>>>> https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f
>>>>>>>
>>>>>>> In the gist above, I tried doing some further investigating.
>>>>>>>
>>>>>>> It seems for the example you gave, the rdkit functions indeed give
>>>>>>> the same inchikey and inchi, but different aux info.
>>>>>>>
>>>>>>> Why this different aux info doesn't translate into different
>>>>>>> inchikeys/inchis, I'm not sure.
>>>>>>>
>>>>>>> Adelene
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From:* Gustavo Seabra <gustavo.sea...@gmail.com>
>>>>>>> *Sent:* Friday, October 23, 2020 6:43:07 PM
>>>>>>> *To:* RDKit Discuss
>>>>>>> *Subject:* [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I run into an issue here, and I'd appreciate your input. I noticed
>>>>>>> that compounds that differ only on the cis-trans isomerization around an
>>>>>>> sp2 nitrogen get the same InChI Key from RDKit. For example:
>>>>>>>
>>>>>>> > inchi_cis = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C"))
>>>>>>> > inchi_cis
>>>>>>> 'AQIXAKUUQRKLND-UHFFFAOYSA-N'
>>>>>>>
>>>>>>> > inchi_trans = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C"))
>>>>>>> > inchi_trans
>>>>>>> 'AQIXAKUUQRKLND-UHFFFAOYSA-N'
>>>>>>>
>>>>>>> > inchi_cis == inchi_trans
>>>>>>> True
>>>>>>>
>>>>>>> I wonder if this is a limitation of the InChI Key definition, or an
>>>>>>> implementation issue.
>>>>>>>
>>>>>>> Thanks a lot,
>>>>>>>
>>>>>>> --
>>>>>>> Gustavo Seabra.
>>>>>>> _______________________________________________
>>>>>>> Rdkit-discuss mailing list
>>>>>>> Rdkit-discuss@lists.sourceforge.net
>>>>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
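
For anyone finding this thread later, here is a minimal sketch of the two approaches discussed above: keying compounds on a non-standard InChI generated with /FixedH, or on RDKit canonical SMILES. The helper names `fixed_h_inchi`, `canonical_smiles` and `dedupe` are made up for illustration and are not part of RDKit; the only RDKit calls used are `Chem.MolFromSmiles`, `Chem.MolToSmiles` and `inchi.MolToInchi(mol, options=...)`. It assumes an RDKit build with InChI support, and whether /FixedH separates a particular pair of isomers is worth checking on your own data.

```python
from typing import Callable, Hashable, Iterable, List


def fixed_h_inchi(smiles: str) -> str:
    """Return a non-standard InChI generated with the /FixedH option."""
    # RDKit is imported lazily so that dedupe() below is usable without it.
    from rdkit import Chem
    from rdkit.Chem import inchi

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # The options string is passed through to the underlying InChI API.
    return inchi.MolToInchi(mol, options="/FixedH")


def canonical_smiles(smiles: str) -> str:
    """Return RDKit's canonical SMILES for the input SMILES."""
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Chem.MolToSmiles(mol)


def dedupe(items: Iterable[str], key: Callable[[str], Hashable]) -> List[str]:
    """Keep the first item seen for each distinct key, preserving input order."""
    seen = set()
    unique = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            unique.append(item)
    return unique
```

Calling `dedupe(smiles_list, key=canonical_smiles)` or `dedupe(smiles_list, key=fixed_h_inchi)` swaps the identifier without touching the filtering logic. Keying on the Standard InChIKey instead would merge the cis/trans pair from the original post, as reported at the start of the thread.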
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss