Re: [Rdkit-discuss] tautomers in rdkit

2017-04-18 Thread Peter S. Shenkin
"IIRC, [Roger] gives an example of a large chemical supplier who offered
two tautomers of the same compound for sale at very different prices which
is at least embarrassing."

On the other hand, it provides a great opportunity for arbitrage. ;-)

-P.


Re: [Rdkit-discuss] tautomers in rdkit

2017-04-18 Thread David Cosgrove
Hi JW et al.,
One of the last things I worked on before leaving AZ was what we called a
tautomer-independent molecular representation. What we meant by this was a
way of spotting whether a new compound being registered into the corporate
collection was a tautomer of one already in the database.  As part of that,
I looked at the InChI representation and its tautomer handling, which was at
that point labelled experimental.  In our view, it was very limited in the
types of tautomers it represented and not adequate for our needs.  As a
result I developed a program called tt_tauts, which AZ "open-sourced" when
they made me redundant, and which is available at
https://github.com/OpenEye-Contrib/TT_Tauts.  It's another plug for OEChem,
I'm afraid, which seems poor form on the RDKit website, but there you go.
It is also a long way from being complete, and I am still working on it as
a somewhat masochistic hobby.  Internally at CozChemIx Towers it is known
as 'The Mole Project' in honour of the game 'Whac-A-Mole' (
https://en.wikipedia.org/wiki/Whac-A-Mole) - every time you squash an odd
tautomer case, another one pops up, quite often one you've already dealt
with.  ChEMBL is a marvelous source of nasty test cases.  I hope to have a
better version on GitHub soon and also a description of the algorithm on my
website.  It took as its jumping-off point the work of Thalheim et al. (
http://onlinelibrary.wiley.com/doi/10.1002/minf.201400128/full).
Note that this use of tautomer enumeration/representation is somewhat
different from that of QuacPac or taut_enum.  Those two are concerned with
predicting the tautomers likely to be present in water (well, blood,
probably) at roughly neutral pH, whereas tt_tauts is trying to deal with two
chemists drawing the same compound as different tautomers, which may look
quite different, with the hydrogen atoms shifted a long way.  Both are
difficult and unsolved problems.  In one of Roger Sayle's papers on
tautomers, IIRC, he gives an example of a large chemical supplier who
offered two tautomers of the same compound for sale at very different
prices, which is at least embarrassing.
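
The registration check described above (is a new compound a tautomer of one
already in the database?) can be sketched as a lookup on a
tautomer-independent key.  This is only an illustration, not the tt_tauts
algorithm: the `Registry` class and the `rdkit_inchi_key` helper are
hypothetical names, and standard InChI is used as a stand-in key even though,
as noted above, its tautomer handling covers only some cases.  The RDKit
helper assumes an RDKit build with InChI support.

```python
from typing import Callable, Dict, Optional


class Registry:
    """Toy compound registry keyed on a tautomer-independent string."""

    def __init__(self, key_func: Callable[[str], str]) -> None:
        # key_func maps a SMILES string to a key that should be identical
        # for all tautomers of the same compound.
        self._key_func = key_func
        self._by_key: Dict[str, str] = {}

    def register(self, compound_id: str, smiles: str) -> Optional[str]:
        """Register a compound.

        Returns the ID of an already-registered compound with the same key,
        or None if the compound was new and has now been stored.
        """
        key = self._key_func(smiles)
        existing = self._by_key.get(key)
        if existing is not None:
            return existing
        self._by_key[key] = compound_id
        return None


def rdkit_inchi_key(smiles: str) -> str:
    """One possible key function: the standard InChI string from RDKit.

    Assumption: RDKit is installed with InChI support.  The import is done
    lazily so the Registry class itself has no RDKit dependency.
    """
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Chem.MolToInchi(mol)
```

With `Registry(rdkit_inchi_key)`, registering 2-hydroxypyridine and then
2-pyridone should report the second as a duplicate of the first.  Swapping in
a different key function is how a more thorough tautomer treatment would be
plugged in.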
Cheers,
Dave

On Tue, Apr 18, 2017 at 1:23 AM, JW Feng  wrote:

> Hi Maria,
>
> Looking at Roger's slides at
> https://github.com/rdkit/UGM_2016/blob/master/Presentations/Sayle_RDKitTautomers.pdf,
> is he arguing that InChI values are insufficient for generating a canonical
> string across different tautomers?  What if you performed a set of
> standardization transformations prior to generating InChI values?  You may
> want to look at how Genentech normalizes molecules for compound
> registration.  The code is based on OEChem and is open-sourced on GitHub at
> https://github.com/chemalot/chemalot.  This package is actively being
> developed and I am a contributor.  Specifically, you'll want to look at the
> extensive standardization transformations in
> https://github.com/chemalot/chemalot/blob/master/src/com/genentech/struchk/oeStruchk/Struchk.xml
>
> The last step in Struchk.xml is creating a canonical tautomer using
> OpenEye's QuacPac toolkit.  QuacPac returns a canonical tautomer.  Could
> one replace this step by converting a standardized molecule to InChI and
> then back?  Another approach is using Dave Cosgrove's TautEnum package (
> https://github.com/OpenEye-Contrib/TautEnum).  Both QuacPac and TautEnum
> enumerate tautomers.  I believe that Roger is intimately familiar with
> QuacPac.
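
The InChI round trip asked about here can be tried directly in RDKit.  The
following is a minimal sketch, assuming an RDKit build with InChI support;
the helper name `via_inchi` is hypothetical and not from the thread.  It only
shows the mechanics of the conversion and does not settle whether InChI's
tautomer handling is adequate for registration.

```python
def via_inchi(smiles: str) -> str:
    """Round-trip a SMILES through standard InChI; return canonical SMILES."""
    # Lazy import: assumes RDKit is installed with InChI support.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    # Standard InChI merges some tautomers via its mobile-hydrogen layer.
    inchi = Chem.MolToInchi(mol)
    if not inchi:
        raise ValueError(f"InChI generation failed for {smiles!r}")

    # Reading the InChI back yields whichever tautomer the InChI code picks.
    back = Chem.MolFromInchi(inchi)
    if back is None:
        raise ValueError(f"could not rebuild a molecule from {inchi!r}")
    return Chem.MolToSmiles(back)
```

Calling `via_inchi("Oc1ccccn1")` and `via_inchi("O=c1cccc[nH]1")` should
return the same string, because standard InChI treats 2-hydroxypyridine and
2-pyridone as one structure.  Tautomer pairs outside InChI's mobile-hydrogen
rules would still come back as different strings.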
>
> Best,
>
> JW
>
> ___
> JW Feng, Ph.D.
> Denali Therapeutics Inc.
> 151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
> 270-0628
>
> On Tue, Apr 11, 2017 at 6:52 AM,  ourceforge.net> wrote:
>
>> Send Rdkit-discuss mailing list submissions to
>> rdkit-discuss@lists.sourceforge.net
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>> or, via email, send a message with subject or body 'help' to
>> rdkit-discuss-requ...@lists.sourceforge.net
>>
>> You can reach the person managing the list at
>> rdkit-discuss-ow...@lists.sourceforge.net
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Rdkit-discuss digest..."
>>
>>
>> Today's Topics:
>>
>>    1. tautomers in rdkit (MARIA BRANDL)
>>    2. Re: tautomers in rdkit (Peter S. Shenkin)
>>    3. official Tripos MOL2 file format PDF document (Francois BERENGER)
>>
>>
>> --
>>
>> Message: 1
>> Date: Tue, 11 Apr 2017 06:43:39 +0000 (UTC)
>> From: MARIA BRANDL 
>> Subject: [Rdkit-discuss] tautomers in rdkit
>> To: "rdkit-discuss@lists.sourceforge.net"
>> 
>> Message-ID: <1522420730.263132.1491893019...@mail.yahoo.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Dear all,
>>
>> Is there going to be an attempt