"IIRC, [Roger] gives an example of a large chemical supplier who offered
two tautomers of the same compound for sale at very different prices which
is at least embarrassing."

On the other hand, it provides a great opportunity for arbitrage. ;-)

-P.

On Tue, Apr 18, 2017 at 4:02 AM, David Cosgrove <davidacosgrov...@gmail.com>
wrote:

> Hi JW et al.,
> One of the last things I worked on before leaving AZ was what we called a
> tautomer-independent molecular representation. By this we meant a way of
> spotting whether a new compound being registered into the corporate
> collection was a tautomer of one already in the database. As part of that,
> I looked at the InChI representation and its tautomer handling, which was
> at that point labelled experimental. In our view, it was very limited in
> the types of tautomers it represented and not adequate for our needs. As a
> result I developed a program called tt_tauts, which AZ "open-sourced" when
> they made me redundant, and which is available at
> https://github.com/OpenEye-Contrib/TT_Tauts. It's another plug for OEChem,
> I'm afraid, which seems poor form on the RDKit website, but there you go.
> It is also a long way from being complete, and I am still working on it as
> a somewhat masochistic hobby. Internally at CozChemIx Towers it is known as
> 'The Mole Project' in honour of the game 'Whac-A-Mole'
> (https://en.wikipedia.org/wiki/Whac-A-Mole): every time you squash an odd
> tautomer case, another one pops up, quite often one you've already dealt
> with. ChEMBL is a marvelous source of nasty test cases. I hope to have a
> better version on GitHub soon, and also a description of the algorithm on
> my website. It used the work of Thalheim et al. as a jumping-off point
> (http://onlinelibrary.wiley.com/doi/10.1002/minf.201400128/full).
> Note that this use of tautomer enumeration/representation is somewhat
> different from that of QuacPac or taut_enum. Those two are concerned with
> predicting the tautomers likely to be present in water (well, blood,
> probably) at roughly neutral pH, whereas registration has to deal with two
> chemists drawing the same compound as different tautomers, which may look
> quite different, with the hydrogen atoms shifted a long way. Both are
> difficult, unsolved problems. In one of Roger Sayle's papers on tautomers,
> IIRC, he gives an example of a large chemical supplier who offered two
> tautomers of the same compound for sale at very different prices, which is
> at least embarrassing.
> Cheers,
> Dave
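
A minimal sketch of the registration check described above, assuming some tool has already produced the tautomers of each compound as canonical SMILES from a single toolkit. The MOVES table is hypothetical hand-written data standing in for that tool (it is not tt_tauts' rule set), and the Registry class and CPD- identifiers are likewise invented for illustration. Every enumerated form of a registered compound is indexed, so an incoming structure is flagged if any of its forms is already present.

```python
from collections import deque

# Hypothetical one-step tautomer moves between canonical SMILES strings.
# A real tool would derive these from transform rules; here they are data.
MOVES = {
    "CC(C)=O": ["C=C(C)O"],  # acetone -> prop-1-en-2-ol (its enol)
    "C=C(C)O": ["CC(C)=O"],  # and back
}


def tautomers(smiles, moves):
    """Every form reachable from `smiles` by repeated moves, itself included."""
    seen, queue = {smiles}, deque([smiles])
    while queue:
        for nxt in moves.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


class Registry:
    """Toy registry that indexes every known tautomer of each entry."""

    def __init__(self, moves):
        self.moves = moves
        self.index = {}  # tautomer SMILES -> registry ID
        self.count = 0

    def register(self, smiles):
        """Return (registry_id, is_new) for an incoming structure."""
        forms = tautomers(smiles, self.moves)
        for form in forms:
            if form in self.index:
                return self.index[form], False  # tautomer of an existing entry
        self.count += 1
        new_id = f"CPD-{self.count}"
        for form in forms:
            self.index[form] = new_id
        return new_id, True
```

Registering "CC(C)=O" and then "C=C(C)O" returns the same ID, while "CCC=O" gets a new one. If the table has no move linking two forms, they are registered as separate compounds, which is the Whac-A-Mole problem in miniature.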
>
> On Tue, Apr 18, 2017 at 1:23 AM, JW Feng <f...@dnli.com> wrote:
>
>> Hi Maria,
>>
>> Looking at Roger's slides at
>> https://github.com/rdkit/UGM_2016/blob/master/Presentations/Sayle_RDKitTautomers.pdf,
>> is he arguing that InChI is insufficient for generating a canonical
>> string for the different tautomers of a compound? What if you apply a set
>> of standardization transformations before generating the InChI? You may
>> want to look at how Genentech normalizes molecules for compound
>> registration. The code is based on OEChem and is open-sourced on GitHub
>> at https://github.com/chemalot/chemalot. This package is being actively
>> developed and I am a contributor. Specifically, you'll want to look at the
>> extensive standardization transformations in
>> https://github.com/chemalot/chemalot/blob/master/src/com/genentech/struchk/oeStruchk/Struchk.xml
>>
>> The last step in Struchk.xml is creating a canonical tautomer using
>> OpenEye's QuacPac toolkit, which returns a canonical tautomer. Could one
>> replace this step by converting the standardized molecule to InChI and
>> then back? Another approach is to use Dave Cosgrove's TautEnum package
>> (https://github.com/OpenEye-Contrib/TautEnum). Both QuacPac and TautEnum
>> enumerate tautomers. I believe that Roger is intimately familiar with
>> QuacPac.
>>
>> Best,
>>
>> JW
>>
>> ___________________
>> JW Feng, Ph.D.
>> Denali Therapeutics Inc.
>> 151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
>> 270-0628
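
One generic way to turn a tautomer enumerator into a canonical tautomer, sketched under stated assumptions: enumerate everything reachable from the input, then apply a deterministic pick. The MOVES table is hypothetical hand-written data, and choosing the lexicographically smallest SMILES is an arbitrary rule picked for illustration; it is not how QuacPac or TautEnum choose. The answer is independent of the input tautomer only when the moves connect every form in both directions.

```python
from collections import deque

# Hypothetical one-step tautomer moves, written out by hand for illustration.
MOVES = {
    "O=c1cccc[nH]1": ["Oc1ccccn1"],  # 2-pyridone -> 2-hydroxypyridine
    "Oc1ccccn1": ["O=c1cccc[nH]1"],  # and back again
}


def closure(smiles, moves):
    """Breadth-first search over the move graph starting from `smiles`."""
    seen, queue = {smiles}, deque([smiles])
    while queue:
        for nxt in moves.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def canonical_tautomer(smiles, moves):
    """Deterministic but chemically arbitrary pick: the smallest string."""
    return min(closure(smiles, moves))
```

Both pyridone inputs map to "O=c1cccc[nH]1". With a one-directional table such as {"O=c1cccc[nH]1": ["Oc1ccccn1"]}, starting from 2-hydroxypyridine never reaches the pyridone, so the two inputs get different "canonical" forms.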
>>
>> On Tue, Apr 11, 2017 at 6:52 AM, <rdkit-discuss-request@lists.sourceforge.net> wrote:
>>
>>>
>>> Today's Topics:
>>>
>>>    1. tautomers in rdkit (MARIA BRANDL)
>>>    2. Re: tautomers in rdkit (Peter S. Shenkin)
>>>    3. official Tripos MOL2 file format PDF document (Francois BERENGER)
>>>
>>>
>>> ----------------------------------------------------------------------
>>>
>>> Message: 1
>>> Date: Tue, 11 Apr 2017 06:43:39 +0000 (UTC)
>>> From: MARIA BRANDL <m.bra...@btinternet.com>
>>> Subject: [Rdkit-discuss] tautomers in rdkit
>>> To: "rdkit-discuss@lists.sourceforge.net"
>>>         <rdkit-discuss@lists.sourceforge.net>
>>> Message-ID: <1522420730.263132.1491893019...@mail.yahoo.com>
>>> Content-Type: text/plain; charset="utf-8"
>>>
>>> Dear all,
>>>
>>> Is there going to be an attempt at coding Roger Sayle's "Alternative
>>> Approach" to tautomers, described in RDKit: Six Not-So-Easy Pieces [RDKit
>>> UGM 2016]
>>> (https://de.slideshare.net/NextMoveSoftware/rdkit-six-notsoeasy-pieces-rdkit-ugm-2016),
>>> into RDKit?
>>>
>>>
>>> I have managed to get reasonable tautomers out of Resonance.cpp using:
>>>
>>> from rdkit.Chem import rdchem
>>>
>>> # m is an existing RDKit molecule
>>> suppl = rdchem.ResonanceMolSupplier(
>>>     m,
>>>     rdchem.ResonanceFlags.ALLOW_CHARGE_SEPARATION
>>>     | rdchem.ResonanceFlags.ALLOW_INCOMPLETE_OCTETS
>>>     | rdchem.ResonanceFlags.UNCONSTRAINED_CATIONS
>>>     | rdchem.ResonanceFlags.UNCONSTRAINED_ANIONS)
>>>
>>> with some post-filtering for e.g. carbocations, but feel that it may be
>>> more efficient to put user-defined constraints on each atom during the
>>> backtracking loops, as Roger suggests.
>>> Looking forward to hearing your thoughts on this.
>>> Best regards,
>>> Maria Brandl
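
The difference between post-filtering and checking per-atom constraints inside the backtracking loop can be shown with a toy enumerator. This is not the Resonance.cpp algorithm: each atom simply picks a formal charge from a hypothetical set (0, +1, -1), and no_carbocation is an invented constraint mirroring the carbocation filter mentioned above. Both functions return the same structures; the pruned version never builds the rejected branches.

```python
from itertools import product

CHARGES = (0, +1, -1)  # hypothetical per-atom choices


def no_carbocation(element, charge):
    """Per-atom constraint: reject a positively charged carbon."""
    return not (element == "C" and charge > 0)


def filter_after(elements, atom_ok):
    """Generate every full assignment first, then throw the bad ones away."""
    generated, kept = 0, []
    for combo in product(CHARGES, repeat=len(elements)):
        generated += 1
        if all(atom_ok(el, q) for el, q in zip(elements, combo)):
            kept.append(combo)
    return kept, generated


def prune_during(elements, atom_ok):
    """Backtracking that tests each atom's constraint as soon as it is set."""
    kept, partial = [], []
    tried = 0

    def backtrack(i):
        nonlocal tried
        if i == len(elements):
            kept.append(tuple(partial))
            return
        for q in CHARGES:
            tried += 1
            if not atom_ok(elements[i], q):
                continue  # the whole subtree below this choice is skipped
            partial.append(q)
            backtrack(i + 1)
            partial.pop()

    backtrack(0)
    return kept, tried
```

For elements ["C", "C", "O", "N"], filter_after builds all 81 assignments before keeping 36, while prune_during reaches the same 36 after 57 single-atom checks. The gap grows with every constrained atom near the top of the search tree.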
>>>
>>> ------------------------------
>>>
>>> Message: 2
>>> Date: Tue, 11 Apr 2017 03:47:47 -0400
>>> From: "Peter S. Shenkin" <shen...@gmail.com>
>>> Subject: Re: [Rdkit-discuss] tautomers in rdkit
>>> To: MARIA BRANDL <m.bra...@btinternet.com>
>>> Cc: "rdkit-discuss@lists.sourceforge.net"
>>>         <rdkit-discuss@lists.sourceforge.net>
>>> Message-ID:
>>>         <CAAsqebH6gVRpm2rhhzv0-koWVr6P0WU+QK0EO2=x4ctvhgx...@mail.gmail.com>
>>> Content-Type: text/plain; charset="utf-8"
>>>
>>> Just from the slides, it's not clear that Roger had a solution; the slides
>>> seem to just suggest an approach. Am I missing something here?
>>>
>>> That is, he defined the invariants that all tautomers of a compound have to
>>> share and expressed them as SMARTS + constraints, but I didn't see that he
>>> provided a methodology to derive a canonical matching SMILES from the
>>> SMARTS + constraints. True, if two structures match the SMARTS +
>>> constraints, they are likely tautomers. (I can't think of why they wouldn't
>>> be, but maybe it's not always the case.) So that part provides
>>> deduplication of an input stream, which is good, but no way to derive and
>>> store a canonical representation.
>>>
>>> Again, perhaps I'm missing something, but if so, what?
>>>
>>> -P.
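
The point that shared invariants give deduplication but not a canonical form can be illustrated with a toy key built from quantities a hydrogen shift leaves unchanged: the heavy-atom connectivity, the total hydrogen count and the net charge. This is much cruder than Roger's SMARTS + constraints, the (element, H count) and (i, j, order) tuples are a made-up input format, and the Weisfeiler-Lehman-style hash is an invariant rather than a canonical labelling, so unrelated graphs can in principle collide.

```python
import hashlib


def _digest(text):
    """Short stable hash so labels compare equal across molecules."""
    return hashlib.sha1(text.encode()).hexdigest()[:12]


def skeleton_key(atoms, bonds, charge=0):
    """Toy tautomer-invariant key: heavy-atom graph, total H and net charge.

    atoms: list of (element, hydrogen_count); bonds: list of (i, j, order).
    Bond orders and per-atom H counts are deliberately ignored, because a
    tautomeric shift changes exactly those.
    """
    neighbours = [[] for _ in atoms]
    for i, j, _order in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    labels = [element for element, _h in atoms]
    # Weisfeiler-Lehman style refinement: fold sorted neighbour labels in.
    for _ in range(len(atoms)):
        labels = [
            _digest(labels[i] + ":" + ",".join(sorted(labels[j] for j in neighbours[i])))
            for i in range(len(atoms))
        ]
    total_h = sum(h for _el, h in atoms)
    return tuple(sorted(labels)), total_h, charge
```

Acetone and its enol get the same key and propanal does not. However, 1-butene and 2-butene also share a key although they are not normally treated as tautomers, so a matching key is only a necessary condition, and it still gives no canonical SMILES to store.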
>>>
>>>
>>> ------------------------------
>>>
>>> Message: 3
>>> Date: Tue, 11 Apr 2017 08:35:53 -0500
>>> From: Francois BERENGER <francois.c.beren...@vanderbilt.edu>
>>> Subject: [Rdkit-discuss] official Tripos MOL2 file format PDF document
>>> To: "rdkit-discuss@lists.sourceforge.net"
>>>         <rdkit-discuss@lists.sourceforge.net>
>>> Message-ID: <1f673e0d-0c10-a325-dde7-c28e76e06...@vanderbilt.edu>
>>> Content-Type: text/plain; charset="utf-8"; format=flowed
>>>
>>> Hello,
>>>
>>> Not directly related to RDKit, but if someone who has the original PDF of
>>> this file format specification could place it online permanently, that
>>> would be wonderful.
>>>
>>> The official URL at tripos.com has apparently been dead for quite some
>>> time. That's bad, because it's quite a popular file format and its
>>> specification should be permanently archived.
>>>
>>> Thanks a lot,
>>> Francois.
>>>
>>>
>>>
>>> ------------------------------
>>>
>>> ------------------------------------------------------------------------------
>>> Check out the vibrant tech community on one of the world's most
>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>>
>>> ------------------------------
>>>
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>>
>>> End of Rdkit-discuss Digest, Vol 114, Issue 8
>>> *********************************************
>>>
>>
>>
>>
>>
>
>
> --
> David Cosgrove
> Freelance computational chemistry and chemoinformatics developer
> http://cozchemix.co.uk
>
>
>
>