Re: [Rdkit-discuss] tautomers in rdkit

2017-04-18 Thread Peter S. Shenkin
>>> Message: 1
>>> Date: Tue, 11 Apr 2017 06:43:39 + (UTC)
>>> From: MARIA BRANDL <m.bra...@btinternet.com>
>>> Subject: [Rdkit-discuss] tautomers in rdkit
>>> To: "rdkit-discuss@lists.sourceforge.net"
>>> <rdkit-discuss@lists.sourceforge.net>
>>> Message-ID: <1522420730.263132.1491893019...@mail.yahoo.com>
>>> Content-Type: text/plain; charset="utf-8"
>>>
>>> Dear all,
>>>
>>> Is there going to be an attempt at coding Roger Sayle's "Alternative
>>> Approach" to tautomers described in RDKit: Six Not-So-Easy Pieces [RDKit UGM
>>> 2016] into RDKit?
>>>
>>>
>>> I have managed to get reasonable tautomers out of Resonance.cpp using:
>>> suppl = rdchem.ResonanceMolSupplier(m,
>>>     rdchem.ResonanceFlags.ALLOW_CHARGE_SEPARATION
>>>     | rdchem.ResonanceFlags.ALLOW_INCOMPLETE_OCTETS
>>>     | rdchem.ResonanceFlags.UNCONSTRAINED_CATIONS
>>>     | rdchem.ResonanceFlags.UNCONSTRAINED_ANIONS)
>>> with some post-filtering for e.g. carbocations, but feel that it may be
>>> more efficient to put user defined constraints on each atom during the
>>> backtracking loops, as Roger suggests.
>>> Looking forward to hearing your thoughts on this.
>>> Best regards,
>>> Maria Brandl
>>>
>>> --
>>>
>>> Message: 2
>>> Date: Tue, 11 Apr 2017 03:47:47 -0400
>>> From: "Peter S. Shenkin" <shen...@gmail.com>
>>> Subject: Re: [Rdkit-discuss] tautomers in rdkit
>>> To: MARIA BRANDL <m.bra...@btinternet.com>
>>> Cc: "rdkit-discuss@lists.sourceforge.net"
>>> <rdkit-discuss@lists.sourceforge.net>
>>> Message-ID: <CAAsqebH6gVRpm2rhhzv0-koWVr6P0WU+QK0EO2=x4ctvhgx...@mail.gmail.com>
>>> Content-Type: text/plain; charset="utf-8"
>>>
>>> Just from the slides, it's not clear that Roger had a solution; the slides
>>> seem to just suggest an approach. Am I missing something here?
>>>
>>> That is, he defined the invariants that all tautomers of a compound have to
>>> share and expressed it as a SMARTS + constraints; but I didn't see that he
>>> provided a methodology to derive a canonical matching SMILES from a SMARTS
>>> + constraints. True, if two structures match the SMARTS + constraints, they
>>> are likely tautomers. (I can't think of why they wouldn't be, but maybe
>>> it's not always the case.) So that part provides deduplication of an input
>>> stream, which is good, but no way to derive and store a canonical
>>> representation.
>>>
>>> Again, perhaps I'm missing something, but if so, what?
>>>
>>> -P.
>>>

Re: [Rdkit-discuss] tautomers in rdkit

2017-04-18 Thread David Cosgrove

Re: [Rdkit-discuss] tautomers in rdkit

2017-04-17 Thread JW Feng
Hi Maria,

From looking at Roger's slides at
https://github.com/rdkit/UGM_2016/blob/master/Presentations/Sayle_RDKitTautomers.pdf:
is he arguing that InChI is insufficient for generating a canonical string
across different tautomers?  What if you perform a set of standardization
transformations prior to generating InChI values?  You may want to look at
how Genentech normalizes molecules for compound registration. The code is
based on OEChem and is open sourced on GitHub at
https://github.com/chemalot/chemalot.  This package is actively being
developed and I am a contributor.  Specifically, you'll want to look at the
extensive standardization transformations in
https://github.com/chemalot/chemalot/blob/master/src/com/genentech/struchk/oeStruchk/Struchk.xml

The last step in Struchk.xml creates a canonical tautomer using OpenEye's
QuacPac toolkit.  Could one replace this step by converting a standardized
molecule to InChI and then back?  Another approach is to use Dave Cosgrove's
TautEnum package (https://github.com/OpenEye-Contrib/TautEnum).  Both QuacPac
and TautEnum enumerate tautomers.  I believe that Roger is intimately
familiar with QuacPac.

Best,

JW

___
JW Feng, Ph.D.
Denali Therapeutics Inc.
151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
270-0628


Re: [Rdkit-discuss] tautomers in rdkit

2017-04-11 Thread Peter S. Shenkin
Just from the slides, it's not clear that Roger had a solution; the slides
seem to just suggest an approach. Am I missing something here?

That is, he defined the invariants that all tautomers of a compound have to
share and expressed it as a SMARTS + constraints; but I didn't see that he
provided a methodology to derive a canonical matching SMILES from a SMARTS
+ constraints. True, if two structures match the SMARTS + constraints, they
are likely tautomers. (I can't think of why they wouldn't be, but maybe
it's not always the case.) So that part provides deduplication of an input
stream, which is good, but no way to derive and store a canonical
representation.
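
To make the distinction concrete, here is a minimal pure-Python sketch of
deduplicating an input stream with a tautomer-insensitive key. It is not
Roger's method and not RDKit code. The record layout (heavy atoms with
attached hydrogen counts, bonds with orders) and the names tautomer_key and
deduplicate are assumptions made up for this illustration. The key keeps only
what the tautomers share (heavy-atom connectivity, total hydrogen count, net
charge) and throws away bond orders and hydrogen placement. It is cruder than
a SMARTS + constraints description, so it can merge structures that are not
really tautomers, and the neighbour-refinement hash is not a complete graph
invariant.

```python
import hashlib


def _digest(text):
    """Short, deterministic hash of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def tautomer_key(atoms, bonds, charge=0):
    """Hypothetical tautomer-insensitive key (illustration only).

    atoms: list of (element, attached_hydrogens), heavy atoms only
    bonds: list of (i, j, bond_order) between heavy-atom indices
    """
    n = len(atoms)
    neighbors = [[] for _ in range(n)]
    for i, j, _order in bonds:  # bond order is deliberately ignored
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Start from the element alone: hydrogen placement is ignored too.
    labels = [element for element, _h in atoms]
    # Refine every label with the sorted labels of its neighbours, n rounds.
    for _ in range(n):
        labels = [
            _digest(labels[i] + "|" + ",".join(sorted(labels[j] for j in neighbors[i])))
            for i in range(n)
        ]

    total_h = sum(h for _element, h in atoms)
    return _digest(",".join(sorted(labels)) + "|H%d|q%d" % (total_h, charge))


def deduplicate(stream):
    """Yield the name of the first record seen for each key."""
    seen = set()
    for name, atoms, bonds in stream:
        key = tautomer_key(atoms, bonds)
        if key not in seen:
            seen.add(key)
            yield name


# Atom 0 is the ring nitrogen, atoms 1-5 the ring carbons, atom 6 the oxygen.
hydroxypyridine_2 = (
    [("N", 0), ("C", 0), ("C", 1), ("C", 1), ("C", 1), ("C", 1), ("O", 1)],
    [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1), (1, 6, 1)],
)
pyridone_2 = (
    [("N", 1), ("C", 0), ("C", 1), ("C", 1), ("C", 1), ("C", 1), ("O", 0)],
    [(0, 1, 1), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1), (1, 6, 2)],
)
hydroxypyridine_3 = (
    [("N", 0), ("C", 1), ("C", 0), ("C", 1), ("C", 1), ("C", 1), ("O", 1)],
    [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1), (2, 6, 1)],
)

stream = [
    ("2-hydroxypyridine", *hydroxypyridine_2),
    ("2-pyridone", *pyridone_2),
    ("3-hydroxypyridine", *hydroxypyridine_3),
]
unique = list(deduplicate(stream))
print(unique)
```

Here 2-pyridone is dropped as a duplicate of 2-hydroxypyridine, while
3-hydroxypyridine is kept. The structure that survives is simply whichever
tautomer arrived first. The key says "same class" but cannot be turned back
into a structure, which is the gap between deduplication and a canonical
representation.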

Again, perhaps I'm missing something, but if so, what?

-P.

On Tue, Apr 11, 2017 at 2:43 AM, MARIA BRANDL 
wrote:

> Dear all,
>
>
> Is there going to be an attempt at coding Roger Sayle's "Alternative
> Approach" to tautomers described in
> RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
> <https://de.slideshare.net/NextMoveSoftware/rdkit-six-notsoeasy-pieces-rdkit-ugm-2016>
> into RDKit?
>
>
> I have managed to get reasonable tautomers out of Resonance.cpp using:
>
> suppl = rdchem.ResonanceMolSupplier(m,
>     rdchem.ResonanceFlags.ALLOW_CHARGE_SEPARATION
>     | rdchem.ResonanceFlags.ALLOW_INCOMPLETE_OCTETS
>     | rdchem.ResonanceFlags.UNCONSTRAINED_CATIONS
>     | rdchem.ResonanceFlags.UNCONSTRAINED_ANIONS)
>
>  with some post-filtering for e.g. carbocations, but feel that it may be
> more efficient to put user defined constraints on each atom during the
> backtracking loops, as Roger suggests.
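
The efficiency argument can be illustrated with a toy backtracking search in
plain Python. This is only a sketch of the general idea, not the algorithm in
Resonance.cpp and not Roger's proposal: it assigns a formal charge of -1, 0 or
+1 to each atom of a made-up four-atom system, ignores bonding entirely, and
the names enumerate_charges and no_carbocation are hypothetical. It compares
enumerating everything and post-filtering against checking the same
user-defined per-atom constraint inside the loop.

```python
def enumerate_charges(elements, net_charge, allowed=None):
    """Backtrack over per-atom formal charges in {-1, 0, +1}.

    Returns (solutions, nodes_visited). allowed(element, charge) is an
    optional user-defined per-atom constraint checked inside the loop.
    """
    solutions = []
    visited = 0

    def extend(partial):
        nonlocal visited
        visited += 1
        if len(partial) == len(elements):
            if sum(partial) == net_charge:
                solutions.append(tuple(partial))
            return
        element = elements[len(partial)]
        for charge in (-1, 0, 1):
            if allowed is not None and not allowed(element, charge):
                continue  # prune this whole branch before descending
            partial.append(charge)
            extend(partial)
            partial.pop()

    extend([])
    return solutions, visited


def no_carbocation(element, charge):
    """Example constraint: carbon may not carry a positive charge."""
    return not (element == "C" and charge > 0)


elements = ["O", "C", "C", "N"]  # toy system, chosen for illustration only

# Strategy 1: enumerate everything, then post-filter.
everything, visited_unconstrained = enumerate_charges(elements, net_charge=0)
post_filtered = [
    s for s in everything
    if all(no_carbocation(e, q) for e, q in zip(elements, s))
]

# Strategy 2: apply the same constraint during backtracking.
constrained, visited_constrained = enumerate_charges(
    elements, net_charge=0, allowed=no_carbocation
)

print(len(post_filtered), len(constrained))
print(visited_unconstrained, visited_constrained)
```

Both strategies return the same assignments, but the constrained search
visits fewer nodes because a rejected charge cuts off the whole subtree below
it. How much that saves in a real resonance or tautomer enumerator would
depend on the molecule and on where in the search the constraints can be
tested.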
>
> Looking forward to hearing your thoughts on this.
>
> Best regards,
>
> Maria Brandl
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>