Re: [Rdkit-discuss] RDKit Reaction gives disconnected components

2017-03-30 Thread Brian Kelley
Correction: you are not getting two products, because you are grouping the
results, à la:

>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> rxn = AllChem.ReactionFromSmarts("([C:1][*][N:2])>>([C:1].[N:2])")
>>> prods = rxn.RunReactants([Chem.MolFromSmiles("FC1ON1I")])
>>> Chem.MolToSmiles(prods[0][0])
'CF.NI'

However, it appears that in some cases you aren't explicitly mapping
anything between [C:1] and [N:2], so the left-hand side doesn't really know
what to do.

I'll have to dig into this a little more.

Cheers,
 Brian


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Re: [Rdkit-discuss] RDKit Reaction gives disconnected components

2017-03-30 Thread Brian Kelley
I have a feeling you may need to make two reactions.  Let's consider a dirt
simple case:

>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> rxn = AllChem.ReactionFromSmarts("[C:1][N:2]>>[C:1].[N:2]")
>>> prods = rxn.RunReactants([Chem.MolFromSmiles("CN")])
>>> Chem.MolToSmiles(prods[0][0])
'C'
>>> Chem.MolToSmiles(prods[0][1])
'N'

Note that this reaction is explicitly breaking a bond.  I think this is
what you are seeing with your example.

Note that, just as a "." on the reactant side separates multiple reactants,
a "." on the right-hand side means there will be multiple products.
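
To make the "." convention concrete, here is a minimal pure-Python sketch. It
is not RDKit code and not how RDKit parses reactions; the helper names
split_templates and count_product_templates are made up for illustration. It
assumes that parentheses wrapped around a dot-separated group turn the group
into a single template, as in the "([C:1].[N:2])" example elsewhere in this
thread, and it simply splits the right-hand side on dots that sit outside all
parentheses.

```python
def split_templates(side):
    """Split one side of a reaction SMARTS on dots outside all parentheses."""
    templates, current, depth = [], [], 0
    for ch in side:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "." and depth == 0:
            # A top-level dot ends the current template.
            templates.append("".join(current))
            current = []
        else:
            current.append(ch)
    templates.append("".join(current))
    return templates


def count_product_templates(reaction_smarts):
    """Count the templates on the right-hand side of 'reactants>>products'."""
    _reactants, products = reaction_smarts.split(">>")
    return len(split_templates(products))


# Ungrouped: the dot separates two product templates.
print(count_product_templates("[C:1][N:2]>>[C:1].[N:2]"))
# Grouped: the dot is inside parentheses, so there is one product template.
print(count_product_templates("([C:1][*][N:2])>>([C:1].[N:2])"))
```

Under that assumption the first reaction has two product templates and the
second has one, which matches the two-molecule and single 'CF.NI' results
reported in this thread.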

Does this help at all?

Cheers,
 Brian



[Rdkit-discuss] RDKit Reaction gives disconnected components

2017-03-30 Thread Stephen Roughley
Dear Greg/RDKitters,

This may be user error, or misunderstanding of rSMARTS, so can anyone throw 
some light on the following behaviour?

First example works as expected - there are 2× Ph in m4, so we end up with
2×2×2 copies of the expected product:

from rdkit import Chem
from rdkit.Chem import AllChem, Draw

# Replace 2× Ph-* with 2× 3-F-C6H4-*
rSMARTS4='([*:1]-&!@c1:[c&!H0]:[c&!H0]:[c&!H0]:[c&!H0]:[c&!H0]:1.[*:2]-&!@c1:[c&!H0]:[c&!H0]:[c&!H0]:[c&!H0]:[c&!H0]:1)>>([*:1]-!@c:1:c:c(-F):c:c:c1.[*:2]-!@c:1:c:c(-F):c:c:c1)'
rxn4=AllChem.ReactionFromSmarts(rSMARTS4)
rxn4
m4=Chem.MolFromSmiles('c1ccccc1CCOCc1ccccc1')
m4
prodsbi=rxn4.RunReactants((m4,))
for prod in prodsbi:
    Chem.SanitizeMol(prod[0])
Draw.MolsToGridImage([prod[0] for prod in prodsbi],molsPerRow=4,
                     subImgSize=(200,200))

Now consider the following - the only difference I can think of is that the
[*:1] and [*:2] atoms map to adjacent, directly bonded atoms - I can't see why
that should matter...

m3=Chem.MolFromSmiles('c1ccccc1COc1ccccc1')
m3
prodsbi=rxn4.RunReactants((m3,))
for prod in prodsbi:
    Chem.SanitizeMol(prod[0])
Draw.MolsToGridImage([prod[0] for prod in prodsbi],molsPerRow=8,
                     subImgSize=(200,200))

Just to be sure this is as I think it looks:
prodsbi[0][0]

Any suggestions as to why this happens, and whether it is the expected 
behaviour? (And how to avoid it?!)
Thanks,
Steve




__
PLEASE READ: This email is confidential and may be privileged. It is intended 
for the named addressee(s) only and access to it by anyone else is 
unauthorised. If you are not an addressee, any disclosure or copying of the 
contents of this email or any action taken (or not taken) in reliance on it is 
unauthorised and may be unlawful. If you have received this email in error, 
please notify the sender or postmas...@vernalis.com. Email is not a secure 
method of communication and the Company cannot accept responsibility for the 
accuracy or completeness of this message or any attachment(s). Please check 
this email for virus infection for which the Company accepts no responsibility. 
If verification of this email is sought then please request a hard copy. Unless 
otherwise stated, any views or opinions presented are solely those of the 
author and do not represent those of the Company.

The Vernalis Group of Companies
100 Berkshire Place
Wharfedale Road
Winnersh, Berkshire
RG41 5RD, England
Tel: +44 (0)118 938 

To access trading company registration and address details, please go to the 
Vernalis website at www.vernalis.com and click on the "Company address and 
registration details" link at the bottom of the page..
__


Re: [Rdkit-discuss] RDKit Reaction gives disconnected components

2017-03-30 Thread Stephen Roughley
Thanks Brian.

I think you have got to what I am trying to do: basically, replacing 2
identical groups in 2 separate parts of the molecule with the same second
group, hence the component grouping in reagents and products.  It works fine
until the corner case shown. (Actually, it also fails if the dummy atoms [*:1]
and [*:2] are required to match the same atom, as they would in e.g.
c1ccccc1Oc1ccccc1, but that makes at least some more sense to me!)

One possible workaround is to run the same reaction iteratively, replacing each
group in turn.  That would work in this case, but where the reactant pattern
also matches the product it will give the wrong products, e.g. with the transform:

[C;H3]-!@[*:1]>>C-C-[*:1]
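
A toy illustration of that failure mode, in plain Python rather than RDKit.
The model is an assumption made for the sketch: a molecule is reduced to a
tuple of alkyl chain lengths hanging off some core, so (1, 1) is a core with
two methyl groups. Every chain ends in a CH3, so the methyl-to-ethyl transform
above matches every chain, including the ethyl groups it has just made. The
function names are hypothetical.

```python
def apply_once(chains):
    """Apply the toy CH3 -> CH2CH3 transform once, at each matching site in turn.

    Returns the set of distinct products, one per site.
    """
    # Every chain has a terminal CH3, so every position is a match.
    return {chains[:i] + (n + 1,) + chains[i + 1:] for i, n in enumerate(chains)}


def apply_iteratively(chains, rounds):
    """Feed the products of each round back in as reactants."""
    products = {chains}
    for _ in range(rounds):
        products = {p for mol in products for p in apply_once(mol)}
    return products


start = (1, 1)    # two methyl groups
wanted = (2, 2)   # both methyls converted to ethyl
products = apply_iteratively(start, rounds=2)
print(sorted(products))   # [(1, 3), (2, 2), (3, 1)]
```

Two rounds do produce the wanted (2, 2), but also (1, 3) and (3, 1), where the
second round extended the freshly made ethyl group instead of the remaining
methyl. Those extra products are the "wrong products" of iterating.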

I'm guessing this is probably a limitation of the rSMARTS definition, where the 
reaction products need to be something intermediate between SMARTS and SMILES.

Steve


Re: [Rdkit-discuss] RDKit Reaction gives disconnected components

2017-03-30 Thread Greg Landrum
I believe that this is a bug, but it may be tricky to fix due to an
interaction with another feature.

Here's another simple example:
https://gist.github.com/greglandrum/a44f692bda8d110df309e561ea35f364

Input 6 is analogous to what's happening to you, Steve.

The normal process when running a reaction is to initialize a product
molecule with atoms from the product template and then to add the remaining
atoms/bonds from the reactant that are reachable from those matching atoms.
If there is a bond between matching atoms in the reactant but not in the
product template it will not be added to the product. This allows ring
opening reactions to happen (see input 10).

I think there's a way around this by adding bonds in the product if the
matching atoms are bonded in the reactant but not in the reactant template,
but I'm going to have to try it and see what the consequences are.
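
The process described above can be sketched with a toy graph model. This is
plain Python, not RDKit's implementation: atoms are integer indices, bonds are
frozensets of two indices, every template atom is assumed to carry a map
number, and the names run_toy_reaction, count_fragments and
keep_untemplated_bonds are invented for the sketch. The flag models the
possible fix mentioned above (keep a bond between matched atoms when the
reactant template never mentioned it).

```python
from collections import deque


def bond(a, b):
    """An undirected bond between two atom indices (or two map numbers)."""
    return frozenset((a, b))


def run_toy_reaction(bonds, match, reactant_tmpl, product_tmpl,
                     keep_untemplated_bonds=False):
    """Return the product's bonds for one match of a toy reaction.

    bonds:         bonds of the reactant molecule (atom indices)
    match:         atom-map number -> matched reactant atom index
    reactant_tmpl: bonds of the reactant template (map numbers)
    product_tmpl:  bonds of the product template (map numbers)
    """
    matched = set(match.values())
    templated = {bond(*(match[m] for m in b)) for b in reactant_tmpl}
    # Step 1: bonds between matched atoms come from the product template.
    product = {bond(*(match[m] for m in b)) for b in product_tmpl}
    # Step 2: walk outwards from the matched atoms and copy what is reachable.
    seen, queue = set(matched), deque(sorted(matched))
    while queue:
        atom = queue.popleft()
        for b in bonds:
            if atom not in b:
                continue
            if b <= matched:
                # Bond between two matched atoms: not copied, unless the
                # hypothetical fix is on and the reactant template lacks it.
                if keep_untemplated_bonds and b not in templated:
                    product.add(b)
                continue
            product.add(b)
            (other,) = b - {atom}
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return product


def count_fragments(n_atoms, bonds):
    """Number of connected components among atoms 0..n_atoms-1."""
    parent = list(range(n_atoms))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in bonds:
        parent[find(a)] = find(b)
    return len({find(x) for x in range(n_atoms)})


# Chain 0-1-2-3 with map numbers 1 and 2 matched to the bonded atoms 1 and 2.
chain = {bond(0, 1), bond(1, 2), bond(2, 3)}
match = {1: 1, 2: 2}

# Neither template has a 1-2 bond: the bond is silently dropped.
current = run_toy_reaction(chain, match, set(), set())
# Same templates with the hypothetical fix: the bond survives.
fixed = run_toy_reaction(chain, match, set(), set(), keep_untemplated_bonds=True)
# Bond present in the reactant template only: still broken, as intended.
opened = run_toy_reaction(chain, match, {bond(1, 2)}, set(),
                          keep_untemplated_bonds=True)

print(count_fragments(4, current), count_fragments(4, fixed),
      count_fragments(4, opened))
```

In this toy the first case falls apart into two fragments, the flag keeps it in
one piece, and a bond that is explicitly present in the reactant template but
absent from the product template is still broken, so bond-breaking reactions
such as ring openings keep working.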

Here's the github issue:
https://github.com/rdkit/rdkit/issues/1387

-greg





[Rdkit-discuss] DeleteSubstructs vs ReplaceSubstructs

2017-03-30 Thread Popov, Maxim (Ext)
Dear RDKit users,

I am trying to remove a common substructure from a number of molecules (with 
AllChem.DeleteSubstructs). My problem is best illustrated by this short code:


from rdkit import Chem
from rdkit.Chem import AllChem

m=Chem.MolFromSmiles('C1(C2=NC=CC=C2)=CC=CC(C)=C1')
ss = Chem.MolFromSmiles('C1=CC=CC(C)=C1')
hyd=Chem.MolFromSmiles('[H]')
print("Substituting substructure with hydrogen")
frags = AllChem.ReplaceSubstructs(m, ss,hyd)
for frag in frags:
    print(Chem.MolToSmiles(frag))
print("\nDeleting substructure")
frag = AllChem.DeleteSubstructs(m, ss)
print(Chem.MolToSmiles(frag))

I create a toluene connected to pyridine and try to remove toluene.

When replacing toluene substructure with hydrogen (AllChem.ReplaceSubstructs), 
I receive two sets of results: pyridine (with explicit hydrogen) and single 
carbon plus single hydrogen plus aromatic open chain (what is left from 
pyridine after removing one ring atom).

When deleting the toluene substructure (AllChem.DeleteSubstructs), I receive 
just the open chain of ex-pyridine (corresponding to second set of the 
ReplaceSubstructs results).

Is there a way of directing the DeleteSubstructs method to a specific variant?
(In this case, leaving the pyridine as a ring seems the logical choice.)

Best regards,

Maxim