Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-22 Thread Andrew Dalke
On Jul 23, 2021, at 01:01, Gustavo Seabra wrote:
> I actually want the sulfone to be found, if it is there. My problem is that I 
> also want flexibility to change the ring atoms and still find the ring as a 
> match, while considering a match on the sulfone only if it really is there. 
> (e.g., CF3 should *not* match.) Does it make sense?

Ahh, I see.

No, there's no way to do that.

The best I can suggest is to go back to the original Python implementation and 
change the code leading up to

   https://hg.sr.ht/~dalke/fmcs/browse/fmcs.py?rev=tip#L1929

so the initial seed is the sulfone instead of an (atom, bond, atom).

Then use that to find the MCS with the sulfone, and if that fails, use RDKit's 
existing method.

I point to my repository only because that's in Python and I know it better. If 
your C++ skills are better than mine, you might change the corresponding 
implementation in RDKit.

Cheers,

Andrew
da...@dalkescientific.com




___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-22 Thread Gustavo Seabra
Hi,

Thanks a lot for the reply! However, in this case, it looks like I would
have to somehow label the isotope in every query molecule, right? For
example:
```
from rdkit import Chem
from rdkit.Chem import rdFMCS

# Template with isotope labels on the sulfone atoms only
template = Chem.MolFromSmarts('[c]1(-[2S](=[3O])(=[3O])(-C)):[c]:[c]:[c]:[c]:[c]:1')
mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1')
compare = [template, mol1]
res = rdFMCS.FindMCS(compare,
                     atomCompare=rdFMCS.AtomCompare.CompareIsotopes,
                     bondCompare=rdFMCS.BondCompare.CompareAny,
                     ringMatchesRingOnly=False,
                     completeRingsOnly=False)
res.smartsString
```
returns: '[0*]1:[0*]:[0*]:[0*]:[0*]:[0*]:1', that is, it only picks the
ring but not the sulfone. I actually want the sulfone to be found, if it is
there. My problem is that I also want flexibility to change the ring atoms
and still find the ring as a match, while considering a match on the
sulfone only if it really is there. (e.g., CF3 should *not* match.) Does it
make sense?

Thanks a lot!
--
Gustavo Seabra.




Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-22 Thread Andrew Dalke
Hi Gustavo,


> template = 
> Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')

Unless things have changed since I last looked at the algorithm, you can't 
meaningfully pass a SMARTS-based query molecule into the MCS program, outside 
of a few simple cases.

It generates a SMARTS pattern based on the properties of the molecule. You 
asked it to CompareElements, but those [a] terms all have an atomic number of 0.

  >>> template = Chem.MolFromSmarts('[a#1]1(-[S](-*)(=[O])=[O]):[a#1]:[a#1]:[a#1]:[a#1]:[a#1]:1')
  >>> [a.GetAtomicNum() for a in template.GetAtoms()]
  [0, 16, 0, 8, 8, 0, 0, 0, 0, 0]

That's why your CompareAny search returns the #0 terms, like:

  
'[#16,#6](-[#0,#6])(=,-[#8,#9])(=,-[#8,#9])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:1'

> I would appreciate some pointers on how it would be possible to find the 
> maximum common substructure of 2 molecules, where in the template structure 
> some atoms may be *any*, but some other atoms must be fixed.

Perhaps with isotope labelling?

That is, label the "any" atoms as isotope 1, and label your -[S](=[O])(=[O])- 
as -[2S](=[3O])(=[3O])-

Then use rdFMCS.AtomCompare.CompareIsotopes .

If there's anything you don't want to match at all, give each atom a unique 
isotope value.
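
The isotope-labelling idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the label values (1 = "any", 2 = sulfone S, 3 = sulfone O), the helper names label_isotopes and isotope_mcs, and the example SMILES are all made up for this sketch. Note that every molecule in the comparison needs the labels, because CompareIsotopes looks only at the isotope values, not at the elements.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS

# Hypothetical label scheme (an assumption for this sketch)
ANY, SULFONE_S, SULFONE_O = 1, 2, 3

# Pattern used to locate the atoms that must stay fixed
SULFONE = Chem.MolFromSmarts('[#16](=[#8])=[#8]')


def label_isotopes(mol):
    """Return a copy of mol whose isotopes encode the match classes."""
    labeled = Chem.Mol(mol)  # copy, so the input molecule is left untouched
    for atom in labeled.GetAtoms():
        atom.SetIsotope(ANY)
    # Each match lists atom indices in pattern order: (S, O, O)
    for s_idx, o1_idx, o2_idx in labeled.GetSubstructMatches(SULFONE):
        labeled.GetAtomWithIdx(s_idx).SetIsotope(SULFONE_S)
        labeled.GetAtomWithIdx(o1_idx).SetIsotope(SULFONE_O)
        labeled.GetAtomWithIdx(o2_idx).SetIsotope(SULFONE_O)
    return labeled


def isotope_mcs(smiles_a, smiles_b):
    """Run FindMCS on isotope-labelled copies of two molecules."""
    mols = [label_isotopes(Chem.MolFromSmiles(s)) for s in (smiles_a, smiles_b)]
    return rdFMCS.FindMCS(mols,
                          atomCompare=rdFMCS.AtomCompare.CompareIsotopes,
                          bondCompare=rdFMCS.BondCompare.CompareAny,
                          ringMatchesRingOnly=False,
                          completeRingsOnly=False)


template = 'CS(=O)(=O)c1ccccc1'  # example template, not from the thread
with_sulfone = isotope_mcs(template, 'CS(=O)(=O)c1ccncc1')
without_sulfone = isotope_mcs(template, 'FC(F)(F)c1ccncc1')

print(with_sulfone.numAtoms, with_sulfone.smartsString)
print(without_sulfone.numAtoms, without_sulfone.smartsString)
```

With these inputs the first search should cover all 10 atoms, sulfone included, while the second should stop at the 6 ring atoms: the CF3 carbon and fluorines carry label 1, so they cannot stand in for the atoms labelled 2 and 3. The cost is that atoms labelled 1 match each other regardless of element, which may be looser than you want.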

Best regards,

Andrew
da...@dalkescientific.com






[Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-22 Thread Gustavo Seabra
Hi all,

I would appreciate some pointers on how it would be possible to find the
maximum common substructure of 2 molecules, where in the template structure
some atoms may be *any*, but some other atoms must be fixed.

Currently, I'm trying to use the rdFMCS module. For example:

from rdkit import Chem
from rdkit.Chem import rdFMCS

template = Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1')
# This should give a sulfone connected to an aromatic ring and
# some other (any) element. Notice that the ring may have
# any atoms (N,C,O), but for me it is important to have the SO2 group.

mol1 = Chem.MolFromSmiles('CS(=O)(=O)c1ccc(C2=C(c3c3)CCN2)cc1')
# This molecule has the pattern.

# Now, if I try to find a substructure match, I use:
compare = [template, mol1]
res = rdFMCS.FindMCS(compare,
                     atomCompare=rdFMCS.AtomCompare.CompareElements,
                     bondCompare=rdFMCS.BondCompare.CompareAny,
                     ringMatchesRingOnly=False,
                     completeRingsOnly=False)
res.smartsString
# gives: '[#16](=[#8])=[#8]'

# Notice that the only match is the SO2; it does not match the ring.
However, if I try that with another structure that has a CF3 in place of
the SO2, I get:
mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
compare = [template,mol2]
res = rdFMCS.FindMCS(compare,
                     atomCompare=rdFMCS.AtomCompare.CompareElements,
                     bondCompare=rdFMCS.BondCompare.CompareAny,
                     ringMatchesRingOnly=False,
                     completeRingsOnly=False)
res.smartsString
# Returns: '' (empty string)

# if I change to AtomCompare.CompareAny, now a CF3 will also match
# in the SO2-X:
mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
compare = [template,mol2]
res = rdFMCS.FindMCS(compare,
                     atomCompare=rdFMCS.AtomCompare.CompareAny,
                     bondCompare=rdFMCS.BondCompare.CompareAny,
                     ringMatchesRingOnly=False,
                     completeRingsOnly=False)
res.smartsString
# Returns: '[#16,#6](-[#0,#6])(=,-[#8,#9])(=,-[#8,#9])-[#0,#6]1:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#6]:[#0,#7]:1'

But now the CF3 is counted in place of the SO2. The result I'd like to get
here would be just the ring, as in this case:
new_template = Chem.MolFromSmarts('CS(=O)(=O)c1cnccc1')
mol2 = Chem.MolFromSmiles('Cc1ccc(C2=CCNC2c2ccc(C(C)(F)F)nc2)nn1')
compare = [new_template,mol2]
res = rdFMCS.FindMCS(compare,
                     atomCompare=rdFMCS.AtomCompare.CompareElements,
                     bondCompare=rdFMCS.BondCompare.CompareAny,
                     ringMatchesRingOnly=False,
                     completeRingsOnly=False)
res.smartsString
# Returns: '[#6]1:[#6]:[#7]:[#6]:[#6]:[#6]:1' (just the ring)

Notice that if I use CompareElements, there seems to be no way to match the
ring with either N or C.

Does anyone have a suggestion on how I can specify flexibility (similar to
AtomCompare.CompareAny) only for a portion of the molecule and still
enforce specific atoms in another portion?

Thank you so much!
--
Gustavo Seabra.