Re: [Rdkit-discuss] atom equivalence for substructure matching

Greg Landrum Wed, 30 Oct 2013 06:09:50 -0700

Hi Ling,

On Wed, Oct 30, 2013 at 2:12 AM, S.L. Chan <slch...@yahoo.com> wrote:


> Good evening,
>
> I would like to get an exhaustive substructure matching of a molecule onto
> itself. Generally I could use the GetSubstructMatches function with the
> "uniquify=False" option. However, if there is a carboxylate or a
> guanidinium head around, this would give only "one side" of the match since
> the two oxygens / nitrogens are not considered equivalent:
>
> >>> mol = Chem.MolFromSmiles('CC(=O)[O-]')
> >>> patt = Chem.MolFromSmarts('CC(=O)[O-]')
> >>> print mol.GetSubstructMatches(patt,uniquify=False)
> ((0,1,2,3),)
>
> Now, I suppose I could do an ugly (could in principle match two single
> bonds) hack to achieve my purpose:
> >>> mol = Chem.MolFromSmiles('CC(=O)[O-]')
> >>> patt = Chem.MolFromSmarts('CC(~O)~O')
> >>> print mol.GetSubstructMatches(patt,uniquify=False)
> ((0,1,2,3), (0,1,3,2))
>
> However, this would mean that I would need to manually edit the smarts
> string for all molecules. I just wonder if there is something similar to
> the "Kekulize" command that would make the two oxygens equivalent? Or are
> there other ways around this?
>

This is an interesting question.

There's no super-easy way that I can think of to get what you want, but
there is an approach that will probably work.

What you can do is edit the molecule to replace the substructures in
question with something that gives the appropriate matching behavior.
Here's one way of doing that which preserves atom types:

In [17]: repl = Chem.MolFromSmiles('C(O)O')
In [18]: repl.GetBondWithIdx(0).SetBondType(Chem.BondType.ONEANDAHALF)
In [19]: repl.GetBondWithIdx(1).SetBondType(Chem.BondType.ONEANDAHALF)
In [20]: m = Chem.MolFromSmiles('CC(C(=O)O)C(C(=O)O)C')
In [21]: m.GetSubstructMatches(m,uniquify=False)
Out[21]: ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (9, 5, 6, 7, 8, 1, 2, 3, 4, 0))
In [25]: nm = Chem.ReplaceSubstructs(m,Chem.MolFromSmarts('C(=O)[OH,O-]'),repl,replaceAll=True)
In [28]: nm[0].GetSubstructMatches(nm[0],uniquify=False)
Out[28]:
((0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
 (0, 1, 2, 3, 4, 5, 6, 7, 9, 8),
 (0, 1, 2, 3, 4, 6, 5, 7, 8, 9),
 (0, 1, 2, 3, 4, 6, 5, 7, 9, 8),
 (3, 2, 1, 0, 7, 8, 9, 4, 5, 6),
 (3, 2, 1, 0, 7, 8, 9, 4, 6, 5),
 (3, 2, 1, 0, 7, 9, 8, 4, 5, 6),
 (3, 2, 1, 0, 7, 9, 8, 4, 6, 5))


Note that this approach changes the atom numbering (compare Out[21] with
Out[28]). If you want to preserve the atom numbering, it's a bit more
complex, since you have to modify the bonds and atoms of the original
molecule in place:


In [45]: q = Chem.MolFromSmarts('C(=O)[OH,O-]')
In [46]: m = Chem.MolFromSmiles('CC(C(=O)O)C(C(=O)O)C')
In [48]: qmatch = m.GetSubstructMatches(q)
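In [50]: for match in qmatch:
    # match[0] is the carboxyl C, match[1] the =O, match[2] the OH / O-
    # give both C-O bonds the same bond type
    b = m.GetBondBetweenAtoms(match[0],match[1])
    b.SetBondType(Chem.BondType.ONEANDAHALF)
    b = m.GetBondBetweenAtoms(match[0],match[2])
    b.SetBondType(Chem.BondType.ONEANDAHALF)
    # clear the charge and explicit Hs on the second O so that it looks
    # like the first one
    m.GetAtomWithIdx(match[2]).SetFormalCharge(0)
    m.GetAtomWithIdx(match[2]).SetNoImplicit(False)
    m.GetAtomWithIdx(match[2]).SetNumExplicitHs(0)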

In [52]: m.GetSubstructMatches(m,uniquify=False)
Out[52]:
((0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
 (0, 1, 2, 3, 4, 5, 6, 8, 7, 9),
 (0, 1, 2, 4, 3, 5, 6, 7, 8, 9),
 (0, 1, 2, 4, 3, 5, 6, 8, 7, 9),
 (9, 5, 6, 7, 8, 1, 2, 3, 4, 0),
 (9, 5, 6, 7, 8, 1, 2, 4, 3, 0),
 (9, 5, 6, 8, 7, 1, 2, 3, 4, 0),
 (9, 5, 6, 8, 7, 1, 2, 4, 3, 0))
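A different way to get the same list, without editing the molecule, is to
post-process the plain matches and swap the equivalent oxygens yourself.
The following is only a minimal pure-Python sketch of that idea, not an
RDKit feature: expand_matches is a hypothetical helper, and the oxygen
index pairs are hard-coded here as an assumption (in practice they could
be taken from match[1] and match[2] of the C(=O)[OH,O-] matches). It is
only valid when the two atoms in each pair really are interchangeable, as
the two terminal oxygens of a carboxyl are.

```python
from itertools import product


def expand_matches(matches, equivalent_pairs):
    """Return every match obtained by optionally swapping each atom pair.

    Hypothetical helper for illustration; not part of RDKit.
    """
    expanded = set()
    for match in matches:
        # Each pair is either left alone (False) or swapped (True).
        for choices in product((False, True), repeat=len(equivalent_pairs)):
            swap = {}
            for (a, b), do_swap in zip(equivalent_pairs, choices):
                if do_swap:
                    swap[a], swap[b] = b, a
            # Relabel the matched target atoms according to the swaps.
            expanded.add(tuple(swap.get(idx, idx) for idx in match))
    return tuple(sorted(expanded))


# The two matches from Out[21] and the oxygen pairs of the two carboxyls
# in 'CC(C(=O)O)C(C(=O)O)C' (assumed and hard-coded for this sketch).
base_matches = (
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    (9, 5, 6, 7, 8, 1, 2, 3, 4, 0),
)
oxygen_pairs = [(3, 4), (7, 8)]

all_matches = expand_matches(base_matches, oxygen_pairs)
print(len(all_matches))  # 8
```

For this molecule the result is the same eight matches as in Out[52], and
the original atom numbering is untouched.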

In all the above I'm showing how to solve the problem for carboxyls.
Handling other groups is left as an exercise to the reader. ;-)

Is that doing what you're looking for?
-greg
