Re: [Rdkit-discuss] SMARTS pattern matching of canonical forms of aromatic molecules

2017-09-08 Thread Jason Biggs
Start with your benzene molecule

from rdkit import Chem
m = Chem.MolFromSmiles('c1ccccc1')


make a pattern using Peter's example, with three aromatic atoms connected
by two aromatic bonds

patt = Chem.MolFromSmarts('a:a:a')


and it's a match:

m.HasSubstructMatch(patt)

>True


Kekulize your mol, and the pattern doesn't match

Chem.rdmolops.Kekulize(m)
m.HasSubstructMatch(patt)
>False


but if you change the smarts pattern to match aromatic atoms connected by
kekulized bonds, it matches

patt2 = Chem.MolFromSmarts('[a]=[a]-[a]')
m.HasSubstructMatch(patt2)
>True

Your original SMARTS query doesn't match because C in a SMARTS string is
specifically an aliphatic carbon, and the atoms are still flagged as aromatic
after kekulizing.  Change it to c and it will match.  The original query
would work if you had cleared the aromatic flags when kekulizing:


m = Chem.MolFromSmiles('c1ccccc1')
Chem.rdmolops.Kekulize(m, clearAromaticFlags=True)
patt = Chem.MolFromSmarts('[C]=[C]-[C]')
m.HasSubstructMatch(patt)
>True



So when you kekulize without using the clearAromaticFlags option, the
aromatic atoms will still match only 'a', not 'A', while the bonds will
match only '=' or '-', not ':'  (they will also match '@' or '~', but that's
beside the point here)
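
To put the cases above side by side, here is a minimal sketch that runs the
same SMARTS patterns against benzene in three states. The helper names
(match_table, benzene) are made up for this illustration and are not part of
RDKit; the results described are the ones reported in this thread, so check
them against your own RDKit version.

```python
from rdkit import Chem

def match_table(mol, smarts_list):
    """Map each SMARTS string to whether it matches anywhere in mol."""
    return {s: mol.HasSubstructMatch(Chem.MolFromSmarts(s)) for s in smarts_list}

def benzene(kekulize=False, clear_flags=False):
    """Hypothetical helper: build benzene, optionally kekulized in place."""
    mol = Chem.MolFromSmiles('c1ccccc1')
    if kekulize:
        Chem.Kekulize(mol, clearAromaticFlags=clear_flags)
    return mol

smarts_list = ['a:a:a', '[a]=[a]-[a]', 'a~a~a', 'a@a@a', '[C]=[C]-[C]']

results = {
    'aromatic': match_table(benzene(), smarts_list),
    'kekulized, flags kept': match_table(benzene(kekulize=True), smarts_list),
    'kekulized, flags cleared': match_table(
        benzene(kekulize=True, clear_flags=True), smarts_list),
}

# Print one small table per state of the molecule.
for label, table in results.items():
    print(label)
    for smarts, matched in table.items():
        print(f'  {smarts:12} {matched}')
```

With the flags kept, only the patterns that pair 'a' atoms with '=', '-',
'~' or '@' bonds should match; with the flags cleared, '[C]=[C]-[C]' should
match and the 'a' patterns should not.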

As Peter mentions, if you read in a kekulized SMILES string, by default the
mol you create will not be kekulized, because aromaticity is perceived when
the molecule is sanitized. It sounds, though, like you are intentionally
kekulizing before doing substructure matching.



Jason Biggs


On Fri, Sep 8, 2017 at 5:19 PM, James T. Metz via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> Hello,
>
> Suppose I read in the SMILES of an aromatic molecule e.g., for
> benzene
>
> c1ccccc1
>
> I then want to convert the molecule to a Kekule representation and
> then perform various SMARTS pattern recognition e.g.
>
> [C]=[C]-[C]
>
> I have tried various Kekule commands in RDkit, but I can not figure
> out how to (or if it is possible) to recognize a SMARTS pattern for
> a portion of a molecule which is aromatic, but is currently being
> stored as a Kekule structure.
>
> Also, is it possible to generate and store more than one Kekule
> form in RDkit?
>
> Thank you.
>
> Regards,
> Jim Metz
>
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>


Re: [Rdkit-discuss] SMARTS pattern matching of canonical forms of aromatic molecules

2017-09-08 Thread Peter S. Shenkin
Hi,

In SMARTS, 'a' matches an aromatic atom. So you would match your molecule
with the pattern 'aaa', or if you wanted to restrict yourself to carbons,
'ccc'.

This would match whether you created the molecule from a Kekulized or an
aromatic SMILES. Remember that it's the toolkit's aromaticity perception
code, not the form of the input SMILES, that determines whether a molecule
is aromatic.
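
A quick way to check this, as a minimal sketch assuming RDKit's default
sanitization on SMILES input (the variable names are only illustrative):

```python
from rdkit import Chem

patt = Chem.MolFromSmarts('ccc')

summary = {}
for smi in ('c1ccccc1', 'C1=CC=CC=C1'):
    mol = Chem.MolFromSmiles(smi)  # default sanitization perceives aromaticity
    n_aromatic = sum(atom.GetIsAromatic() for atom in mol.GetAtoms())
    summary[smi] = (n_aromatic, mol.HasSubstructMatch(patt), Chem.MolToSmiles(mol))
    print(smi, summary[smi])
```

Both inputs should report six aromatic atoms, a match for 'ccc', and the
same canonical SMILES.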

-P.

On Fri, Sep 8, 2017 at 6:19 PM, James T. Metz via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> Hello,
>
> Suppose I read in the SMILES of an aromatic molecule e.g., for
> benzene
>
> c1ccccc1
>
> I then want to convert the molecule to a Kekule representation and
> then perform various SMARTS pattern recognition e.g.
>
> [C]=[C]-[C]
>
> I have tried various Kekule commands in RDkit, but I can not figure
> out how to (or if it is possible) to recognize a SMARTS pattern for
> a portion of a molecule which is aromatic, but is currently being
> stored as a Kekule structure.
>
> Also, is it possible to generate and store more than one Kekule
> form in RDkit?
>
> Thank you.
>
> Regards,
> Jim Metz
>
>
>
> 


[Rdkit-discuss] SMARTS pattern matching of canonical forms of aromatic molecules

2017-09-08 Thread James T. Metz via Rdkit-discuss
Hello,

Suppose I read in the SMILES of an aromatic molecule, e.g., for
benzene

c1ccccc1

I then want to convert the molecule to a Kekule representation and
then perform various SMARTS pattern recognition, e.g.

[C]=[C]-[C]

I have tried various Kekule commands in RDKit, but I cannot figure
out how to recognize (or whether it is possible to recognize) a SMARTS
pattern for a portion of a molecule which is aromatic, but is currently
being stored as a Kekule structure.

Also, is it possible to generate and store more than one Kekule
form in RDKit?

Thank you.

Regards,
Jim Metz




