On Thu, May 4, 2017 at 10:22 PM, Rafal Roszak <rmrmg.c...@gmail.com> wrote:

> Greg,
>
> Thanks for the answer, but it is still unclear to me. You wrote:
>
> > So, the short answer to your question is: "yes". You will need to rewrite
> > your SMARTS queries so that nitro groups are expressed consistently. If you
> > want those queries to match what the RDKit's molecule processing code
> > produces, the nitro groups themselves should be written '[N+](=O)[O-]'.
>
> so:
>
> these three Mols (m1, m2, m33) below should all represent nitrobenzene,
> right?
> >>> m33=Chem.MolFromSmarts('c1ccccc1[N+](=O)[O-]')
> >>> m1=Chem.MolFromSmiles('c1ccccc1[N+](=O)[O-]')
> >>> m2=Chem.MolFromSmiles('c1ccccc1N(=O)(=O)')
>

Not quite.
As Curt already pointed out, m33 is a query that will retrieve nitrobenzene
as well as an infinite number of other molecules that contain that substructure.

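One important point that hasn't come up in the rest of this thread: m1 and
m2 end up being identical to each other. The sanitization that the RDKit
applies when reading molecules from SMILES converts the pentavalent
N(=O)=O nitro group in m2 into the charge-separated [N+](=O)[O-] form used
in m1, so the two molecules end up the same.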
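A minimal way to check this yourself, assuming a reasonably recent RDKit with the Python bindings installed, is to compare the canonical SMILES of the two molecules and look at the nitrogen's formal charge after parsing:

```python
from rdkit import Chem

# m1 uses the charge-separated nitro group, m2 the pentavalent N(=O)=O form
m1 = Chem.MolFromSmiles('c1ccccc1[N+](=O)[O-]')
m2 = Chem.MolFromSmiles('c1ccccc1N(=O)(=O)')

# MolFromSmiles sanitizes by default; sanitization rewrites the pentavalent
# nitro group, so both molecules give the same canonical SMILES.
smi1 = Chem.MolToSmiles(m1)
smi2 = Chem.MolToSmiles(m2)
print(smi1)
print(smi2)
print(smi1 == smi2)

# After sanitization the single nitrogen in m2 carries a +1 formal charge.
n_atom = next(a for a in m2.GetAtoms() if a.GetSymbol() == 'N')
print(n_atom.GetFormalCharge())
```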


> but
>
> 1. The first strange thing is the SMILES of m33:
> >>> Chem.MolToSmiles(m33)
> 'O=N(O)c1ccccc1'
>
> this does not look like nitrobenzene to me
>

It's not. MolFromSmarts() returns a molecule that contains query features.
Translating those query features back into SMILES is an inexact process;
you are seeing that here.
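If you need to write a query molecule back out, MolToSmarts() is the function that serializes the query features. The sketch below assumes a recent RDKit; the exact strings it prints may differ between versions, so it only checks that the re-parsed SMARTS still behaves like the original nitrobenzene query:

```python
from rdkit import Chem

m33 = Chem.MolFromSmarts('c1ccccc1[N+](=O)[O-]')

# MolToSmiles writes ordinary atom/bond properties, not the queries,
# so its output need not describe what m33 actually matches.
print(Chem.MolToSmiles(m33))

# MolToSmarts writes the query features themselves.
smarts = Chem.MolToSmarts(m33)
print(smarts)

# The re-parsed SMARTS should still find nitrobenzene but not plain benzene.
roundtrip = Chem.MolFromSmarts(smarts)
nitrobenzene = Chem.MolFromSmiles('c1ccccc1[N+](=O)[O-]')
benzene = Chem.MolFromSmiles('c1ccccc1')
print(nitrobenzene.HasSubstructMatch(roundtrip))
print(benzene.HasSubstructMatch(roundtrip))
```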

> 2. The results of HasSubstructMatch are really unexpected:
>
> >>> m2.HasSubstructMatch(m33)
> True
> >>> m1.HasSubstructMatch(m33)
> True
> >>> m33.HasSubstructMatch(m1)
> False
> >>> m33.HasSubstructMatch(m2)
> False
> >>>


> m1 and m2 are substructures of m33, but m33 is not a substructure of m1 or
> m2. I really don't understand this.
>

This is expected, but I can see how it's surprising at first.
Here's an overly simplified, partial explanation; I hope it helps.
When doing a substructure match, mol.HasSubstructMatch(query), the code needs
to determine whether each atom/bond in query matches each atom/bond in mol. To
this end it calls functions like query_atom.Match(mol_atom) and
query_bond.Match(mol_bond) for each possible atom and bond pair (it's
actually a bit smarter than that).

Both "normal" atoms (those from SMILES) and "Query" atoms (those from SMARTS)
in the query can match "normal" atoms in mol, but "Query" atoms in mol are
generally not matched by either kind. The same is true of bonds. So a molecule
built from SMARTS can be a substructure of a molecule built from SMILES, but
the reverse is not normally the case.
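The direction of the comparison can be seen directly with Atom.Match(). This is only an illustration assuming a recent RDKit; the one-atom '[N+]' query and the ammonium/ammonia molecules are arbitrary examples chosen for it, not part of the original question:

```python
from rdkit import Chem

# A one-atom SMARTS query: positively charged aliphatic nitrogen.
query = Chem.MolFromSmarts('[N+]')
query_atom = query.GetAtomWithIdx(0)

# Two ordinary molecules from SMILES: ammonium (N is +1) and ammonia (neutral).
ammonium = Chem.MolFromSmiles('[NH4+]')
ammonia = Chem.MolFromSmiles('N')

# query_atom.Match(x) asks whether atom x satisfies the query atom.
atom_hit = query_atom.Match(ammonium.GetAtomWithIdx(0))
atom_miss = query_atom.Match(ammonia.GetAtomWithIdx(0))

# Whole-molecule searches use the same direction: query atoms vs. mol atoms.
mol_hit = ammonium.HasSubstructMatch(query)
mol_miss = ammonia.HasSubstructMatch(query)

print(atom_hit, atom_miss, mol_hit, mol_miss)
```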

> It seems this is a problem with the SMARTS mol:
> >>> m33.HasSubstructMatch(m33)
> False
> >>>
>
> Is it really correct behaviour?
>

Yes. If you want the RDKit to try to match query atoms against each other,
you need to pass an additional argument:

In [7]: m33=Chem.MolFromSmarts('c1ccccc1[N+](=O)[O-]')

In [8]: m33.HasSubstructMatch(m33)
Out[8]: False

In [9]: m33.HasSubstructMatch(m33,useQueryQueryMatches=True)
Out[9]: True

This is an approximate process (which is why it's not enabled by default),
but it does often work.

Best,
-greg
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
