One more thing. The term "Mol" in RDKit and some other toolkits does not
really mean "molecule" in the sense that chemists use it. It denotes a
data structure that can hold either a SMARTS query or a SMILES. Only a Mol
built from a SMILES really corresponds to a chemical "molecule"; one built
from a SMARTS does so, in some cases, only by accident. And, as Andrew
pointed out, there are cases where exactly the same string means different
things in a SMARTS context and in a SMILES context.

The way I think of it is that SMILES is like an ordinary string and SMARTS
is like a regex that can be used to flexibly match other strings.
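
To make the analogy concrete, here is a rough sketch using only Python's
standard `re` module (no RDKit involved). The strings and variable names
are made up for illustration; the point is just that a literal reused as a
pattern need not match itself, which is the same kind of surprise as the
"[3H]" case quoted below.

```python
import re

# A plain string plays the role of a SMILES: it names exactly one thing.
literal = "benzene"

# A regex plays the role of a SMARTS: it describes a family of strings.
pattern = re.compile(r"benz[a-z]+")

matches_literal = pattern.fullmatch(literal) is not None
matches_other = pattern.fullmatch("benzaldehyde") is not None

# Reusing a literal as a pattern can fail to match the literal itself,
# because some characters change meaning when read as a pattern.
# Here "+" means "one or more of the preceding character", not a plus sign.
tricky = "C+"
self_match = re.fullmatch(tricky, tricky) is not None

# Escaping restores the literal reading, loosely like writing '#1'
# instead of 'H' in a SMARTS (an analogy only, not the same mechanism).
escaped_match = re.fullmatch(re.escape(tricky), tricky) is not None

print(matches_literal, matches_other, self_match, escaped_match)
```

The third result is the interesting one: the same characters are a
perfectly good string and a perfectly good pattern, yet the pattern does
not match the string.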

-P.



On Wed, Apr 19, 2017 at 5:20 PM, Andrew Dalke <da...@dalkescientific.com>
wrote:

> On Apr 19, 2017, at 18:26, Curt Fischer <curt.r.fisc...@gmail.com> wrote:
> > From chemistry stack exchange, an answer contributed by user R.M.:
> >
> > SMARTS is deliberately designed to be a superset of SMILES. That is, any
> > valid SMILES depiction should also be a valid SMARTS query, one that will
> > retrieve the very structure that the SMILES string depicts.
>
> Except, that last clause isn't true. Try matching tritium against itself.
>
> >>> from rdkit import Chem
> >>> mol = Chem.MolFromSmiles("[3H]")
> >>> pat = Chem.MolFromSmarts("[3H]")
> >>> mol.HasSubstructMatch(pat)
> False
>
> For hydrogens you must use '#1', because H in SMARTS means something
> different.
>
> >>> pat2 = Chem.MolFromSmarts("[3#1]")
> >>> mol.HasSubstructMatch(pat2)
> True
>
> SMILES input under Daylight and most other toolkits gets normalized to the
> chemistry model, including aromaticity perception:
>
> >>> mol = Chem.MolFromSmiles("C1=CC=CC=C1")
> >>> pat = Chem.MolFromSmarts("C1=CC=CC=C1")
> >>> mol.HasSubstructMatch(pat)
> False
> >>> pat2 = Chem.MolFromSmarts("c1ccccc1")
> >>> mol.HasSubstructMatch(pat2)
> True
>
> RDKit also does a small amount of additional normalization, or
> 'sanitization' to use the RDKit term. For example, it will convert "neutral
> 5 coordinate Ns with double bonds to Os to the zwitterionic form" (see
> GraphMol/MolOps.cpp):
>
> >>> s = "CN(=O)=O"
> >>> mol = Chem.MolFromSmiles(s)
> >>> pat = Chem.MolFromSmarts(s)
> >>> mol.HasSubstructMatch(pat)
> False
> >>> Chem.MolToSmiles(mol)
> 'C[N+](=O)[O-]'
>
> I believe that the output SMILES from a toolkit, assuming that the SMILES
> doesn't have an explicit hydrogen, can be used as a SMARTS which will match
> the molecule made from that same SMILES, by that same toolkit.
>
> This is a weaker statement than that made by user R.M.
>
>                                 Andrew
>                                 da...@dalkescientific.com
>
>
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>