On Apr 19, 2017, at 18:26, Curt Fischer <curt.r.fisc...@gmail.com> wrote:
> From chemistry stack exchange, an answer contributed by user R.M.:
> SMARTS is deliberately designed to be a superset of SMILES. That is, any 
> valid SMILES depiction should also be a valid SMARTS query, one that will 
> retrieve the very structure that the SMILES string depicts.

Except that last clause isn't true. Try matching tritium against itself.

>>> from rdkit import Chem
>>> mol = Chem.MolFromSmiles("[3H]")
>>> pat = Chem.MolFromSmarts("[3H]")
>>> mol.HasSubstructMatch(pat)
False

For hydrogens you must use '#1', because H in SMARTS means something different.

>>> pat2 = Chem.MolFromSmarts("[3#1]")
>>> mol.HasSubstructMatch(pat2)
True
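
To illustrate the two primitives, here is a minimal sketch, assuming a reasonably recent RDKit. The methane example and the variable names are my own illustrative choices, not part of the original exchange. When H follows another atom primitive inside SMARTS brackets it tests the hydrogen count, while '#1' always tests the atomic number.

```python
from rdkit import Chem

methane = Chem.MolFromSmiles("C")
# "[CH4]" as SMARTS: an aliphatic carbon with a total hydrogen count of 4.
four_h = methane.HasSubstructMatch(Chem.MolFromSmarts("[CH4]"))
# "[CH3]" asks for a hydrogen count of exactly 3, so methane does not match.
three_h = methane.HasSubstructMatch(Chem.MolFromSmarts("[CH3]"))

tritium = Chem.MolFromSmiles("[3H]")
# "#1" is the atomic-number primitive, so "[3#1]" means isotope 3 of element 1.
by_number = tritium.HasSubstructMatch(Chem.MolFromSmarts("[3#1]"))

print(four_h, three_h, by_number)
```

Spelling hydrogen as '#1' avoids any dependence on how a given toolkit interprets a bracketed H.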

SMILES input under Daylight and most other toolkits gets normalized to the 
chemistry model, including aromaticity perception:

>>> mol = Chem.MolFromSmiles("C1=CC=CC=C1")
>>> pat = Chem.MolFromSmarts("C1=CC=CC=C1")
>>> mol.HasSubstructMatch(pat)
False
>>> pat2 = Chem.MolFromSmarts("c1ccccc1")
>>> mol.HasSubstructMatch(pat2)
True

RDKit also does a small amount of additional normalization, or 'sanitization' 
to use the RDKit term. For example, it will convert "neutral 5 coordinate Ns 
with double bonds to Os to the zwitterionic form" (see GraphMol/MolOps.cpp):

>>> s = "CN(=O)=O"
>>> mol = Chem.MolFromSmiles(s)
>>> pat = Chem.MolFromSmarts(s)
>>> mol.HasSubstructMatch(pat)
False
>>> Chem.MolToSmiles(mol)
'C[N+](=O)[O-]'

I believe that the output SMILES from a toolkit, assuming that the SMILES 
doesn't have an explicit hydrogen, can be used as a SMARTS which will match the 
molecule made from that same SMILES, by that same toolkit.

This is a weaker statement than that made by user R.M.
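
A quick way to spot-check that weaker claim in RDKit is sketched below. The helper name output_smiles_matches_itself and the list of test SMILES are hypothetical choices of mine, and a handful of passing cases is of course not a proof.

```python
from rdkit import Chem

def output_smiles_matches_itself(smiles):
    """Hypothetical helper: parse, write SMILES, then reuse that output as SMARTS."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % (smiles,))
    out_smiles = Chem.MolToSmiles(mol)    # the toolkit's own output SMILES
    pat = Chem.MolFromSmarts(out_smiles)  # the same string, read as a query
    return pat is not None and mol.HasSubstructMatch(pat)

# None of these inputs contains an explicit hydrogen.
examples = ["C1=CC=CC=C1", "CN(=O)=O", "CCO", "c1ccncc1"]
results = {s: output_smiles_matches_itself(s) for s in examples}
print(results)
```

The pattern is built from the output SMILES rather than the input string, which is why the Kekule benzene and the 5-coordinate nitro inputs are expected to match here.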


Rdkit-discuss mailing list