On Apr 19, 2017, at 23:59, Peter S. Shenkin <shen...@gmail.com> wrote:
> One more thing. The term "Mol" in RDKit and some other toolkits does not 
> really mean "molecule" in the sense that chemists use it.

? I don't see how this is connected to the previous emails.

I believe most toolkits use that terminology in their APIs. (Daylight, OEChem, 
Open Babel, RDKit, Indigo, JChem, and InChI).

I know that VMD does that too, and I believe PyMol and RasMol as well.

A minority of software uses other terms. CACTVS calls it a 'molecular 
ensemble' and CDK calls it an 'atom container' (though I see people assign it 
to variables with 'm' or 'mol' in the name).

I haven't really run into people who found this to be an issue, so I've stopped 
bringing it up in my documentation or when I teach. I mostly work with 
computational chemists, and that bias may affect things.

But this current thread is a discussion between computational people, which is 
why I don't understand the relevance.


> The way I think of it is that SMILES is like an ordinary string and SMARTS is 
> like a regex that can be used to flexibly match other strings.

I think this is a reasonable approximation for computer programmers. I modeled 
my PyDaylight wrapper on top of the Daylight toolkit using this view.

Then Greg and RDKit showed me that that view was narrower than need be. In 
RDKit, a molecule can also be used as the query in a substructure (subgraph) 
search.

>>> from rdkit import Chem
>>> mol1 = Chem.MolFromSmiles("c1ccccc1")
>>> mol2 = Chem.MolFromSmiles("c1ccccc1O")
>>> mol2.HasSubstructMatch(mol1)
True
>>> mol1.HasSubstructMatch(mol2)
False

Stretching your analogy, this would be like a substring search rather than a 
regexp.
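
For programmers, here is a rough sketch of the two string operations in that 
analogy. It is purely a text-level illustration: the names in the list are 
made up for the example and nothing chemical is being matched.

```python
import re

# Hypothetical "targets": plain strings, not molecules.
names = ["hydroxybenzene", "aminobenzene", "methylcyclohexane"]

# Substring search: the query is a literal with no alternatives.
substring_hits = [n for n in names if "hydroxybenzene" in n]

# Regex search: the query can express alternation ("hydroxy" or "amino").
regex = re.compile(r"(hydroxy|amino)benzene")
regex_hits = [n for n in names if regex.search(n)]

print(substring_hits)
print(regex_hits)
```

The literal query finds only the one name it spells out, while the regex also 
finds the "amino" variant.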

The analogy is a stretch because substring search has different performance 
characteristics from regexp search, while subgraph search is NP-complete even 
when only a simple SMILES is used to define the subgraph.

Alternatively, it could be like using a constrained glob pattern language 
instead of a more flexible regular expression. Well, except that SMILES as a 
pattern language has no flexibility for conjunction, disjunction, or repetition.
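
A small sketch of that difference in flexibility, assuming RDKit is installed. 
The three target molecules and the "[O,N]" query are my own choices for 
illustration; the point is only that the SMARTS query can say "oxygen or 
nitrogen" while the SMILES-derived query cannot.

```python
from rdkit import Chem

# Illustrative targets (chosen for this example).
targets = {
    "phenol": Chem.MolFromSmiles("c1ccccc1O"),
    "aniline": Chem.MolFromSmiles("c1ccccc1N"),
    "toluene": Chem.MolFromSmiles("c1ccccc1C"),
}

# A molecule parsed from SMILES, used as a fixed subgraph query.
smiles_query = Chem.MolFromSmiles("c1ccccc1O")

# A SMARTS query with a disjunction: aliphatic O or aliphatic N on the ring.
smarts_query = Chem.MolFromSmarts("c1ccccc1[O,N]")

smiles_hits = sorted(name for name, mol in targets.items()
                     if mol.HasSubstructMatch(smiles_query))
smarts_hits = sorted(name for name, mol in targets.items()
                     if mol.HasSubstructMatch(smarts_query))

print(smiles_hits)
print(smarts_hits)
```

The SMILES-derived query should find only phenol, while the SMARTS query 
should find both phenol and aniline.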


Furthermore, in RDKit a SMARTS pattern can (to a limited extent) be used to 
match a SMARTS pattern:

>>> pat1 = Chem.MolFromSmarts("[#7]=[#6]-[#8]")
>>> pat2 = Chem.MolFromSmarts("[#7]=[#8]")
>>> pat1.HasSubstructMatch(pat2)
False
>>> pat3 = Chem.MolFromSmarts("[#6]=[#7]")
>>> pat1.HasSubstructMatch(pat3)
True

I've used this once in my work when I generated simple subgraph fragments as 
SMARTS patterns then used the patterns against themselves to generate a 
hierarchical tree.
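
A minimal sketch of that idea, not the code I actually used. It assumes RDKit 
is installed and sticks to patterns built only from atomic numbers and 
explicit bond orders, since pattern-against-pattern matching is limited. The 
fragment list and the helper smallest_superpattern() are hypothetical, made up 
for this example.

```python
from rdkit import Chem

# Hypothetical fragment set, using only [#n] atoms and explicit bonds.
smarts_list = ["[#6]=[#7]", "[#7]=[#6]-[#8]", "[#7]=[#6](-[#8])-[#6]"]
patterns = [Chem.MolFromSmarts(s) for s in smarts_list]

def smallest_superpattern(i):
    """Index of the smallest larger pattern that contains pattern i, or None."""
    best = None
    for j, big in enumerate(patterns):
        if big.GetNumAtoms() <= patterns[i].GetNumAtoms():
            continue  # only strictly larger patterns can be superpatterns here
        if not big.HasSubstructMatch(patterns[i]):
            continue  # pattern i is not found inside this larger pattern
        if best is None or big.GetNumAtoms() < patterns[best].GetNumAtoms():
            best = j
    return best

# Map each fragment to the next larger fragment that contains it.
tree = {}
for i, smarts in enumerate(smarts_list):
    k = smallest_superpattern(i)
    tree[smarts] = None if k is None else smarts_list[k]

for small, large in tree.items():
    print(small, "->", large)
```

Each fragment is linked to the smallest larger fragment that contains it, 
which gives a simple chain or tree depending on the fragment set.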

This would correspond roughly to checking whether every string matched by one 
regular expression is also matched by another, which is a very different 
algorithm from matching a pattern against a string.



                                Andrew
                                da...@dalkescientific.com



_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
