On Apr 19, 2017, at 23:59, Peter S. Shenkin <shen...@gmail.com> wrote:

> One more thing. The term "Mol" in RDKit and some other toolkits does not
> really mean "molecule" in the sense that chemists use it.
?

I don't see how this is connected to the previous emails. I believe most toolkits use that terminology in their APIs (Daylight, OEChem, Open Babel, RDKit, Indigo, JChem, and InChI). I know that VMD does that too, and I believe PyMOL and RasMol do as well.

A minority of software uses other terms. CACTVS calls it a 'molecular ensemble', and CDK an 'atom container' (though I see people assign it to variables with 'm' or 'mol' in the name).

I haven't really run into people who found this to be an issue, so I've stopped bringing it up in my documentation or when I teach. I mostly work with computational chemists, and that bias may affect things. But this thread is a discussion among computational people, which is why I don't understand the relevance.

> The way I think of it is that SMILES is like an ordinary string and SMARTS is
> like a regex that can be used to flexibly match other strings.

I think this is a reasonable approximation for computer programmers. I modeled my PyDaylight wrapper for the Daylight toolkit on this view. Then Greg and RDKit showed me that the view was narrower than it needs to be. In RDKit, a molecule can also be used as a substructure query:

>>> from rdkit import Chem
>>> mol1 = Chem.MolFromSmiles("c1ccccc1")
>>> mol2 = Chem.MolFromSmiles("c1ccccc1O")
>>> mol2.HasSubstructMatch(mol1)
True
>>> mol1.HasSubstructMatch(mol2)
False

Stretching your analogy, this would be like a substring search rather than a regexp. It's a difficult stretch because substring search has different performance characteristics than regexp search, while subgraph search is NP-complete even when the subgraph is defined by a plain SMILES.

Alternatively, it could be like using a constrained glob pattern language instead of a more flexible regular expression. Well, except that SMILES as a pattern language has no way to express conjunction, disjunction, or repetition.
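A toy pure-Python sketch may make the "molecule as query" point concrete without needing RDKit installed. Everything in it is hypothetical and for illustration only: the `Graph` class, the functions `substruct_match` and `has_substruct_match`, the ad hoc bond labels `":"` (aromatic) and `"-"` (single), and the use of a frozenset of symbols as a stand-in for a SMARTS disjunction like `[c,n]`. It is a plain backtracking subgraph search, not RDKit's data model or its matching algorithm.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple, Union

# Hypothetical toy model: an atom label is either an exact symbol (SMILES-like)
# or a frozenset of allowed symbols (a stand-in for a SMARTS disjunction).
Label = Union[str, FrozenSet[str]]


class Graph:
    """Labeled graph: atom labels plus bond labels keyed by atom-index pairs."""

    def __init__(self, atoms: List[Label], bonds: Dict[Tuple[int, int], str]):
        self.atoms = atoms
        self._bonds: Dict[Tuple[int, int], str] = {}
        for (i, j), order in bonds.items():
            # Store both directions so bond lookups are symmetric.
            self._bonds[(i, j)] = order
            self._bonds[(j, i)] = order

    def bond(self, i: int, j: int) -> Optional[str]:
        return self._bonds.get((i, j))


def atom_matches(query: Label, target: Label) -> bool:
    if isinstance(query, str):
        return query == target  # exact label, as in a plain SMILES query
    return target in query  # any one of the allowed labels


def substruct_match(query: Graph, target: Graph) -> Optional[Dict[int, int]]:
    """Backtracking search for one query->target atom mapping that preserves
    atom labels and query bonds (the target may have extra atoms and bonds).
    The worst case is exponential, in line with subgraph isomorphism being
    NP-complete."""
    mapping: Dict[int, int] = {}

    def extend(q: int) -> bool:
        if q == len(query.atoms):
            return True  # every query atom has been placed
        for t, t_label in enumerate(target.atoms):
            if t in mapping.values() or not atom_matches(query.atoms[q], t_label):
                continue
            # Each bond from q to an already-mapped query atom must also exist
            # in the target with the same bond label.
            if all(
                target.bond(t, mapping[p]) == query.bond(q, p)
                for p in range(q)
                if query.bond(q, p) is not None
            ):
                mapping[q] = t
                if extend(q + 1):
                    return True
                del mapping[q]  # undo and try the next target atom
        return False

    return dict(mapping) if extend(0) else None


def has_substruct_match(target: Graph, query: Graph) -> bool:
    # Argument order mirrors target.HasSubstructMatch(query) in the session above.
    return substruct_match(query, target) is not None


RING = {(k, (k + 1) % 6): ":" for k in range(6)}  # six aromatic ring bonds
benzene = Graph(["c"] * 6, RING)  # like c1ccccc1
phenol = Graph(["c"] * 6 + ["O"], {**RING, (5, 6): "-"})  # like c1ccccc1O

print(has_substruct_match(phenol, benzene))  # True
print(has_substruct_match(benzene, phenol))  # False

# A SMARTS-like query with a disjunction: aromatic c or n, singly bonded to O.
c_or_n_oh = Graph([frozenset({"c", "n"}), "O"], {(0, 1): "-"})
print(has_substruct_match(phenol, c_or_n_oh))  # True
```

A query built from a plain SMILES only ever uses exact labels, so it can't express the `c_or_n_oh` disjunction; that is the lack of flexibility mentioned above, even though the search itself is the same.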
Furthermore, in RDKit a SMARTS pattern can (to a limited extent) be matched against another SMARTS pattern:

>>> pat1 = Chem.MolFromSmarts("[#7]=[#6]-[#8]")
>>> pat2 = Chem.MolFromSmarts("[#7]=[#8]")
>>> pat1.HasSubstructMatch(pat2)
False
>>> pat3 = Chem.MolFromSmarts("[#6]=[#7]")
>>> pat1.HasSubstructMatch(pat3)
True

I've used this once in my work: I generated simple subgraph fragments as SMARTS patterns, then matched the patterns against each other to build a hierarchical tree. This corresponds roughly to checking whether the set of strings matched by one regular expression is a subset of the set matched by another, which requires a very different algorithm from matching a pattern against a string.

Andrew
da...@dalkescientific.com

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss