Dear RDKit community,
I'm getting unexpected results when combining SMARTS substructure
comparisons in SQL statements, and I'd like to ask for feedback to help me
understand what's going on.
Given an element, say Au, when I make a query like this:
SELECT cpds.cid
FROM cpds
WHERE (cpds.molecule @> '[Au]'::qmol)
  AND NOT (cpds.molecule @> '[C,c]~[C,c]'::qmol)
  AND NOT (cpds.molecule @> '[C!H0,c!H0]'::qmol)
I don't expect to see any compounds with C-C or C-H bonds in the results.
Yet I get results like [(C6F5)3P]AuCl [1], or for example with Se,
[(CH3)3Se]+ [2]. Why?
It seems that usually my 'unexpected' results are matching one of the two
"AND NOT" conditions, not both (see console output below) but I haven't
checked systematically. I want the query to return only molecules for which
the last two substructure conditions are both false. Is my understanding of
SQL conjunctions mistaken?
I'm using RDKit 2016-03 and the rdkit extension on PostgreSQL 9.4. I'm
probably not using RDKit for what it was intended, but I'm certainly
grateful that it exists and is free software. I'd very much appreciate any
feedback on this question.
Best regards,
Akos
--
[1]: https://pubchem.ncbi.nlm.nih.gov/compound/11520592
[2]: https://pubchem.ncbi.nlm.nih.gov/compound/91580
Some console output regarding those compounds:
In [3]: mSe = Chem.MolFromSmiles('C[Se+](C)C')
In [4]: mAu = Chem.MolFromSmiles('C1(=C(C(=C(C(=C1F)F)P(C2=C(C(=C(C(=C2F)F)F)F)F)C3=C(C(=C(C(=C3F)F)F)F)F)F)F)F.Cl[Au]')
In [5]: mSe.HasSubstructMatch(Chem.MolFromSmarts('[C,c]~[C,c]'))
Out[5]: False
In [6]: mAu.HasSubstructMatch(Chem.MolFromSmarts('[C,c]~[C,c]'))
Out[6]: True
In [7]: mSe.HasSubstructMatch(Chem.MolFromSmarts('[C!H0,c!H0]'))
Out[7]: True
In [8]: mAu.HasSubstructMatch(Chem.MolFromSmarts('[C!H0,c!H0]'))
Out[8]: False
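To spell out the logic I expect from the WHERE clause, here is a minimal sketch in plain Python, with no database or RDKit involved. It applies the same three conditions to the match results shown in the console output. The dictionary, its keys, and the function name are only illustrative, and "has_element" is assumed to be True because each compound was returned by the query for its own element.

```python
# Match results copied from the console output (In [5] to In [8]).
# "has_element" is an assumption: each compound came back from the query
# for its own element (Se or Au), so that condition must have matched.
observed = {
    "mSe": {"has_element": True, "has_cc": False, "has_ch": True},
    "mAu": {"has_element": True, "has_cc": True, "has_ch": False},
}


def should_be_returned(m):
    """Mirror of the WHERE clause: element AND NOT c_c AND NOT c_h."""
    return m["has_element"] and not m["has_cc"] and not m["has_ch"]


def should_be_returned_alt(m):
    """Equivalent form by De Morgan's law: element AND NOT (c_c OR c_h)."""
    return m["has_element"] and not (m["has_cc"] or m["has_ch"])


for name, m in observed.items():
    # Both forms agree, and both say the compound should be excluded.
    print(name, should_be_returned(m), should_be_returned_alt(m))
```

Under this reading, a compound that matches either of the two "AND NOT" patterns should be filtered out, so neither compound should appear in my results.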
Akos Kokai <http://kaios.net/>
PhD candidate, Department of Environmental Science, Policy & Management
<http://ourenvironment.berkeley.edu/>
Fellow, Berkeley Center for Green Chemistry <http://bcgc.berkeley.edu/>
University of California, Berkeley