I agree with Chris' later comment that this doesn't look right.

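Here's a simple test you can run to check whether the right thing is
happening. The molecule in the subquery is the gold compound from your
message, which does contain C-C bonds, so the query should return no rows: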

chembl_21=# select * from (select
'C1(=C(C(=C(C(=C1F)F)P(C2=C(C(=C(C(=C2F)F)F)F)F)C3=C(C(=C(C(=C3F)F)F)F)F)F)F)F.Cl[Au]'::mol
as mol) tmp
where mol @> '[Au]'::qmol
and NOT (mol @> '[C,c]~[C,c]'::qmol)
and NOT (mol @> '[C!H0,c!H0]'::qmol);

 mol
-----
(0 rows)
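
On the question of whether the SQL conjunction itself is being misread: it isn't. As a rough sketch of the plain boolean logic, here is the same WHERE clause run against precomputed match flags. This uses Python's built-in sqlite3 as a stand-in, not PostgreSQL or the RDKit cartridge, and the table layout, column names, and third row are made up for illustration. The flags for the Se and Au compounds are taken from the console output quoted below.

```python
import sqlite3

# Match flags copied from the quoted console output (Out[5]..Out[8]).
# Table and column names are hypothetical; the third row is invented
# to show what a compound passing all three conditions looks like.
rows = [
    # (name, has_element, has_cc, has_ch)
    ("Se compound", 1, 0, 1),             # [C,c]~[C,c] False, [C!H0,c!H0] True
    ("Au compound", 1, 1, 0),             # [C,c]~[C,c] True,  [C!H0,c!H0] False
    ("hypothetical inorganic", 1, 0, 0),  # neither carbon pattern matches
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cpds (name TEXT, has_element INTEGER, "
    "has_cc INTEGER, has_ch INTEGER)"
)
conn.executemany("INSERT INTO cpds VALUES (?, ?, ?, ?)", rows)

# Same shape as the original query: X AND NOT Y AND NOT Z.
hits = [
    r[0]
    for r in conn.execute(
        "SELECT name FROM cpds "
        "WHERE has_element AND NOT has_cc AND NOT has_ch "
        "ORDER BY name"
    )
]
print(hits)
```

Only the invented third row comes back; a row that satisfies just one of the two "AND NOT" conditions is excluded. So if the cartridge returns the Se or Au compound for the equivalent query, the problem lies in how the operators are being evaluated, not in the way the query is written.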

-greg



On Tue, Mar 21, 2017 at 6:34 AM, Akos Kokai <ako...@berkeley.edu> wrote:

> Dear RDKit community,
>
> I'm getting unexpected results when combining SMARTS substructure
> comparisons in SQL statements, and I'd like to ask for feedback to help me
> understand what's going on.
>
> Given an element, say Au, when I make a query like this:
>
> SELECT cpds.cid FROM cpds WHERE (cpds.molecule @> '[Au]' ::qmol) AND NOT
> (cpds.molecule @> '[C,c]~[C,c]' ::qmol) AND NOT (cpds.molecule @>
> '[C!H0,c!H0]' ::qmol)
>
> I don't expect to see any compounds with C-C or C-H bonds in the results.
> Yet I get results like [(P(C5F5)3)4Au]Cl [1], or for example with Se,
> [(CH3)3Se]+ [2]. Why?
>
> It seems that usually my 'unexpected' results are matching one of the two
> "AND NOT" conditions, not both (see console output below) but I haven't
> checked systematically. I want the query to return only molecules for which
> the last two substructure conditions are both false. Is my understanding of
> SQL conjunctions mistaken?
>
> I'm using RDKit 2016-03 and the rdkit extension on PostgreSQL 9.4. I'm
> probably not using RDKit for what it was intended, but I'm certainly
> grateful that it exists and is free software. I'd very much appreciate any
> feedback on this question.
>
> Best regards,
> Akos
>
> --
>
> [1]: https://pubchem.ncbi.nlm.nih.gov/compound/11520592
> [2]: https://pubchem.ncbi.nlm.nih.gov/compound/91580
>
> Some console output regarding those compounds:
>
> In [3]: mSe = Chem.MolFromSmiles('C[Se+](C)C')
>
> In [4]: mAu = Chem.MolFromSmiles('C1(=C(C(=C(C(=C1F)F)P(C2=C(C(=C(C(=C2F)F)F)F)F)C3=C(C(=C(C(=C3F)F)F)F)F)F)F)F.Cl[Au]')
>
> In [5]: mSe.HasSubstructMatch(Chem.MolFromSmarts('[C,c]~[C,c]'))
> Out[5]: False
>
> In [6]: mAu.HasSubstructMatch(Chem.MolFromSmarts('[C,c]~[C,c]'))
> Out[6]: True
>
> In [7]: mSe.HasSubstructMatch(Chem.MolFromSmarts('[C!H0,c!H0]'))
> Out[7]: True
>
> In [8]: mAu.HasSubstructMatch(Chem.MolFromSmarts('[C!H0,c!H0]'))
> Out[8]: False
>
>
> Akos Kokai <http://kaios.net/>
> PhD candidate, Department of Environmental Science, Policy & Management
> <http://ourenvironment.berkeley.edu/>
> Fellow, Berkeley Center for Green Chemistry <http://bcgc.berkeley.edu/>
> University of California, Berkeley
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>