Hi Zhenting,

A few comments:

1) You *really* don't want to be using that workflow with the RDKit. Some years ago I curated a version of the PAINS SMARTS that does not require Hs to be added and that is compatible with the RDKit's aromaticity definition:
   http://rdkit.blogspot.com/2015/08/curating-pains-filters.html
2) Those definitions are part of the RDKit distribution:
   https://github.com/rdkit/rdkit/tree/master/Data/Pains
3) They are super easy to use in the RDKit's FilterCatalog from Python or inside of KNIME:
   https://hub.knime.com/greglandrum/spaces/Public/latest/RDKit-Examples/Pains%20Filters
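For point 3, here is a minimal sketch of what FilterCatalog usage from Python could look like. The helper name `pains_matches` is made up for illustration and is not part of the RDKit; the example input is the SMILES from the original question. Note that no hydrogens are added before matching.

```python
from rdkit import Chem
from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams

# Build a catalog containing the PAINS definitions that ship with the RDKit.
params = FilterCatalogParams()
params.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS)
catalog = FilterCatalog(params)


def pains_matches(smiles):
    """Return the descriptions of all PAINS entries matched by a SMILES.

    Hypothetical helper for illustration only; Chem.AddHs() is not called.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return [entry.GetDescription() for entry in catalog.GetMatches(mol)]


if __name__ == "__main__":
    # SMILES from the original question in this thread.
    print(pains_matches('N2C(C1(CCCC1)Cc3c2cc(c(c3)OC)OC)CC=C'))
```

If only a yes/no answer is needed, `catalog.HasMatch(mol)` should be enough and avoids collecting all entries.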
-greg

On Sat, Apr 25, 2020 at 2:14 PM Zhenting Gao <183310...@qq.com> wrote:

> Hi Greg and Andreas,
>
> Thanks for your help in looking into my question.
> However, I misjudged the mismatch between the SMARTS and the SMILES
> structure. They actually match well! Sorry for the mistaken question.
>
> The problem stems from my test of the PAINS KNIME workflow
> (http://www.myexperiment.org/workflows/4748.html). I want a Python
> script that does the same thing as the workflow.
> After the KNIME test, I tried to implement the PAINS filter with the RDKit
> in Python.
> I exported the PAINS SMARTS patterns and the query SMILES to CSV files and
> wrote a short Python script.
> However, the script identified only 350 SMILES that match PAINS, which is
> very different from the 753 (824 when I run it with KNIME 4.1.2) reported
> by the KNIME workflow.
>
> After discussing this with my colleague, we found that the difference
> comes down to whether hydrogens are added to the query molecule before
> matching. This is actually implemented by Greg in his test script
> (https://github.com/rdkit/rdkit/blob/master/Data/Pains/test_data/run_tests.py).
>
> In short, we now have a Python script that does the same thing as the KNIME
> workflow, and we'd like to share it with the community.
> https://github.com/zhentg/GShare/blob/master/CADD/PAINS_filter.py
>
> Python=3.7.6
> RDKit=2019.09.3
>
> Best regards
> Zhenting
> 4/25/2020
>
> ------------------ Original Message ------------------
> *From:* "Greg Landrum"<greg.land...@gmail.com>;
> *Sent:* Friday, April 24, 2020, 2:29 PM
> *To:* "Zhenting Gao"<183310...@qq.com>;
> *Cc:* "Rdkit-discuss"<rdkit-discuss@lists.sourceforge.net>;
> *Subject:* Re: [Rdkit-discuss] Why this substructure query by SMARTS failed
>
> Hi Zhenting,
>
> This works fine for me with both the 2020.03 release:
>
> In [6]: print(rdkit.__version__)
> 2020.03.1
>
> In [7]: from rdkit import Chem
>
> In [8]: p = Chem.MolFromSmarts('c1:c:c(:c:c:c:1-[#8]-[#6&X4])-[#7;$([#7&!H0]-[#6&X4]),$([#7](-[#6&X4])-[#6&X4])]')
>
> In [9]: m = Chem.MolFromSmiles('N2C(C1(CCCC1)Cc3c2cc(c(c3)OC)OC)CC=C')
>
> In [10]: m.HasSubstructMatch(p)
> Out[10]: True
>
> and the 2019.09 release:
>
> In [1]: import rdkit
>
> In [2]: print(rdkit.__version__)
> 2019.09.3
>
> In [3]: from rdkit import Chem
>
> In [4]: m = Chem.MolFromSmiles('N2C(C1(CCCC1)Cc3c2cc(c(c3)OC)OC)CC=C')
>
> In [5]: p = Chem.MolFromSmarts('c1:c:c(:c:c:c:1-[#8]-[#6&X4])-[#7;$([#7&!H0]-[#6&X4]),$([#7](-[#6&X4])-[#6&X4])]')
>
> In [6]: m.HasSubstructMatch(p)
> Out[6]: True
>
> Can you please share a code snippet that shows the problem?
>
> -greg
>
> On Thu, Apr 23, 2020 at 7:26 PM Zhenting Gao <183310...@qq.com> wrote:
>
>> Hi there,
>>
>> I'm trying to filter a compound list with the PAINS filters.
>> With the SMARTS query
>>
>> 'c1:c:c(:c:c:c:1-[#8]-[#6&X4])-[#7;$([#7&!H0]-[#6&X4]),$([#7](-[#6&X4])-[#6&X4])]'
>>
>> KNIME identifies the following SMILES as a match:
>> 'N2C(C1(CCCC1)Cc3c2cc(c(c3)OC)OC)CC=C'
>>
>> But I can't match the same SMILES with RDKit 2019.09.3.
>> I guess the difference is an aromaticity mismatch. Could you help?
>>
>> Best regards
>> Zhenting
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
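The hydrogen-addition effect described in the thread can be reproduced with a toy query. This is only a sketch: the SMARTS `[#7]-[#1]` and the methylamine SMILES are made-up illustrations, not PAINS patterns, and it is an assumption that the patterns exported from the KNIME workflow contain explicit hydrogen atoms like `[#1]` (the thread does not show them). A query atom written as `[#1]` needs a real hydrogen atom in the molecule graph, which exists only after `Chem.AddHs()`.

```python
from rdkit import Chem

# Toy query with an explicit hydrogen atom ([#1]); not one of the PAINS patterns.
query = Chem.MolFromSmarts('[#7]-[#1]')

mol = Chem.MolFromSmiles('CN')   # methylamine; hydrogens are implicit
mol_h = Chem.AddHs(mol)          # same molecule with hydrogens as real atoms

matches_without_hs = mol.HasSubstructMatch(query)
matches_with_hs = mol_h.HasSubstructMatch(query)

print(matches_without_hs, matches_with_hs)  # prints: False True
```

Hydrogen-count primitives such as `!H0`, as used in the SMARTS from the original question, also count implicit hydrogens, which is consistent with that query matching without `AddHs()` in the sessions above.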