Hi Zhenting,

A few comments:
1) You *really* don't want to be using that workflow with the RDKit. I
curated a version of the PAINS SMARTS that do not require Hs to be added
and that are compatible with the RDKit's aromaticity definition some years
ago: http://rdkit.blogspot.com/2015/08/curating-pains-filters.html
2) Those definitions are part of the RDKit distribution:
https://github.com/rdkit/rdkit/tree/master/Data/Pains
3) They are super easy to use in the RDKit's FilterCatalog from Python
or inside of KNIME (
https://hub.knime.com/greglandrum/spaces/Public/latest/RDKit-Examples/Pains%20Filters
)
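
A minimal sketch of point 3, assuming the PAINS catalog that ships with the
RDKit. The helper name pains_matches is made up for illustration; no explicit
hydrogens are added, since the curated SMARTS do not need them:

```python
from rdkit import Chem
from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams

# Build a catalog holding the curated PAINS definitions bundled with the RDKit.
params = FilterCatalogParams()
params.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS)
catalog = FilterCatalog(params)


def pains_matches(smiles):
    """Hypothetical helper: return the descriptions of all PAINS entries
    matched by a SMILES, or None if the SMILES cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    # GetMatches returns one catalog entry per matching filter.
    return [entry.GetDescription() for entry in catalog.GetMatches(mol)]
```

Calling pains_matches('N2C(C1(CCCC1)Cc3c2cc(c(c3)OC)OC)CC=C') returns the
descriptions of whichever PAINS entries the molecule from this thread hits;
an empty list means no filter matched.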

-greg

On Sat, Apr 25, 2020 at 2:14 PM Zhenting Gao <183310...@qq.com> wrote:

> Hi Greg and Andreas,
>
> Thanks for helping to look into my question.
> However, I was wrong about the mismatch between the SMARTS
> and the SMILES structure.
> They actually match! Sorry for the mistaken question.
>
> The problem stems from my test of the PAINS KNIME workflow (
> http://www.myexperiment.org/workflows/4748.html). I want a Python
> script that does the same as the workflow.
> After the KNIME test, I tried to implement the PAINS filter with RDKit
> in Python.
> I exported the PAINS SMARTS patterns and query SMILES to CSV files and
> wrote a short Python script.
> However, the script identified only 350 SMILES that match PAINS, which is
> very different from the 753 (824 when I run it with KNIME 4.1.2) reported
> by the KNIME workflow.
>
> After discussing it with my colleague, we found that the difference
> comes down to whether hydrogens are added to the query molecule.
> Greg's test script actually does this (
> https://github.com/rdkit/rdkit/blob/master/Data/Pains/test_data/run_tests.py
> )
>
> In short, we now have a Python script that does the same thing as the KNIME
> workflow, and we'd like to share it with the community.
> https://github.com/zhentg/GShare/blob/master/CADD/PAINS_filter.py
>
> Python=3.7.6
> RDkit=2019.09.3
>
> Best regards
> Zhenting
> 4/25/2020
>
>
>
>
> ------------------ Original Message ------------------
> *From:* "Greg Landrum"<greg.land...@gmail.com>;
> *Sent:* Friday, April 24, 2020, 2:29 PM
> *To:* "Zhenting Gao"<183310...@qq.com>;
> *Cc:* "Rdkit-discuss"<rdkit-discuss@lists.sourceforge.net>;
> *Subject:* Re: [Rdkit-discuss] Why this substructure query by SMARTS failed
>
> Hi Zhenting,
>
> This works fine for me with both the 2020.03 release:
>
> In [6]: print(rdkit.__version__)
> 2020.03.1
>
> In [7]: from rdkit import Chem
>
> In [8]: p = Chem.MolFromSmarts('c1:c:c(:c:c:c:1-[#8]-[#6&X4])-[#7;$([#7&!H0]-[#6&X4]),$([#7](-[#6&X4])-[#6&X4])]')
>
> In [9]: m = Chem.MolFromSmiles('N2C(C1(CCCC1)Cc3c2cc(c(c3)OC)OC)CC=C')
>
> In [10]: m.HasSubstructMatch(p)
> Out[10]: True
>
>
> and the 2019.09 release:
>
> In [1]: import rdkit
>
> In [2]: print(rdkit.__version__)
> 2019.09.3
>
> In [3]: from rdkit import Chem
>
> In [4]: m = Chem.MolFromSmiles('N2C(C1(CCCC1)Cc3c2cc(c(c3)OC)OC)CC=C')
>
> In [5]: p = Chem.MolFromSmarts('c1:c:c(:c:c:c:1-[#8]-[#6&X4])-[#7;$([#7&!H0]-[#6&X4]),$([#7](-[#6&X4])-[#6&X4])]')
>
> In [6]: m.HasSubstructMatch(p)
> Out[6]: True
>
> Can you please share a code snippet that shows the problem?
>
> -greg
>
>
>
>
> On Thu, Apr 23, 2020 at 7:26 PM Zhenting Gao <183310...@qq.com> wrote:
>
>> Hi there,
>>
>> I'm trying to filter a compound list with the PAINS filters.
>> With the SMARTS query
>>
>> 'c1:c:c(:c:c:c:1-[#8]-[#6&X4])-[#7;$([#7&!H0]-[#6&X4]),$([#7](-[#6&X4])-[#6&X4])]'
>>
>> KNIME can identify the following SMILES as a match
>> 'N2C(C1(CCCC1)Cc3c2cc(c(c3)OC)OC)CC=C'
>>
>> But I can't match the same SMILES with RDKit 2019.9.3.
>> I guess the difference is an aromaticity mismatch. Could you help?
>>
>> Best regards
>> Zhenting
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
