The SMARTS definitions for many types of functional groups are here:

http://www.daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html

Note that some are disputed, but since this comes from the source, I tend
to go here first.

Cheers,
 Brian

On Mon, Mar 6, 2017 at 3:23 AM, Chenyang Shi <cs3...@columbia.edu> wrote:

> Hongbin and Greg,
> Thank you both for your kind suggestions. I will try both approaches and report
> my progress later.
> Best,
> Chenyang
>
> On Monday, March 6, 2017, Greg Landrum <greg.land...@gmail.com> wrote:
>
>> The solution that Hongbin proposes to the double-counting problem is a
>> good one. Just be sure to sort your substructure queries in the right order
>> so that the more complex ones come first.
>>
>> Another thing you might think about is making your queries more specific.
>> For example, as you pointed out "[OH]" is very general and matches parts of
>> carboxylic acids and a number of other functional groups. The RDKit has a
>> set of fairly well tested (though certainly not perfect) functional group
>> definitions in $RDBASE/Data/Functional_Group_Hierarchy.txt. The alcohol
>> definition from there looks like this:
>> [O;H1;$(O-!@[#6;!$(C=!@[O,N,S])])]
>>
>>
>> -greg
>>
>>
>> On Mon, Mar 6, 2017 at 7:20 AM, 杨弘宾 <yanyangh...@163.com> wrote:
>>
>>> Hi, Chenyang,
>>>     You don't need to delete the substructure from the molecule. Just
>>> check whether the mapped atoms have already been matched. For example:
>>>
>>> m = Chem.MolFromSmiles('CC(=O)O')
>>> OH = Chem.MolFromSmarts('[OH]')
>>> COOH = Chem.MolFromSmarts('C(O)=O')
>>>
>>> m.GetSubstructMatches(OH)
>>> >> ((3,),)
>>> m.GetSubstructMatches(COOH)
>>> >> ((1, 3, 2),)
>>>
>>> Since atom "3" has already been matched, it should be ignored.
>>> So you can create a "set" to record the matched atoms and avoid
>>> counting them twice.
>>>
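The bookkeeping described above can be sketched in plain Python. This is only a minimal sketch under stated assumptions: the function names `count_non_overlapping` and `count_groups` are hypothetical, and the rule of skipping a whole match as soon as any of its atoms has been claimed is one possible reading of the suggestion, not something the thread specifies. The match tuples are the ones shown for acetic acid above; the RDKit wrapper uses only `MolFromSmiles`, `MolFromSmarts` and `GetSubstructMatches`, which already appear in this thread.

```python
from typing import Dict, Iterable, Sequence, Tuple

Match = Tuple[int, ...]


def count_non_overlapping(
    ordered_matches: Sequence[Tuple[str, Iterable[Match]]],
) -> Dict[str, int]:
    """Count matches per group, skipping any match that reuses a claimed atom.

    `ordered_matches` is a sequence of (group name, matches) pairs, listed
    with the more complex groups first. Each match is a tuple of atom
    indices, in the shape returned by GetSubstructMatches().
    """
    claimed = set()  # atom indices already assigned to some group
    counts = {}
    for name, matches in ordered_matches:
        counts[name] = 0
        for match in matches:
            # Assumption: reject the whole match if any atom is already claimed.
            if claimed.isdisjoint(match):
                counts[name] += 1
                claimed.update(match)
    return counts


def count_groups(smiles, ordered_smarts):
    """Hypothetical RDKit wrapper: run each SMARTS and feed the matches in."""
    from rdkit import Chem  # imported lazily so the logic above stands alone

    mol = Chem.MolFromSmiles(smiles)
    ordered_matches = [
        (name, mol.GetSubstructMatches(Chem.MolFromSmarts(smarts)))
        for name, smarts in ordered_smarts
    ]
    return count_non_overlapping(ordered_matches)


# Match tuples copied from the acetic acid example above.
acetic_acid = [("COOH", ((1, 3, 2),)), ("OH", ((3,),))]
print(count_non_overlapping(acetic_acid))  # {'COOH': 1, 'OH': 0}
```

Listing "OH" before "COOH" would instead claim atom 3 for the hydroxyl and leave the carboxylic acid uncounted, which is why the order of the queries matters. The all-or-nothing overlap rule may be too strict for groups that legitimately share an atom, so treat it as a starting point.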
>>> ------------------------------
>>> Hongbin Yang 杨弘宾
>>>
>>>
>>> *From:* Chenyang Shi
>>> *Date:* 2017-03-06 14:04
>>> *To:* Greg Landrum
>>> *CC:* RDKit Discuss
>>> *Subject:* Re: [Rdkit-discuss] delete a substructure
>>> Hi Greg,
>>>
>>> Thanks for the prompt reply. I did try "GetSubstructMatches()" and it
>>> returns the correct number of substructures for CH3COOH. The potential
>>> problem with this approach is that as the molecule gets more complicated,
>>> it may count certain functional groups twice. For example, an --OH
>>> (alcohol) group will likely also be counted in --COOH. A safer way, in my
>>> mind, is to remove each substructure once it has been counted.
>>>
>>> Greg, you mentioned the "chemical reaction functionality". Can you show me
>>> a demo script that uses it, with CH3COOH as an example? I will definitely
>>> delve into the manual to learn more, but reading your code will be a good
>>> start.
>>>
>>> Thanks,
>>> Chenyang
>>>
>>>
>>>
>>> On Sun, Mar 5, 2017 at 10:15 PM, Greg Landrum <greg.land...@gmail.com>
>>> wrote:
>>>
>>>> Hi Chenyang,
>>>>
>>>> If you're really interested in counting the number of times the
>>>> substructure appears, you can do that much quicker with
>>>> `GetSubstructMatches()`:
>>>>
>>>> In [1]: from rdkit import Chem
>>>> In [2]: m = Chem.MolFromSmiles('CC(C)CCO')
>>>> In [3]: len(m.GetSubstructMatches(Chem.MolFromSmarts('[CH3;X4]')))
>>>> Out[3]: 2
>>>>
>>>> Is that sufficient, or do you actually want to sequentially remove all
>>>> of the groups in your list?
>>>>
>>>> If you actually want to remove them, you are probably better off using
>>>> the chemical reaction functionality instead of DeleteSubstructs(), which
>>>> recalculates the number of implicit Hs on atoms after each call.
>>>>
>>>> -greg
>>>>
>>>>
>>>> On Mon, Mar 6, 2017 at 4:21 AM, Chenyang Shi <cs3...@columbia.edu>
>>>> wrote:
>>>>
>>>>> I am new to RDKit but I am already impressed by its vibrant community.
>>>>> I have a question about deleting a substructure. The RDKit
>>>>> documentation has this snippet of code showing how to delete a
>>>>> substructure:
>>>>>
>>>>> >>>from rdkit import Chem
>>>>> >>>from rdkit.Chem import AllChem
>>>>> >>>m = Chem.MolFromSmiles("CC(=O)O")
>>>>> >>>patt = Chem.MolFromSmarts("C(=O)[OH]")
>>>>> >>>rm = AllChem.DeleteSubstructs(m, patt)
>>>>> >>>Chem.MolToSmiles(rm)
>>>>> 'C'
>>>>>
>>>>> This block of code first loads the molecule CH3COOH from its SMILES string,
>>>>> then defines the substructure to be deleted, COOH, as a SMARTS pattern.
>>>>> After the final line, the program outputs 'C', in SMILES form.
>>>>>
>>>>> I want to develop a method for counting the functional groups in a
>>>>> molecule. In the CH3COOH case, I can count the --CH3 and --COOH groups
>>>>> using their respective SMARTS patterns with no problem. However, when the
>>>>> molecule becomes more complicated, it is preferable to delete each
>>>>> substructure once it has been found before moving on to the next SMARTS
>>>>> search. In the current case, after finding the --COOH group and deleting
>>>>> it, the leftover is 'C', which is essentially CH4 instead of --CH3, so I
>>>>> cannot proceed with the SMARTS search for --CH3 ([CH3;A;X4!R]).
>>>>>
>>>>> Is there any way to work around this?
>>>>> Thanks,
>>>>> Chenyang
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------
>>>>> ------------------
>>>>> Check out the vibrant tech community on one of the world's most
>>>>> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
>>>>> _______________________________________________
>>>>> Rdkit-discuss mailing list
>>>>> Rdkit-discuss@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>>>
>>>>>
>>>>
>>>
>>