Re: [Rdkit-discuss] delete a substructure

2017-03-05 Thread Greg Landrum
The solution that Hongbin proposes to the double-counting problem is a good
one. Just be sure to sort your substructure queries in the right order so
that the more complex ones come first.

Another thing you might think about is making your queries more specific.
For example, as you pointed out "[OH]" is very general and matches parts of
carboxylic acids and a number of other functional groups. The RDKit has a
set of fairly well tested (though certainly not perfect) functional group
definitions in $RDBASE/Data/Functional_Group_Hierarchy.txt. The alcohol
definition from there looks like this:
[O;H1;$(O-!@[#6;!$(C=!@[O,N,S])])]
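A minimal sketch of how that definition behaves next to the plain "[OH]" query. The helper name count_matches and the two test molecules (acetic acid and ethanol) are illustrative assumptions, not something from the hierarchy file:

```python
from rdkit import Chem

# Generic hydroxyl query versus the more specific alcohol definition quoted above.
generic_oh = Chem.MolFromSmarts('[OH]')
alcohol = Chem.MolFromSmarts('[O;H1;$(O-!@[#6;!$(C=!@[O,N,S])])]')


def count_matches(smiles, patt):
    """Hypothetical helper: number of unique matches of patt in the molecule."""
    mol = Chem.MolFromSmiles(smiles)
    return len(mol.GetSubstructMatches(patt))


for smiles in ('CC(=O)O', 'CCO'):
    # Print the SMILES, the "[OH]" count and the alcohol-definition count.
    print(smiles, count_matches(smiles, generic_oh), count_matches(smiles, alcohol))
```

The recursive part of the alcohol query rejects an oxygen whose carbon neighbour carries a double bond to O, N or S, so the acid OH should no longer be picked up as an alcohol.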


-greg


On Mon, Mar 6, 2017 at 7:20 AM, 杨弘宾  wrote:

> Hi, Chenyang,
> You don't need to delete the substructure from the molecule. Just
> check whether the mapped atoms have already been matched. For example:
>
> m = Chem.MolFromSmiles('CC(=O)O')
> OH = Chem.MolFromSmarts('[OH]')
> COOH = Chem.MolFromSmarts('C(O)=O')
>
> m.GetSubstructMatches(OH)
> >> ((3,),)
> m.GetSubstructMatches(COOH)
> >> ((1, 3, 2),)
>
> Since atom "3" has already been matched, it should be ignored.
> So you can create a "set" to record the matched atoms and avoid
> counting them twice.
>
> --
> Hongbin Yang 杨弘宾
>
>
> *From:* Chenyang Shi 
> *Date:* 2017-03-06 14:04
> *To:* Greg Landrum 
> *CC:* RDKit Discuss 
> *Subject:* Re: [Rdkit-discuss] delete a substructure
> Hi Greg,
>
> Thanks for a prompt reply. I did try "GetSubstructMatches()" and it
> returns correct numbers of substructures for CH3COOH. The potential problem
> with this approach is that if the molecule is getting complicated, it will
> possibly generate duplicate numbers for certain functional groups. For
> example, --OH (alcohol) group will be likely also counted in --COOH. A
> safer way, in my mind, is to remove the substructure that has been counted.
>
> Greg, you mentioned "chemical reaction functionality", can you show me a
> demo script with that using CH3COOH as an example. I will definitely delve
> into the manual to learn more. But reading your code will be a good start.
>
> Thanks,
> Chenyang
>
>
>
> On Sun, Mar 5, 2017 at 10:15 PM, Greg Landrum 
> wrote:
>
>> Hi Chenyang,
>>
>> If you're really interested in counting the number of times the
>> substructure appears, you can do that much quicker with
>> `GetSubstructMatches()`:
>>
>> In [2]: m = Chem.MolFromSmiles('CC(C)CCO')
>> In [3]: len(m.GetSubstructMatches(Chem.MolFromSmarts('[CH3;X4]')))
>> Out[3]: 2
>>
>> Is that sufficient, or do you actually want to sequentially remove all of
>> the groups in your list?
>>
>> If you actually want to remove them, you are probably better off using
>> the chemical reaction functionality instead of DeleteSubstructs(), which
>> recalculates the number of implicit Hs on atoms after each call.
>>
>> -greg
>>
>>
>> On Mon, Mar 6, 2017 at 4:21 AM, Chenyang Shi  wrote:
>>
>>> I am new to rdkit but I am already impressed by its vibrant community. I
>>> have a question regarding deleting substructure. In the RDKIT
>>> documentation, this is a snippet of code describing how to delete
>>> substructure:
>>>
>>> >>>m = Chem.MolFromSmiles("CC(=O)O")
>>> >>>patt = Chem.MolFromSmarts("C(=O)[OH]")
>>> >>>rm = AllChem.DeleteSubstructs(m, patt)
>>> >>>Chem.MolToSmiles(rm)
>>> 'C'
>>>
>>> This block of code first loads a molecule CH3COOH using SMILES code,
>>> then defines a substructure COOH using SMARTS code which is to be deleted.
>>> After final line of code, the program outputs 'C', in SMILES form.
>>>
>>> I had wanted to develop a method for detecting number of groups in a
>>> molecule. In CH3COOH case, I can search number of --CH3 and --COOH group by
>>> using their respective SMARTS code with no problem. However, when molecule
>>> becomes more complicated, it is preferred to delete the substructure that
>>> has been searched before moving to next search using SMARTS code. Well, in
>>> current case, after searching -COOH group and deleting it, the leftover is
>>> 'C' which is essentially CH4 instead of --CH3. I cannot proceed with
>>> searching with SMARTS code for --CH3 ([CH3;A;X4!R]).
>>>
>>> Is there any way to work around this?
>>> Thanks,
>>> Chenyang
>>>
>>>
>>>
>>> 
>>> --
>>> Check out the vibrant tech community on one of the world's most
>>> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
>>> ___
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>>
>>
>

Re: [Rdkit-discuss] delete a substructure

2017-03-05 Thread 杨弘宾






Hi, Chenyang,

You don't need to delete the substructure from the molecule. Just check
whether the mapped atoms have already been matched. For example:

m = Chem.MolFromSmiles('CC(=O)O')
OH = Chem.MolFromSmarts('[OH]')
COOH = Chem.MolFromSmarts('C(O)=O')

m.GetSubstructMatches(OH)
>> ((3,),)
m.GetSubstructMatches(COOH)
>> ((1, 3, 2),)

Since atom "3" has already been matched, it should be ignored. So you can
create a "set" to record the matched atoms and avoid counting them twice.
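A minimal sketch of that bookkeeping, assuming the queries are listed with the more complex groups first. The function name count_groups and the particular query list are hypothetical, chosen only for illustration:

```python
from rdkit import Chem


def count_groups(mol, named_smarts):
    """Count each group, skipping matches that reuse already-claimed atoms.

    named_smarts is an ordered list of (name, SMARTS) pairs; put the more
    complex patterns first so they claim their atoms before simpler ones.
    """
    claimed = set()  # atom indices already assigned to some group
    counts = {}
    for name, smarts in named_smarts:
        patt = Chem.MolFromSmarts(smarts)
        n = 0
        for match in mol.GetSubstructMatches(patt):
            if claimed.isdisjoint(match):
                claimed.update(match)
                n += 1
        counts[name] = n
    return counts


mol = Chem.MolFromSmiles('CC(=O)O')
queries = [('COOH', 'C(=O)[OH]'), ('OH', '[OH]'), ('CH3', '[CH3;X4]')]
print(count_groups(mol, queries))
```

Because the COOH query runs first, the hydroxyl oxygen is already in the set when "[OH]" is tried, so it is not counted a second time.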


Hongbin Yang 杨弘宾


> From: Chenyang Shi
> Date: 2017-03-06 14:04
> To: Greg Landrum
> CC: RDKit Discuss
> Subject: Re: [Rdkit-discuss] delete a substructure
>
> Hi Greg,
>
> Thanks for a prompt reply. I did try "GetSubstructMatches()" and it
> returns correct numbers of substructures for CH3COOH. The potential problem
> with this approach is that if the molecule is getting complicated, it will
> possibly generate duplicate numbers for certain functional groups. For
> example, --OH (alcohol) group will be likely also counted in --COOH. A
> safer way, in my mind, is to remove the substructure that has been counted.
>
> Greg, you mentioned "chemical reaction functionality", can you show me a
> demo script with that using CH3COOH as an example. I will definitely delve
> into the manual to learn more. But reading your code will be a good start.
>
> Thanks,
> Chenyang
>
> On Sun, Mar 5, 2017 at 10:15 PM, Greg Landrum  wrote:
>
>> Hi Chenyang,
>>
>> If you're really interested in counting the number of times the
>> substructure appears, you can do that much quicker with
>> `GetSubstructMatches()`:
>>
>> In [2]: m = Chem.MolFromSmiles('CC(C)CCO')
>> In [3]: len(m.GetSubstructMatches(Chem.MolFromSmarts('[CH3;X4]')))
>> Out[3]: 2
>>
>> Is that sufficient, or do you actually want to sequentially remove all of
>> the groups in your list?
>>
>> If you actually want to remove them, you are probably better off using
>> the chemical reaction functionality instead of DeleteSubstructs(), which
>> recalculates the number of implicit Hs on atoms after each call.
>>
>> -greg
>>
>> On Mon, Mar 6, 2017 at 4:21 AM, Chenyang Shi  wrote:
>>
>>> I am new to rdkit but I am already impressed by its vibrant community. I
>>> have a question regarding deleting substructure. In the RDKIT
>>> documentation, this is a snippet of code describing how to delete
>>> substructure:
>>>
>>> >>>m = Chem.MolFromSmiles("CC(=O)O")
>>> >>>patt = Chem.MolFromSmarts("C(=O)[OH]")
>>> >>>rm = AllChem.DeleteSubstructs(m, patt)
>>> >>>Chem.MolToSmiles(rm)
>>> 'C'
>>>
>>> This block of code first loads a molecule CH3COOH using SMILES code,
>>> then defines a substructure COOH using SMARTS code which is to be deleted.
>>> After final line of code, the program outputs 'C', in SMILES form.
>>>
>>> I had wanted to develop a method for detecting number of groups in a
>>> molecule. In CH3COOH case, I can search number of --CH3 and --COOH group by
>>> using their respective SMARTS code with no problem. However, when molecule
>>> becomes more complicated, it is preferred to delete the substructure that
>>> has been searched before moving to next search using SMARTS code. Well, in
>>> current case, after searching -COOH group and deleting it, the leftover is
>>> 'C' which is essentially CH4 instead of --CH3. I cannot proceed with
>>> searching with SMARTS code for --CH3 ([CH3;A;X4!R]).
>>>
>>> Is there any way to work around this?
>>> Thanks,
>>> Chenyang
 



Re: [Rdkit-discuss] delete a substructure

2017-03-05 Thread Chenyang Shi
Hi Greg,

Thanks for the prompt reply. I did try "GetSubstructMatches()" and it
returns the correct number of substructures for CH3COOH. The potential
problem with this approach is that, as the molecule gets more complicated,
some functional groups may be counted more than once. For example, an --OH
(alcohol) group will likely also be counted as part of --COOH. A safer way,
in my mind, is to remove each substructure once it has been counted.

Greg, you mentioned the "chemical reaction functionality". Could you show
me a demo script that uses it, with CH3COOH as an example? I will definitely
delve into the manual to learn more, but reading your code would be a good
start.

Thanks,
Chenyang



On Sun, Mar 5, 2017 at 10:15 PM, Greg Landrum 
wrote:

> Hi Chenyang,
>
> If you're really interested in counting the number of times the
> substructure appears, you can do that much quicker with
> `GetSubstructMatches()`:
>
> In [2]: m = Chem.MolFromSmiles('CC(C)CCO')
> In [3]: len(m.GetSubstructMatches(Chem.MolFromSmarts('[CH3;X4]')))
> Out[3]: 2
>
> Is that sufficient, or do you actually want to sequentially remove all of
> the groups in your list?
>
> If you actually want to remove them, you are probably better off using the
> chemical reaction functionality instead of DeleteSubstructs(), which
> recalculates the number of implicit Hs on atoms after each call.
>
> -greg
>
>
> On Mon, Mar 6, 2017 at 4:21 AM, Chenyang Shi  wrote:
>
>> I am new to rdkit but I am already impressed by its vibrant community. I
>> have a question regarding deleting substructure. In the RDKIT
>> documentation, this is a snippet of code describing how to delete
>> substructure:
>>
>> >>>m = Chem.MolFromSmiles("CC(=O)O")
>> >>>patt = Chem.MolFromSmarts("C(=O)[OH]")
>> >>>rm = AllChem.DeleteSubstructs(m, patt)
>> >>>Chem.MolToSmiles(rm)
>> 'C'
>>
>> This block of code first loads a molecule CH3COOH using SMILES code, then
>> defines a substructure COOH using SMARTS code which is to be deleted. After
>> final line of code, the program outputs 'C', in SMILES form.
>>
>> I had wanted to develop a method for detecting number of groups in a
>> molecule. In CH3COOH case, I can search number of --CH3 and --COOH group by
>> using their respective SMARTS code with no problem. However, when molecule
>> becomes more complicated, it is preferred to delete the substructure that
>> has been searched before moving to next search using SMARTS code. Well, in
>> current case, after searching -COOH group and deleting it, the leftover is
>> 'C' which is essentially CH4 instead of --CH3. I cannot proceed with
>> searching with SMARTS code for --CH3 ([CH3;A;X4!R]).
>>
>> Is there any way to work around this?
>> Thanks,
>> Chenyang
>>
>>
>>
>>
>>
>


Re: [Rdkit-discuss] delete a substructure

2017-03-05 Thread Greg Landrum
Hi Chenyang,

If you're really interested in counting the number of times the
substructure appears, you can do that much quicker with
`GetSubstructMatches()`:

In [2]: m = Chem.MolFromSmiles('CC(C)CCO')
In [3]: len(m.GetSubstructMatches(Chem.MolFromSmarts('[CH3;X4]')))
Out[3]: 2

Is that sufficient, or do you actually want to sequentially remove all of
the groups in your list?

If you actually want to remove them, you are probably better off using the
chemical reaction functionality instead of DeleteSubstructs(), which
recalculates the number of implicit Hs on atoms after each call.
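A rough sketch of what the reaction-based route could look like for CH3COOH. The reaction SMARTS below is an assumption made for illustration, not a definition from the RDKit: it swaps the acid group for a dummy atom so that the neighbouring carbon keeps its attachment point and is still seen as --CH3 afterwards:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Hypothetical transformation: carbon bearing a COOH group -> same carbon
# bearing a dummy atom ([*]) in place of the acid group.
rxn = AllChem.ReactionFromSmarts('[C:1][CX3](=O)[OX2H1]>>[C:1][*]')

mol = Chem.MolFromSmiles('CC(=O)O')

# One product set per way the reactant template maps onto the molecule;
# each set has had exactly one acid group replaced.
products = rxn.RunReactants((mol,))
n_cooh = len(products)

# Reaction products are not sanitized, so do that before querying them.
stripped = products[0][0]
Chem.SanitizeMol(stripped)

# The methyl query from the original question, run on the stripped molecule.
ch3 = Chem.MolFromSmarts('[CH3;A;X4!R]')
n_ch3 = len(stripped.GetSubstructMatches(ch3))
print(n_cooh, n_ch3, Chem.MolToSmiles(stripped))
```

For a molecule with several acid groups, each product set removes only one of them, so the reaction would need to be applied repeatedly to strip them all.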

-greg


On Mon, Mar 6, 2017 at 4:21 AM, Chenyang Shi  wrote:

> I am new to rdkit but I am already impressed by its vibrant community. I
> have a question regarding deleting substructure. In the RDKIT
> documentation, this is a snippet of code describing how to delete
> substructure:
>
> >>>m = Chem.MolFromSmiles("CC(=O)O")
> >>>patt = Chem.MolFromSmarts("C(=O)[OH]")
> >>>rm = AllChem.DeleteSubstructs(m, patt)
> >>>Chem.MolToSmiles(rm)
> 'C'
>
> This block of code first loads a molecule CH3COOH using SMILES code, then
> defines a substructure COOH using SMARTS code which is to be deleted. After
> final line of code, the program outputs 'C', in SMILES form.
>
> I had wanted to develop a method for detecting number of groups in a
> molecule. In CH3COOH case, I can search number of --CH3 and --COOH group by
> using their respective SMARTS code with no problem. However, when molecule
> becomes more complicated, it is preferred to delete the substructure that
> has been searched before moving to next search using SMARTS code. Well, in
> current case, after searching -COOH group and deleting it, the leftover is
> 'C' which is essentially CH4 instead of --CH3. I cannot proceed with
> searching with SMARTS code for --CH3 ([CH3;A;X4!R]).
>
> Is there any way to work around this?
> Thanks,
> Chenyang
>
>
>
>
>


[Rdkit-discuss] delete a substructure

2017-03-05 Thread Chenyang Shi
Hi everyone,

I am new to RDKit, but I am already impressed by its vibrant community. I
have a question about deleting substructures. The RDKit documentation
includes this snippet showing how to delete a substructure:

>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> m = Chem.MolFromSmiles("CC(=O)O")
>>> patt = Chem.MolFromSmarts("C(=O)[OH]")
>>> rm = AllChem.DeleteSubstructs(m, patt)
>>> Chem.MolToSmiles(rm)
'C'

This block of code first loads the molecule CH3COOH from its SMILES, then
defines the substructure to be deleted, COOH, as a SMARTS pattern. After
the final line, the program outputs 'C' as SMILES.

I want to develop a method for counting the groups in a molecule. In the
CH3COOH case, I can count the --CH3 and --COOH groups with their respective
SMARTS patterns with no problem. However, when the molecule becomes more
complicated, I would prefer to delete each substructure once it has been
found before moving on to the next SMARTS search. In the current case,
after finding the --COOH group and deleting it, the leftover is 'C', which
is essentially CH4 instead of --CH3, so I cannot proceed with the SMARTS
search for --CH3 ([CH3;A;X4!R]).

Is there any way to work around this?
Thanks,
Chenyang