[Rdkit-discuss] delete a substructure

2017-03-05 Thread Chenyang Shi
Hi everyone, I am new to RDKit but am already impressed by its vibrant community. I have a question about deleting a substructure. The RDKit documentation includes this snippet showing how to delete a substructure:
>>> m = Chem.MolFromSmiles("CC(=O)O")
>>> patt =

Re: [Rdkit-discuss] delete a substructure

2017-03-05 Thread Greg Landrum
Hi Chenyang, if you're really interested in counting the number of times the substructure appears, you can do that much more quickly with `GetSubstructMatches()`:
In [2]: m = Chem.MolFromSmiles('CC(C)CCO')
In [3]: len(m.GetSubstructMatches(Chem.MolFromSmarts('[CH3;X4]')))
Out[3]: 2
Is that

Re: [Rdkit-discuss] delete a substructure

2017-03-05 Thread Chenyang Shi
Hi Greg, thanks for the prompt reply. I did try `GetSubstructMatches()`, and it returns the correct number of substructures for CH3COOH. The potential problem with this approach is that, as the molecule gets more complicated, it may count certain functional groups more than once.

Re: [Rdkit-discuss] delete a substructure

2017-03-05 Thread Greg Landrum
The solution that Hongbin proposes to the double-counting problem is a good one. Just be sure to sort your substructure queries in the right order so that the more complex ones come first. Another thing you might think about is making your queries more specific. For example, as you pointed out
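Greg's own example is cut off in this archive, so the following is only a sketch of what "more specific queries" could look like, not the pattern he proposed. It assumes RDKit is installed; the helper `count_matches` and the SMARTS `[OX2H;!$(OC=O)]` (a hydroxyl oxygen that is not attached to a carbonyl carbon) are illustrative choices introduced here.

```python
from rdkit import Chem


def count_matches(mol, smarts):
    """Return the number of unique matches of a SMARTS pattern in mol."""
    query = Chem.MolFromSmarts(smarts)
    return len(mol.GetSubstructMatches(query))


acetic_acid = Chem.MolFromSmiles("CC(=O)O")
ethanol = Chem.MolFromSmiles("CCO")

# Generic hydroxyl: also hits the OH that belongs to a carboxylic acid.
any_oh = "[OX2H]"
# Illustrative, more specific query: the recursive SMARTS excludes
# hydroxyl oxygens bonded to a carbon that carries a C=O.
alcohol_oh = "[OX2H;!$(OC=O)]"

print(count_matches(acetic_acid, any_oh))      # 1
print(count_matches(acetic_acid, alcohol_oh))  # 0
print(count_matches(ethanol, alcohol_oh))      # 1
```

With a query this specific, the acid's OH is never reported as an alcohol in the first place, so the order in which the queries are run matters less.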

Re: [Rdkit-discuss] delete a substructure

2017-03-05 Thread 杨弘宾
Hi Chenyang, you don't need to delete the substructure from the molecule. Just check whether the mapped atoms have already been matched. For example:
m = Chem.MolFromSmiles('CC(=O)O')
OH = Chem.MolFromSmarts('[OH]')
COOH = Chem.MolFromSmarts('C(O)=O')
m.GetSubstructMatches(OH)>>
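The rest of this message is cut off, so what follows is a minimal sketch of the idea it describes rather than the original code. The function name `count_without_double_counting` is hypothetical, and the atom-index tuples are written out by hand in the shape that `GetSubstructMatches()` returns, so the sketch runs without RDKit. Groups are processed in the order given, which is why the larger group should come first.

```python
def count_without_double_counting(matches_by_group):
    """Count matches per group, skipping any match that reuses an atom
    already claimed by an earlier match. Order matters: list the larger,
    more complex groups first."""
    claimed = set()
    counts = {}
    for name, matches in matches_by_group:
        accepted = 0
        for match in matches:
            # Accept the match only if none of its atoms are claimed yet.
            if claimed.isdisjoint(match):
                claimed.update(match)
                accepted += 1
        counts[name] = accepted
    return counts


# Hand-written match tuples for acetic acid, CC(=O)O (atom indices 0..3).
acetic_acid = [
    ("COOH", ((1, 3, 2),)),  # C(O)=O
    ("OH", ((3,),)),         # [OH]
]

print(count_without_double_counting(acetic_acid))  # {'COOH': 1, 'OH': 0}
```

With RDKit available, each entry could instead be built as `(name, m.GetSubstructMatches(query))`. Reversing the list would let the bare OH claim atom 3 first and suppress the COOH match, which is the ordering problem the thread warns about.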