Re: [Rdkit-discuss] delete a substructure

2017-03-08 Thread Pavel Polishchuk
You might find this link useful - 
http://www.rdkit.org/docs/GettingStartedInPython.html#chemical-transformations


However, the issue in your case is the SMARTS definitions. If one SMARTS 
completely covers another one, it is difficult to tell whether a match is an 
artifact or not. I think it might be reasonable to revise the SMARTS to avoid 
such overlaps, or to create a list of rules (maybe hierarchical) that 
defines which overlaps are valid and which are not.


Pavel.


On 03/08/2017 06:32 PM, Chenyang Shi wrote:

Dear Hongbin,

I tried your method on a molecule, 4-Methylsalicylic acid 
(CC1=CC(=C(C=C1)C(=O)O)O). I looped through all groups defined in 
Joback method (using SMARTS), and used m.GetSubstructMatches to print 
out all atom positions. The result is summarized in the table.


We can see there are duplicated counts--coming from COOH group. As 
suggested by Hongbin, we can remove duplicated atoms by looking at 
their positions--in this case, ((9),), ((7,8,),), ((7,),), and ((8,),) 
are subsets of ((7,8,9)) from -COOH. Indeed we can get rid of these 
duplicates. However, I also noticed that Atom (3,) from =C< (ring) 
group is also a part of -OH (phenol) ((10,3),). If we apply the same 
algorithm to remove duplicates, the =C<(ring) group will be only 
counted twice instead of three times.
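
A minimal sketch of the subset-based filtering described above, in plain Python. The tuples for -COOH, its fragments and -OH (phenol) are copied from the message; the label "COOH fragments" is hypothetical, and the ring atoms 1 and 4 are an assumption read off the SMILES rather than values from the table. It reproduces the problem: =C< (ring) ends up counted twice, because (3,) is contained in (10, 3).

```python
def count_without_contained(groups):
    """Count matches in order, skipping any match whose atoms are all
    contained in a match that has already been counted."""
    accepted = []  # atom sets that have been counted so far
    counts = {}
    for name, matches in groups:
        counts[name] = 0
        for match in matches:
            atoms = set(match)
            if any(atoms <= seen for seen in accepted):
                continue  # fully covered by an earlier match
            accepted.append(atoms)
            counts[name] += 1
    return counts


# More complex groups first, as suggested elsewhere in the thread.
groups = [
    ("-COOH", [(7, 8, 9)]),
    ("-OH (phenol)", [(10, 3)]),
    ("COOH fragments", [(9,), (7, 8), (7,), (8,)]),  # hypothetical label
    ("=C< (ring)", [(1,), (3,), (4,)]),  # atoms 1 and 4 are assumed
]
counts = count_without_contained(groups)
print(counts)
```

The filter itself behaves as intended; the undercount comes from the phenol pattern claiming the ring carbon as well as the oxygen.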


Greg, you mentioned as an alternative I can delete substructure using 
chemical reaction method. It would be greatly appreciated if you could 
show me (point me to) a simple example code, perhaps on a simple 
molecule? I find myself at a loss when browsing the manual. I would 
like to try also in that direction.


Thanks,
Chenyang


Inline image 1


On Mon, Mar 6, 2017 at 1:52 AM, Greg Landrum wrote:


The solution that Hongbin proposes to the double-counting problem
is a good one. Just be sure to sort your substructure queries in
the right order so that the more complex ones come first.

Another thing you might think about is making your queries more
specific. For example, as you pointed out "[OH]" is very general
and matches parts of carboxylic acids and a number of other
functional groups. The RDKit has a set of fairly well tested
(though certainly not perfect) functional group definitions in
$RDBASE/Data/Functional_Group_Hierarchy.txt. The alcohol
definition from there looks like this:
[O;H1;$(O-!@[#6;!$(C=!@[O,N,S])])]
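
As a rough illustration of the difference, here is a short sketch comparing the generic "[OH]" query with the alcohol definition quoted above. It assumes RDKit is installed; the helper name n_matches and the choice of acetic acid and ethanol as test molecules are only for this example.

```python
from rdkit import Chem

generic_oh = Chem.MolFromSmarts('[OH]')
alcohol = Chem.MolFromSmarts('[O;H1;$(O-!@[#6;!$(C=!@[O,N,S])])]')


def n_matches(smiles, query):
    """Number of substructure matches of `query` in the molecule `smiles`."""
    mol = Chem.MolFromSmiles(smiles)
    return len(mol.GetSubstructMatches(query))


acid_generic = n_matches('CC(=O)O', generic_oh)   # acetic acid, generic query
acid_specific = n_matches('CC(=O)O', alcohol)     # acetic acid, alcohol query
ethanol_specific = n_matches('CCO', alcohol)      # ethanol, alcohol query
print(acid_generic, acid_specific, ethanol_specific)
```

The generic query hits the acid -OH, while the alcohol definition excludes it because the carbon it is attached to carries a double bond to oxygen.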


-greg


On Mon, Mar 6, 2017 at 7:20 AM, 杨弘宾 wrote:

Hi, Chenyang,
You don't need to delete the substructure from the molecule.
Just check whether the mapped atoms have already been matched. For
example:

from rdkit import Chem

m = Chem.MolFromSmiles('CC(=O)O')
OH = Chem.MolFromSmarts('[OH]')
COOH = Chem.MolFromSmarts('C(O)=O')

m.GetSubstructMatches(OH)
>>((3,),)
m.GetSubstructMatches(COOH)
>>((1, 3, 2),)

Since atom "3" has been already matched, it should be ignored.
So you can create a "set" to record the matched atoms to avoid
repetitive count.
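
One possible way to write the "set" bookkeeping, as a plain-Python sketch. The match tuples are the ones printed above for CH3COOH; in practice they would come from GetSubstructMatches. The function name count_disjoint is hypothetical, and the queries are assumed to be listed with the larger group first.

```python
def count_disjoint(ordered_matches):
    """Count matches group by group, skipping any match that touches
    an atom already claimed by an earlier match."""
    claimed = set()  # atom indices that have already been counted
    counts = {}
    for name, matches in ordered_matches:
        counts[name] = 0
        for match in matches:
            if claimed.isdisjoint(match):
                claimed.update(match)
                counts[name] += 1
    return counts


# COOH is listed before OH so that the larger group claims atom 3 first.
ordered_matches = [
    ("COOH", ((1, 3, 2),)),
    ("OH", ((3,),)),
]
counts = count_disjoint(ordered_matches)
print(counts)
```

With this ordering the acid is counted once and the bare OH is not counted at all; reversing the list would let OH claim atom 3 and suppress the COOH match instead.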


Hongbin Yang 杨弘宾

*From:* Chenyang Shi 
*Date:* 2017-03-06 14:04
*To:* Greg Landrum 
*CC:* RDKit Discuss

*Subject:* Re: [Rdkit-discuss] delete a substructure
Hi Greg,

Thanks for a prompt reply. I did try
"GetSubstructMatches()" and it returns correct numbers of
substructures for CH3COOH. The potential problem with this
approach is that if the molecule is getting complicated,
it will possibly generate duplicate numbers for certain
functional groups. For example, --OH (alcohol) group will
be likely also counted in --COOH. A safer way, in my mind,
is to remove the substructure that has been counted.

Greg, you mentioned "chemical reaction functionality", can
you show me a demo script with that using CH3COOH as an
example. I will definitely delve into the manual to learn
more. But reading your code will be a good start.

Thanks,
Chenyang


On Sun, Mar 5, 2017 at 10:15 PM, Greg Landrum wrote:

Hi Chenyang,

If you're really interested in counting the number of
times the substructure appears, you can do that much
quicker with `GetSubstructMatches()`:

In [2]: m = Chem.MolFromSmiles('CC(C)CCO')
In [3]: len(m.GetSubstructMatches(Chem.MolFromSmarts('[CH3;X4]')))
Out[3]: 2

   

Re: [Rdkit-discuss] Molecule representation

2017-03-08 Thread Dimitri Maziuk
On 03/07/2017 05:42 PM, Markus Metz wrote:
> Dear Stephane:
> Thank you very much.
> I will give it a try.

An alternative:

import os
import sys
import time
import threading
PYMOL_PATH = "/SOME/PLACE/lib64/python"
sys.path.append( PYMOL_PATH )
import pymol

def make_image( infile, outfile ) :

    # launch PyMOL quietly, without the GUI
    pymol.pymol_argv = ['pymol','-qc']
    pymol.finish_launching()
    cmd = pymol.cmd

    cmd.load( infile )
    cmd.hide( "everything" )
    cmd.show( "sticks" )

    cmd.util.cbaw()

    cmd.set( "cartoon_discrete_colors", 1 )
    cmd.set( "ray_opaque_background", "off" )
    cmd.set( "ray_trace_mode",  1 )
    cmd.set( "antialias", 2 )
    cmd.set( "ray_trace_color", "grey" )
    cmd.set( "cartoon_fancy_helices", 1 )
    cmd.set( "cartoon_side_chain_helper", "on" )
    cmd.png( outfile, width = 800, dpi = 300, ray = 1 )

    # wait for PyMOL's worker threads to finish before quitting
    while threading.active_count() > 2 :
        time.sleep( 2 )
    cmd.quit()


HTH,
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] delete a substructure

2017-03-08 Thread 杨弘宾



Hi Chenyang,
    Your issue was caused by the definition of "-OH (phenol)", I think. If you 
define this pattern as "cO", atom 3 will be matched, since it is the 
aromatic carbon bonded to the oxygen. I guess you just wanted to match exactly 
the oxygen and restrict it to "bonded to an aromatic carbon". So the 
SMARTS should be "[$(Oc)]", which denotes an oxygen in the environment 
"bonded to an aromatic carbon".
    m = Chem.MolFromSmiles('CC1=CC(=C(C=C1)C(=O)O)O')
    m.GetSubstructMatches(Chem.MolFromSmarts('[$(Oc)]'))
    >>> ((10,),)
Then only atom 10 will be matched and it won't interfere with other counts.
Reference: http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html (section 4.4)
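
To make the contrast concrete, a small sketch that runs both patterns on the same molecule. It assumes RDKit is installed; the variable names are only for this example.

```python
from rdkit import Chem

m = Chem.MolFromSmiles('CC1=CC(=C(C=C1)C(=O)O)O')

# Two-atom pattern: every match contains the ring carbon and the oxygen.
pair_matches = m.GetSubstructMatches(Chem.MolFromSmarts('cO'))

# Recursive SMARTS: the aromatic carbon is only an environment condition,
# so each match contains the oxygen alone.
oxygen_only = m.GetSubstructMatches(Chem.MolFromSmarts('[$(Oc)]'))

print(pair_matches)
print(oxygen_only)
```

Because the recursive form never claims the ring carbon, a set-based duplicate filter no longer removes the =C< (ring) match on that atom.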


Hongbin Yang 

On Sun, Mar 5, 2017 at 10:15 PM, Greg Landrum wrote:
Hi Chenyang,
If you're really interested in counting the number of times the substructure 
appears, you can do that much quicker with `GetSubstructMatches()`:
In [2]: m = Chem.MolFromSmiles('CC(C)CCO')
In [3]: len(m.GetSubstructMatches(Chem.MolFromSmarts('[CH3;X4]')))
Out[3]: 2
Is that sufficient, or do you actually want to sequentially remove all of the 
groups in your list?
If you actually want to remove them, you are probably better off using the 
chemical reaction functionality instead of DeleteSubstructs(), which 
recalculates the number of implicit Hs on atoms after each call.
-greg
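
For readers looking for the reaction-based route, the following is only a sketch of how it might look, not code from this thread. It assumes RDKit is installed and uses a reaction SMARTS made up for illustration, in which the unmapped [OH] atom is simply left out of the product.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Illustrative reaction: keep the mapped carbon, drop the unmapped hydroxyl.
rxn = AllChem.ReactionFromSmarts('[C:1][OH]>>[C:1]')

m = Chem.MolFromSmiles('CC(C)CCO')
product_sets = rxn.RunReactants((m,))

# Each entry is a tuple of product molecules; they come back unsanitized.
product = product_sets[0][0]
Chem.SanitizeMol(product)

symbols = [atom.GetSymbol() for atom in product.GetAtoms()]
print(Chem.MolToSmiles(product), symbols)
```

RunReactants returns one product set per match of the reactant template, so a molecule with several hydroxyl groups would give several alternative products, each with a single group removed.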

On Mon, Mar 6, 2017 at 4:21 AM, Chenyang Shi wrote:
I am new to rdkit but I am already impressed by its vibrant community. I have a 
question regarding deleting substructure. In the