[Rdkit-discuss] SMILES/SMARTS codes that match multiple atoms

2020-02-08 Thread Janusz Petkowski
Dear RDkit community,


I would appreciate your insight into the following simple problem:


[H]C(=O)OC([C,H])([H])[H]  or
[H]C(=O)OC([#6,H])([H])[H]

[Note that this notation uses [C,H], which is meant to say that a given 
position can hold either C or H. The same applies to [#6,H].]

Both patterns should therefore match
C(=O)OC
C(=O)OCC
C(=O)OCCC

whereas

[H]C(=O)OC([H])([H])[H]

should only match the first

C(=O)OC

while

[H]C(=O)OC([#6])([H])[H]

should only match the second and third

C(=O)OCC
C(=O)OCCC

In reality, the [C,H] and [#6,H] patterns match only the last two:
C(=O)OCC
C(=O)OCCC
They do not match the first one:
C(=O)OC

I do, of course, add explicit hydrogens to the target molecules, e.g. C(=O)OC. It 
looks like the [C,H] notation, which is meant to say that a given position can 
hold C or H, is not recognized (it does not match the H in [C,H]). If it is not, 
how can I match cases where a given position can hold C or H with RDKit?
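
A possible explanation, offered as a sketch rather than a confirmed diagnosis: in SMARTS a bare [H] denotes a hydrogen atom, but an H combined with other primitives (as in [C,H] or [#6,H]) is read as a hydrogen-count property ("has exactly one attached H"), so it does not match an explicit hydrogen atom. Writing both alternatives by atomic number, [#6,#1], avoids the ambiguity. The variable names below are illustrative only.

```python
from rdkit import Chem

# [#6,#1] means "carbon OR hydrogen", both given by atomic number
query = Chem.MolFromSmarts('[H]C(=O)OC([#6,#1])([H])[H]')

targets = ['C(=O)OC', 'C(=O)OCC', 'C(=O)OCCC']
results = {}
for smi in targets:
    # explicit Hs are needed so that the [H] atoms in the query can be matched
    mol = Chem.AddHs(Chem.MolFromSmiles(smi))
    results[smi] = mol.HasSubstructMatch(query)
    print(smi, results[smi])
```

With this pattern all three formates should be reported as matches, since the variable position can be filled either by a carbon or by one of the explicit hydrogens.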


Thank you very much for your help.


Best regards,


Dr Janusz Petkowski

Research Fellow at MIT EAPS<https://eapsweb.mit.edu/people/jjpetkow>

Tel:  +1 (617) 258 - 6910
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Returning Z-matrix coordinates for a molecule in rdkit?

2017-09-19 Thread Janusz Petkowski
Dear RDKit Community,

I have a quick question. Is it possible to return a Z-matrix instead of the 
usual, Cartesian coordinates for a molecule in RDKit or do you know of any way 
of converting or generating Z-matrix coordinates for a batch of molecules?

Thanks!


Dr Janusz Petkowski

Research Fellow at MIT EAPS<https://eapsweb.mit.edu/people/jjpetkow>

Tel:  +1 (617) 258 - 6910


Re: [Rdkit-discuss] setting valence of choice to S and P atoms in rdkit

2017-06-21 Thread Janusz Petkowski
Hi Ling (and Paolo earlier),

Thank you very much for your answers, both work very well.

All the best and happy coding!


Dr Janusz Petkowski

Research Fellow at MIT EAPS<https://eapsweb.mit.edu/people/jjpetkow>

Tel:  +1 (617) 258 - 6910


From: Ling Chan [lingtrek...@gmail.com]
Sent: Tuesday, June 20, 2017 9:51 PM
To: Janusz Petkowski
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] setting valence of choice to S and P atoms in rdkit

Hello Janusz,

Perhaps you have answered your own question? You can start with Smiles like 
"[H][SH3](C)[SH5]".

Otherwise you could use the SetNumExplicitHs() function. For example,

  from rdkit import Chem

  m = Chem.MolFromSmiles('CS')
  m.GetAtomWithIdx(1).SetNumExplicitHs(5)  # put five explicit Hs on the S atom
  Chem.SanitizeMol(m)  # recompute valences after the change
  print(Chem.AddHs(m).GetNumAtoms())

will inform you that there is a total of 10 atoms. But if you comment out the 
line with SetNumExplicitHs, it will inform you that the total number of atoms 
is 6.

The above seems to work without the SanitizeMol() function but I think it is 
better to call it for safety, to clean up the molecule.

Ling Chan
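
The same approach can be applied to the SSC input from the original question. This is a minimal sketch, assuming that atom indices follow the order of atoms in the SMILES string (0 and 1 for the two S atoms). The H counts are chosen so that each sulfur ends up hexavalent, and the variable names are illustrative.

```python
from rdkit import Chem

m = Chem.MolFromSmiles('SSC')
m.GetAtomWithIdx(0).SetNumExplicitHs(5)  # terminal S: 1 bond + 5 H = valence 6
m.GetAtomWithIdx(1).SetNumExplicitHs(4)  # middle S: 2 bonds + 4 H = valence 6
Chem.SanitizeMol(m)  # recompute valences; raises if a valence is not allowed

mh = Chem.AddHs(m)
n_atoms = mh.GetNumAtoms()
print(n_atoms)              # 3 heavy atoms plus 5 + 4 + 3 hydrogens
print(Chem.MolToSmiles(m))  # SMILES with the H counts written on the S atoms
```

H counts that leave a sulfur at an odd valence would let RDKit top the atom up with implicit hydrogens, so the counts here are picked to land exactly on an allowed valence.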






[Rdkit-discuss] setting valence of choice to S and P atoms in rdkit

2017-06-20 Thread Janusz Petkowski
Dear RDKit Community,

I have a quick question about whether the valence of an atom can be set in 
RDKit.

Let's say that I have a molecule like PPC or SSC (SMILES notation) and I 
would like to change the valence of one or more S or P atoms from the default 
(II for S, III for P) to, say, S(IV) or S(VI) and P(V). As a result I would like 
to have, for example, the following molecules: [H][SH3](C)[SH5], [H][SH2]SC, 
[H][SH3](C)[SH3] or [H][PH3]PC, [H][PH3][PH3]C

Is it possible to generate such molecules from SSC or PPC inputs using one of 
the RDKit methods (modules)?

Thank you very much for your help,

Best regards,

Dr Janusz Petkowski

Research Fellow at MIT EAPS<https://eapsweb.mit.edu/people/jjpetkow>

Tel:  +1 (617) 258 - 6910


Re: [Rdkit-discuss] adding custom number of explicit H to specified non-hydrogen atoms

2017-01-21 Thread Janusz Petkowski
OK, one last question. I am trying to update my RDKit to the current version 
(rdkit-Release_2016_09_3), which I downloaded from 
https://github.com/rdkit/rdkit/releases, so that I can use the onlyOnAtoms option.

My current version (2015.03.1), installed on a Win 7 machine, works perfectly 
well. I downloaded the new one (rdkit-Release_2016_09_3) and set up the 
environment variables as described in the Windows installation guide (and as I 
had to set them up to get the previous 2015.03.1 version working), but in the 
end I get an import error like this:

from rdkit import Chem
  File "C:\rdkit-Release_2016_09_3\rdkit\__init__.py", line 2, in 
from .rdBase import rdkitVersion as __version__
ImportError: No module named rdBase

I presume that this is somehow related to missing DLLs? But I installed them 
when I got the old version, so they should be there. When I try to download 
them anyway from 
http://www.microsoft.com/en-us/download/details.aspx?id= I get a 
notification that newer DLLs are already installed.

Reverting to my previous RDKit version (2015.03.1) makes everything work 
again.

Does anybody know how to get around this problem?

Thank you once again!

Janusz

From: Peter Gedeck [peter.ged...@gmail.com]
Sent: Saturday, January 21, 2017 3:44 PM
To: Janusz Petkowski; Maciek Wójcikowski
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] adding custom number of explicit H to specified 
non-hydrogen atoms

Looks like you have a very old version of RDKit. The additional option was 
included in RDKit 2016.03.1. Check

import rdkit
print(rdkit.__version__)

Best,

Peter
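
The check above can be turned into a small guard before calling AddHs with onlyOnAtoms. This is a sketch only: the helper names are hypothetical, the minimum version is the one quoted in this message, and version strings are assumed to look like '2015.03.1'.

```python
import re

# Release that introduced onlyOnAtoms, according to the message above.
MIN_VERSION = (2016, 3, 1)

def parse_rdkit_version(version):
    """Turn a string such as '2015.03.1' into a tuple of ints, e.g. (2015, 3, 1)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])

def supports_only_on_atoms(version):
    """True if the given version string is at least MIN_VERSION."""
    return parse_rdkit_version(version) >= MIN_VERSION

for v in ("2015.03.1", "2016.03.1", "2016.09.3"):
    print(v, supports_only_on_atoms(v))
```

In a real session the string would come from rdkit.__version__ after "import rdkit".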




Re: [Rdkit-discuss] adding custom number of explicit H to specified non-hydrogen atoms

2017-01-21 Thread Janusz Petkowski
I have RDKit_2015_03_01. If I have to update it to the newest release to get 
the onlyOnAtoms option, what would be the safest way of doing it?

PS. Somehow my version-checking commands do not work either...

Janusz

From: Maciek Wójcikowski [mac...@wojcikowski.pl]
Sent: Saturday, January 21, 2017 3:46 PM
To: Janusz Petkowski
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] adding custom number of explicit H to specified 
non-hydrogen atoms

Which RDKit version do you have?

"import rdkit; print(rdkit.__version__)"


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl<mailto:mac...@wojcikowski.pl>


Re: [Rdkit-discuss] adding custom number of explicit H to specified non-hydrogen atoms

2017-01-21 Thread Janusz Petkowski
Czesc again,

Many thanks for the code snippet. I thought I was using it wrongly: I had 
previously tried to use it exactly as you wrote, but I always got an error back. 
Maybe I am missing a module? I copied your snippet, tried to run it, and got 
the same error:


m1 = Chem.MolFromSmiles('c1ccccc1')
m1 = Chem.AddHs(m1, onlyOnAtoms=(2,3,4))
print(Chem.MolToSmiles(m1))


The error is below:

m1 = Chem.AddHs(m1, onlyOnAtoms=(2,3,4))
Boost.Python.ArgumentError: Python argument types in
rdkit.Chem.rdmolops.AddHs(Mol)
did not match C++ signature:
AddHs(class RDKit::ROMol mol, bool explicitOnly=False, bool addCoords=False)

It looks like RDKit does not recognize the onlyOnAtoms argument?

Thanks again for all your help!

Janusz


From: Maciek Wójcikowski [mac...@wojcikowski.pl]
Sent: Saturday, January 21, 2017 3:11 PM
To: Janusz Petkowski
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] adding custom number of explicit H to specified 
non-hydrogen atoms

Cześć,

The following code will add Hs to atoms 2, 3 and 4. These are the usual RDKit 
indices which you get from "Atom.GetIdx()".
In [5]: m1 = Chem.MolFromSmiles('c1ccccc1')
   ...: m1 = Chem.AddHs(m1, onlyOnAtoms=(2,3,4))
   ...: Chem.MolToSmiles(m1)
   ...:
   ...:
Out[5]: '[H]c1cccc([H])c1[H]'



Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl<mailto:mac...@wojcikowski.pl>


Re: [Rdkit-discuss] adding custom number of explicit H to specified non-hydrogen atoms

2017-01-21 Thread Janusz Petkowski
Czesc Maciek,

Thanks a lot for pointing out the "onlyOnAtoms" option. It looks like it is 
exactly what I need. If it is not too much trouble, could you give me a simple 
example of how to switch that option on? I am sorry if this question seems 
obvious, but I am not a programmer and my Python skills are not yet advanced.

Best regards,

Janusz Petkowski

From: Maciek Wójcikowski [mac...@wojcikowski.pl]
Sent: Saturday, January 21, 2017 5:35 AM
To: Janusz Petkowski
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] adding custom number of explicit H to specified 
non-hydrogen atoms

Hi Janusz,

AddHs has a parameter "onlyOnAtoms" which takes a list of indices of atoms to 
include. 
[http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolops-module.html#AddHs]


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl<mailto:mac...@wojcikowski.pl>



[Rdkit-discuss] adding custom number of explicit H to specified non-hydrogen atoms

2017-01-20 Thread Janusz Petkowski
Dear RDKit Community,

By default, H atoms are not explicit in the molecular graph, so substructure 
matching ignores them. It is possible to use Chem.AddHs(mol) to add explicit 
hydrogens to all atoms in the molecule and then perform substructure matching, 
but is it possible in RDKit to add explicit hydrogens only to atoms of choice 
instead of to all of them?

So let's say I do:

from rdkit import Chem

m1 = Chem.MolFromSmiles('C=C')
m1_H = Chem.AddHs(m1)  # adds explicit Hs to every atom
print(m1_H.GetNumAtoms())
print(Chem.MolToSmiles(m1_H))

The result is:

>>> 6
>>> [H]C([H])=C([H])[H]

What if I would like to add only one (1) explicit hydrogen atom to a specific 
non-hydrogen atom, say m1.GetAtomWithIdx(0)? In that case I would want to 
have:

print(m1_H.GetNumAtoms())
print(Chem.MolToSmiles(m1_H))

>>> 3
>>> [H]C=C

I tried the following method: m1.GetAtomWithIdx(0).SetNumExplicitHs(1), 
which correctly adds an explicit H to the C=C molecule, but somehow I cannot 
convert the result to SMILES with this one additional explicit H, or 
subsequently use it for substructure matching.

In the end I would like to do substructure matching where the following query 
structures:


[H]C=C or [H]C=CC match the following molecule: 
[H]C(=C([H])C([H])([H])[H])C([H])([H])[H]

but at the same time the query structures [H]C=C([H])[H] or [H]C([H])=CC do 
not match [H]C(=C([H])C([H])([H])[H])C([H])([H])[H]

PS. Of course, the structure [H]C([H])=C([H])[H] converted from C=C using 
Chem.AddHs(mol) will not be matched onto 
[H]C(=C([H])C([H])([H])[H])C([H])([H])[H] which is correct.
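
One way to express the matching behaviour described above is to write the queries as SMARTS and match them against a target with all hydrogens explicit. This is a sketch of that alternative, not the AddHs route discussed elsewhere in the thread. The target SMILES CC=CC is assumed to be the heavy-atom form of the fully hydrogenated molecule given above, and the variable names are illustrative.

```python
from rdkit import Chem

# but-2-ene with every hydrogen made explicit
target = Chem.AddHs(Chem.MolFromSmiles('CC=CC'))

queries = ['[H]C=C', '[H]C=CC', '[H]C=C([H])[H]', '[H]C([H])=CC']
matches = {}
for q in queries:
    patt = Chem.MolFromSmarts(q)  # [H] in SMARTS is an explicit hydrogen atom
    matches[q] = target.HasSubstructMatch(patt)
    print(q, matches[q])
```

The first two queries should match and the last two should not, because no double-bonded carbon in the target carries two hydrogens.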

Thank you very much for your help,

Best regards,

Janusz Petkowski



Re: [Rdkit-discuss] MCS module - bonding and hybridization in substructure search

2015-11-16 Thread Janusz Petkowski
Dear Greg and Peter,

Thank you very much for your feedback, and I am very sorry if my examples were 
not clear enough. Please look at the ones below, given in the format Greg 
requested. I hope they help to explain what I mean.

Thanks a lot!

Best regards,

Janusz Petkowski

As an additional requirement, the ringMatchesRingOnly and completeRingsOnly 
options are always applied in each case.

Example 1:

["CC=CNC", "C=CNC=CC"] ==> CC=CN

Example 2:

["CC(N)C(N)=O", "CC(N)C(=O)NC(C)C(=O)O"] ==> CC(N)C(N)=O
["CC(N)C(=O)O", "CC(N)C(=O)NC(C)C(=O)O"] ==> CC(N)C(=O)O

Example 3:

["C\C=C\N", "C\C=C\NC1CCC1"] ==> C/C=C/N
["CCCN", "CCCNC1CCC1"] ==> CCCN
["CCCN" ,"CCCNC1=CCC1"] ==> CCCN

Example 4:

["NC1CCC1", "C\C=C\NC1CCC1"] ==> NC1CCC1

Example 5:

["NC1=CCC1", "CCN=NC1=CCC1"] ==> C1CC=C1

Example 6:

["NC1=CCC1", "CC\C=N/C1=CCC1"]  ==>  C1CC=C1
["NC1=CCC1", "CC\C=N/C1CCC1"] ==> None

Example 7:

["CCC", "CC(C)=O"] ==> None
["CCC", "CC(C)O"] ==> CCC
["CCC", "CC(C)=N"] ==> None
["CCC", "CC(C)N"] ==> CCC
["CCC", "CCC=C=C"] ==> None
["C=C=C", "CCC=C=C"] ==> C=C=C

Example 8:

["NC1CCC1", "CN=C1CCC1"] ==> CCC (but if the ringMatchesRingOnly and 
completeRingsOnly options are both on ==> None)
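
For reference, a pair like those above can be run through RDKit's rdFMCS module with both ring options switched on. This is a minimal sketch showing only how the call is made; it uses the default atom and bond comparison and is not claimed to reproduce every desired result listed above. The variable names are illustrative.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS

smiles_pair = ["CCCN", "CCCNC1CCC1"]
mols = [Chem.MolFromSmiles(s) for s in smiles_pair]

res = rdFMCS.FindMCS(
    mols,
    ringMatchesRingOnly=True,  # ring bonds may only match ring bonds
    completeRingsOnly=True,    # no partial rings in the result
)
print(res.numAtoms, res.numBonds, res.smartsString)
```

The result object reports the size of the common substructure and a SMARTS string for it; hybridization is not part of the default comparison.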




From: Peter Shenkin [shen...@gmail.com]
Sent: Sunday, November 15, 2015 2:44 PM
To: Janusz Petkowski
Cc: Greg Landrum; rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] MCS module - bonding and hybridization in 
substructure search

Say, Greg,

If you understand Janusz's request, could you perhaps explain it in other 
words? I don't quite follow it, despite having read the two emails.

I'm getting the sense that he wants to make sure that SP2 nitrogens match only 
SP2 nitrogens (for example). Is this right? I know OpenEye has an extension to 
specify hybridization, but don't know whether RDKit has implemented something 
like that; if not, a recursive SMARTS ought to be able to do it.

On Sun, Nov 15, 2015 at 10:55 AM, Janusz Petkowski 
<jjpet...@mit.edu<mailto:jjpet...@mit.edu>> wrote:
Dear Greg,

Thank you very much for your reply. I will try to explain more what I would 
like to achieve, I hope that it will clarify things a little.

Let's look at your example first and treat the first molecule (CC=CNC) in 
["CC=CNC", "C=CNC=CC"] as a "query": we would like to check whether it is an 
EXACT match to the second molecule ("C=CNC=CC").

Your example is a case of the "solution to the Liz Wylie problem" at its best.

["CC=CNC", "C=CNC=CC"] ==> CC=CN - so 'no' - no exact match! And it is what we 
would expect upon the implementation of the current "solution to the Liz Wylie 
problem" and this is what I would consider "CORRECT" for my purposes.
The columns of the tables below are:
bond_type, bond_start_atom, bond_start_atom_symbol, bond_start_atom_hyb, 
bond_end_atom, bond_end_atom_symbol, bond_end_atom_hyb

CC=CNC
SINGLE 0 C SP3 1 C SP2
DOUBLE 1 C SP2 2 C SP2
SINGLE 2 C SP2 3 N SP2
SINGLE 3 N SP2 4 C SP3

C=CNC=CC

DOUBLE 0 C SP2 1 C SP2
SINGLE 1 C SP2 2 N SP2
SINGLE 2 N SP2 3 C SP2
DOUBLE 3 C SP2 4 C SP2
SINGLE 4 C SP2 5 C SP3
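
Tables in this layout can be produced with a few lines of RDKit. The function below is a sketch with a hypothetical name (bond_table); it relies on the string form of RDKit's bond-type and hybridization enums.

```python
from rdkit import Chem

def bond_table(smiles):
    """Return one row per bond: type, begin idx/symbol/hyb, end idx/symbol/hyb."""
    mol = Chem.MolFromSmiles(smiles)
    rows = []
    for bond in mol.GetBonds():
        a, b = bond.GetBeginAtom(), bond.GetEndAtom()
        rows.append((
            str(bond.GetBondType()),
            a.GetIdx(), a.GetSymbol(), str(a.GetHybridization()),
            b.GetIdx(), b.GetSymbol(), str(b.GetHybridization()),
        ))
    return rows

# print the table for the "query" molecule, one bond per line
for row in bond_table('CC=CNC'):
    print(*row)
```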

In your example the hybridizations of the C atoms in the CNC fragment of the 
two molecules do not match, and the overall result is OK. In the first "query" 
molecule, the first C of the CNC fragment is sp2 (it is connected to the 
neighbouring C in the "query" molecule via a DOUBLE bond), the N is sp2, but 
the last C is sp3 and is bonded only via SINGLE bonds. In the second molecule 
(C=CNC=CC), both carbons of the CNC fragment are sp2 AND both take part in 
DOUBLE bonds, unlike in the "query" molecule, where one takes part in a DOUBLE 
bond and the other only in SINGLE bonds.
What I would like to do is check whether one structure is an exact match within 
the other: the atoms must match, the bonds must match and the hybridization of 
each atom must match. The bonding is the most important criterion, and that is 
where the exceptions show up, because an sp2 atom can be bonded via SINGLE 
bonds only. Let me illustrate what I mean with a couple of examples.

Examples to illustrate it:

Example 1, Ala-Ala dipeptide case:

CC(N)C(=O)NC(C)C(=O)O

SINGLE 0 C SP3 1 C SP3
SINGLE 1 C SP3 2 N SP3
SINGLE 1 C SP3 3 C SP2
DOUBLE 3 C SP2 4 O SP2
SINGLE 3 C SP2 5 N SP2
SINGLE 5 N SP2 6 C SP3
SINGLE 6 C SP3 7 C SP3
SINGLE 6 C SP3 8 C SP2
SINGLE 8 C SP2 9 O SP2
DOUBLE 8 C SP2 10 O SP2

if I have two "query" molecules:

1) CC(N

Re: [Rdkit-discuss] MCS module - bonding and hybridization in substructure search

2015-11-15 Thread Janusz Petkowski
importance 
than the hybridization match.

Example 3:

The last example illustrates the hierarchy of matching criteria I need. It is 
a case where everything matches but the result is "INCORRECT".

CC\N=N\C1=CCC1
CCN=NC1=CCC1
SINGLE 0 C SP3 1 C SP3
SINGLE 1 C SP3 2 N SP2
DOUBLE 2 N SP2 3 N SP2
SINGLE 3 N SP2 4 C SP2
DOUBLE 4 C SP2 5 C SP2
SINGLE 5 C SP2 6 C SP3
SINGLE 6 C SP3 7 C SP3
SINGLE 7 C SP3 4 C SP2

One "query" molecule:

1) NC1=CCC1

NC1=CCC1

SINGLE 0 N SP2 1 C SP2
DOUBLE 1 C SP2 2 C SP2
SINGLE 2 C SP2 3 C SP3
SINGLE 3 C SP3 4 C SP3
SINGLE 4 C SP3 1 C SP2

["NC1=CCC1", "CCN=NC1=CCC1"] ==> NC1=CCC1 - so 'yes' - exact match! But it is 
"INCORRECT".

Why? Even though the hybridization of the N atoms in the "query" and in 
CCN=NC1=CCC1 is sp2, both N atoms in the CCN=NC1=CCC1 molecule are DOUBLE 
bonded, whereas the N atom in the "query" molecule is SINGLE bonded. The bonding 
therefore does not match and, as I mentioned earlier, the bonding has a higher 
order of importance than the hybridization.

I hope that this clarifies what I would like to achieve. I know that it is 
probably a highly non-standard and unique problem, but I would really 
appreciate your help with it! Of course, the examples I gave are purely for 
computational purposes and do not reflect the chemical stability of those 
molecules.
Thanks a lot once again!
Have a great Sunday!
Janusz Petkowski


From: Greg Landrum [greg.land...@gmail.com]
Sent: Saturday, November 14, 2015 11:26 PM
To: Janusz Petkowski
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] MCS module - bonding and hybridization in 
substructure search

Hi Janusz,

I'm not 100% sure what you're looking for, but I think it has something to do 
with including information about bond conjugation in the MCS procedure.

To confirm, can you please give a couple of examples of what you would like to 
have as output from the algorithm? Something like this with the input molecules 
on the left and the desired result on the right would help :
['CNC=CC', 'C=CNC=CC'] -> 'CNC=CC'
(I realize that specific example is not what you're looking for, it's just 
intended to be an example)

Once I've seen that I can try to figure out if it is currently doable and, if 
not, if it's possible to modify the code to support it.

Best,
-greg





Re: [Rdkit-discuss] defining the size of the ring in the ringMatchesRingOnly and completeRingsOnly methods - the macrocycles case

2015-11-15 Thread Janusz Petkowski
Dear Greg,

Thank you very much for addressing my macrocycles question.

If it is not too much trouble, could you give me a short guide on how I should 
proceed with editing the RingInfo data structure so that it "forgets" that 
rings above a certain size exist?

I am sorry to burden you with this but I only started learning programming 
around two months ago and my python programming skills are still quite limited.

Thanks a lot for all your help!

Janusz Petkowski

From: Greg Landrum [greg.land...@gmail.com]
Sent: Saturday, November 14, 2015 11:37 PM
To: Janusz Petkowski
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] defining the size of the ring in the 
ringMatchesRingOnly and completeRingsOnly methods - the macrocycles case

Dear Janusz,

This isn't currently possible.
The most straightforward way I could think to implement it (maybe someone else 
has a better idea?) would be to allow the molecule's RingInfo data structure to 
be edited so that you could, for example, tell it to "forget" that rings above 
a certain size exist.

This would be relatively straightforward to do and I could imagine that 
functionality being useful in other places as well.
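
A rough sketch of the "forgetting" idea, written against plain tuples of atom 
indices of the kind returned by mol.GetRingInfo().AtomRings() rather than 
against RingInfo itself (which, as said above, cannot currently be edited). The 
helper names, the example rings and the size cutoff are assumptions made for 
illustration only; wiring the result into the MCS code is not shown.

```python
def forget_large_rings(atom_rings, max_size=7):
    """Keep only the rings that have at most max_size atoms."""
    return [tuple(ring) for ring in atom_rings if len(ring) <= max_size]


def ring_bonds(atom_rings):
    """Return the bonds (as sorted atom-index pairs) that lie in the given rings."""
    bonds = set()
    for ring in atom_rings:
        # Pair each atom with its successor, closing the ring at the end.
        for a, b in zip(ring, ring[1:] + ring[:1]):
            bonds.add((min(a, b), max(a, b)))
    return bonds


# Hypothetical ring perception result: one 6-membered ring and one
# 12-membered macrocycle that share the bond between atoms 0 and 5.
rings = [
    (0, 1, 2, 3, 4, 5),
    (0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15),
]

small_rings = forget_large_rings(rings, max_size=7)

# Bonds that should still obey "ring matches ring only".
restricted = ring_bonds(small_rings)

# Bonds that sit only in the macrocycle and could be treated like chain bonds.
relaxed = ring_bonds(rings) - restricted
```

A bond shared by a small ring and a macrocycle stays in the restricted set 
here; whether that is the desired behaviour is a design choice left open.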

-greg




[Rdkit-discuss] defining the size of the ring in the ringMatchesRingOnly and completeRingsOnly methods - the macrocycles case

2015-11-14 Thread Janusz Petkowski
One other question about MCS, in addition to my previous one on hybridization:

In the RDKit documentation, the Maximum Common Substructure (MCS) section 
mentions that one can restrict the mapping of linear fragments onto rings using 
two options: ringMatchesRingOnly and completeRingsOnly.

This is extremely useful, but is it possible to restrict it by ring size, so 
that ring bonds match only ring bonds for rings below a certain size (let's say 
rings that have fewer than 8 atoms), while larger rings are exempt? I am trying 
to avoid the problem of linear fragments not being found in macrocyclic 
structures. I want linear fragments to be matched in macrocycles but not in 
rings of smaller ("regular") size, all within the same molecule of course. Is 
there a way to do that?

Just as an illustration of the problem:

Is it possible to find the ClCC=C fragment in
CC(F)C1CC\C=C(C)\C(Cl)C2C=CC(Br)CC2[C@H](C)[C@]2(O)OC3=C(C)C(=O)C(O)=C([C@@H]4O[C@H]1[C@H](C)[C@H](O)[C@H]4C)C3=C2
but at the same time avoid finding CC(Br)C=C in it, by using 
ringMatchesRingOnly and completeRingsOnly?
As currently implemented, ringMatchesRingOnly and completeRingsOnly treat all 
rings the same way, no matter the size.
If not, how would one do that?

Thanks a lot for your help!

Have a great weekend!

Best regards,

Janusz Petkowski


[Rdkit-discuss] MCS module - bonding and hybridization in substructure search

2015-11-13 Thread Janusz Petkowski
Dear RDKit Community,

I am looking for a way to use the MCS module in RDKit to compare the atoms and 
bonding of two molecules while also taking into account the hybridization of 
each atom.
A solution to a similar problem was suggested before (see this RDKit-discuss 
thread started by Liz Wylie: 
http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg03676.html 
and also http://sourceforge.net/p/rdkit/mailman/message/31830412/ ), 
but even if it is computationally correct it does not necessarily mirror some 
nuances of chemistry, and one may want to modify it in certain specific cases.
It works most of the time for cases like those proposed in the solution to the 
Liz Wylie case:

smis = ['CC(C)=C','CC(C)C']
 or

smis2 = ['CC(C)=C','CC(C)=N']
If we check whether the 'CCC' substructure is present in the molecules from 
those two data sets, then with Greg Landrum's solution CCC will be found only 
in 'CC(C)C', taking into account the atoms, the bonding and the hybridization 
of the atoms. It is all correct and cool!
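
For readers who have not seen it, the approach referred to above can be 
sketched roughly as follows. This is a reconstruction under assumptions, not 
the code from the linked thread: it uses the rdFMCS module rather than the 
older MCS module, stores a combined hybridization/element label in each atom's 
isotope field, and compares isotopes during the MCS search. The function names 
and the "100 * hybridization + atomic number" encoding are illustrative 
choices.

```python
def hybridization_label(atom):
    """Combine hybridization and element into a single integer label."""
    return 100 * int(atom.GetHybridization()) + atom.GetAtomicNum()


def label_atoms(mol):
    """Write each atom's label into its isotope field (modifies mol in place)."""
    for atom in mol.GetAtoms():
        atom.SetIsotope(hybridization_label(atom))
    return mol


def hybridization_aware_mcs(smiles_list):
    """Return the SMARTS of the MCS found when atoms are compared by label."""
    # RDKit is imported here so that the two helpers above carry no
    # import-time dependency.
    from rdkit import Chem
    from rdkit.Chem import rdFMCS

    mols = [label_atoms(Chem.MolFromSmiles(smi)) for smi in smiles_list]
    result = rdFMCS.FindMCS(
        mols,
        atomCompare=rdFMCS.AtomCompare.CompareIsotopes,
        bondCompare=rdFMCS.BondCompare.CompareOrder,
    )
    return result.smartsString
```

Calling hybridization_aware_mcs(['CC(C)=C', 'CC(C)C']) would then only pair 
atoms whose element and hybridization both agree, which is the behaviour 
described above.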

But let's look at another example:
Let's look for the N\CC\N substructure in 'C\C=C\NCCN\C=C\C', or the 'NCN' 
substructure in NCN-C=C or 'C=CNCNC=C'. It will not be found there, even though 
"structurally speaking" it is there.
The problem is as follows: an electronegative atom next to a C=C bond will 
pull electron density from that bond, and so the N-C bond in NCN-C=C will have a 
‘bit of’ double-bond character, even if technically it is a single bond. The 
current solution to the Liz Wylie problem does not ignore that and 
distinguishes between a regular N-C bond and an N-C bond next to a C=C bond (as 
in NCN-C=C; because of that it will not find NCN in this structure). NCS in 
NCSC=C is matched because S is more electropositive than N or O, and so the S-C 
bond does not have that double-bond character.

My question to the RDKit community is: how can Greg Landrum's solution to the 
Liz Wylie case be modified to successfully match the cases I mentioned above, 
while still retaining the hybridization check (we do want the hybridization to 
match, we just want the bonding to be more important)? The problem is that the 
atoms that are not matched, like the N atoms above, have sp2 hybridization but 
technically are bonded by single bonds on all sides.
Thanks a lot for your help, time and consideration. This is my first post on 
the RDKit forum, and I am new to RDKit and Python in general, so I apologize if 
anything is not clear.
I would really appreciate your help!

Best regards,

Janusz Petkowski