[Rdkit-discuss] SMILES/SMARTS codes that match multiple atoms
Dear RDKit community,

I would appreciate your insight into the following simple problem. Consider the two queries

[H]C(=O)OC([C,H])([H])[H]
or
[H]C(=O)OC([#6,H])([H])[H]

(note that this notation uses [C,H], which is meant to say that the given position can hold either C or H; the situation is similar for [#6,H]). Both of them should therefore match

C(=O)OC
C(=O)OCC
C(=O)OCCC

whereas [H]C(=O)OC([H])([H])[H] should match only the first, C(=O)OC, and [H]C(=O)OC([#6])([H])[H] should match only the second and third, C(=O)OCC and C(=O)OCCC.

In reality the [C,H] and [#6,H] queries match only the last two, C(=O)OCC and C(=O)OCCC; they do not match the first one, C(=O)OC. I do, of course, add explicit hydrogens to the target molecules, e.g. C(=O)OC.

It looks as if the [C,H] notation, which should mean that a given position can hold C or H, is not recognized (it does not match the H in [C,H]). If that is the case, how can I match cases where a given position can hold either C or H with RDKit?

Thank you very much for your help.

Best regards,
Dr Janusz Petkowski
Research Fellow at MIT EAPS <https://eapsweb.mit.edu/people/jjpetkow>
Tel: +1 (617) 258 - 6910

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
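A likely explanation, going by the Daylight SMARTS rules that RDKit follows: inside a bracket list such as [C,H] or [#6,H], the H is read as a hydrogen-count test ("an atom with exactly one attached hydrogen"), not as a hydrogen atom, so that position can never land on an explicit H. Writing hydrogen as atomic number 1 (#1) avoids the ambiguity. The sketch below assumes that reading; the helper name match_flags is made up for illustration.

```python
from rdkit import Chem

TARGETS = ["C(=O)OC", "C(=O)OCC", "C(=O)OCCC"]


def match_flags(smarts):
    """Return one True/False per target, matching against H-added targets."""
    query = Chem.MolFromSmarts(smarts)
    flags = []
    for smi in TARGETS:
        # AddHs turns every implicit hydrogen into a real atom in the graph
        target = Chem.AddHs(Chem.MolFromSmiles(smi))
        flags.append(target.HasSubstructMatch(query))
    return flags


# "H" inside the list is an H-count test, so it cannot match a hydrogen atom
print(match_flags("[H]C(=O)OC([#6,H])([H])[H]"))

# "#1" means "atomic number 1", i.e. an actual hydrogen atom
print(match_flags("[H]C(=O)OC([#6,#1])([H])[H]"))
```

With the #1 form, the same query covers both the methyl ester (H in that position) and the longer esters (C in that position).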
[Rdkit-discuss] Returning Z-matrix coordinates for a molecule in rdkit?
Dear RDKit Community,

I have a quick question. Is it possible to return a Z-matrix instead of the usual Cartesian coordinates for a molecule in RDKit? Alternatively, do you know of any way of converting to, or generating, Z-matrix coordinates for a batch of molecules?

Thanks!

Dr Janusz Petkowski
Re: [Rdkit-discuss] setting valence of choice to S and P atoms in rdkit
Hi Ling (and Paolo earlier),

Thank you very much for your answers, both work very well. All the best and happy coding!

Dr Janusz Petkowski

From: Ling Chan [lingtrek...@gmail.com]
Sent: Tuesday, June 20, 2017 9:51 PM
To: Janusz Petkowski
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] setting valence of choice to S and P atoms in rdkit

Hello Janusz,

Perhaps you have answered your own question? You can start with SMILES like "[H][SH3](C)[SH5]". Otherwise you could use the SetNumExplicitHs() function. For example,

    from rdkit import Chem
    from rdkit.Chem import AllChem

    m = Chem.MolFromSmiles('CS')
    # atom 1 is the sulfur; give it five explicit hydrogens
    m.GetAtomWithIdx(1).SetNumExplicitHs(5)
    AllChem.SanitizeMol(m)
    print(Chem.AddHs(m).GetNumAtoms())

will inform you that there is a total of 10 atoms. But if you comment out the line with SetNumExplicitHs, it will inform you that the total number of atoms is 6.

The above seems to work without the SanitizeMol() call, but I think it is better to call it for safety, to clean up the molecule.

Ling Chan
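As a rough sketch of how the SetNumExplicitHs() suggestion in this thread could be turned into "give this atom valence N", the helper below computes how many hydrogens are needed from the bonds the atom already has. The function name raise_valence and its index-to-valence mapping are hypothetical, introduced only for this example; the RDKit calls themselves are the ones named in the thread plus standard atom and bond accessors.

```python
from rdkit import Chem


def raise_valence(smiles, target_valences):
    """Return a sanitized molecule in which each listed atom carries enough
    explicit Hs to reach the requested total valence.

    target_valences maps atom index -> desired valence (hypothetical input).
    """
    mol = Chem.MolFromSmiles(smiles)
    for idx, valence in target_valences.items():
        atom = mol.GetAtomWithIdx(idx)
        # valence already used up by bonds to neighbouring heavy atoms
        used = int(sum(bond.GetBondTypeAsDouble() for bond in atom.GetBonds()))
        atom.SetNumExplicitHs(valence - used)
    Chem.SanitizeMol(mol)  # recompute valences after the edit
    return mol


# SSC: atom 0 is the terminal S, atom 1 the middle S, atom 2 the carbon
mol = raise_valence("SSC", {0: 6, 1: 4})
print([atom.GetTotalNumHs() for atom in mol.GetAtoms()])
print(Chem.MolToSmiles(mol))
```

Sanitization will still reject valences that RDKit does not allow for the element, so the requested values need to be ones RDKit accepts for S or P.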
[Rdkit-discuss] setting valence of choice to S and P atoms in rdkit
Dear RDKit Community,

I have a quick question about the possibility of setting the valence of an atom in RDKit. Let's say that I have a molecule like this (SMILES notation):

PPC or SSC

and I would like to change the valence of one or more S or P atoms from the default (II for S, III for P) to, say, S(IV) or S(VI) and P(V). As a result I would like to have the following molecules (as an example):

[H][SH3](C)[SH5], [H][SH2]SC, [H][SH3](C)[SH3]
or
[H][PH3]PC, [H][PH3][PH3]C

Is it possible to output such molecules from SSC or PPC inputs using one of the RDKit methods (modules)?

Thank you very much for your help,
Best regards,
Dr Janusz Petkowski
Re: [Rdkit-discuss] adding custom number of explicit H to specified non-hydrogen atoms
Ok, one last question. I am trying to update my RDKit to the current version (rdkit-Release_2016_09_3), which I downloaded from https://github.com/rdkit/rdkit/releases, so that I can use the onlyOnAtoms option. My current version (2015.03.1), installed on a Win 7 machine, works perfectly well. I downloaded the new release, set up the environment variables as described in the Windows installation guide (as I had to do last time to get 2015.03.1 working), and in the end I get an import error like this:

    from rdkit import Chem
      File "C:\rdkit-Release_2016_09_3\rdkit\__init__.py", line 2, in <module>
        from .rdBase import rdkitVersion as __version__
    ImportError: No module named rdBase

I presume that this is somehow related to missing DLLs? But I had them installed when I got the old version, so they should be there. When I try to download them anyway from http://www.microsoft.com/en-us/download/details.aspx?id= I get a notification that newer DLLs are already installed. Reverting to my previous RDKit version, 2015.03.1, makes everything work again.

Does anybody know how to get around this problem?

Thank you once again!
Janusz

From: Peter Gedeck [peter.ged...@gmail.com]
Sent: Saturday, January 21, 2017 3:44 PM
To: Janusz Petkowski; Maciek Wójcikowski
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] adding custom number of explicit H to specified non-hydrogen atoms

Looks like you have a very old version of RDKit. The additional option was included in RDKit 2016.03.1. Check:

    import rdkit
    print(rdkit.__version__)

Best,
Peter

From: Maciek Wójcikowski [mac...@wojcikowski.pl]
Sent: Saturday, January 21, 2017 3:11 PM
To: Janusz Petkowski
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] adding custom number of explicit H to specified non-hydrogen atoms

Hi,

The following code will add Hs to atoms 2, 3 and 4. These are the usual RDKit indices which you get from "Atom.GetIdx()".

    In [5]: m1 = Chem.MolFromSmiles('c1c1')
       ...: m1 = Chem.AddHs(m1, onlyOnAtoms=(2,3,4))
       ...: Chem.MolToSmiles(m1)
    Out[5]: '[H]c1([H])c1[H]'

Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

From: Maciek Wójcikowski [mac...@wojcikowski.pl]
Sent: Saturday, January 21, 2017 5:35 AM
To: Janusz Petkowski
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] adding custom number of explicit H to specified non-hydrogen atoms

Hi Janusz,

AddHs has a parameter "onlyOnAtoms" which takes a list of indices of atoms to include. [http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolops-module.html#AddHs]

Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl
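For reference, a minimal sketch of the onlyOnAtoms option discussed in this thread, for RDKit versions that have it (2016.03.1 or later, according to the replies). The SMILES in the archived snippet did not survive intact, so benzene is used here purely as an assumed stand-in; explicit_h_count is a made-up helper for checking the result.

```python
import rdkit
from rdkit import Chem

print(rdkit.__version__)  # onlyOnAtoms needs a sufficiently recent release

mol = Chem.MolFromSmiles("c1ccccc1")  # benzene, an assumed example molecule
partial = Chem.AddHs(mol, onlyOnAtoms=(2, 3, 4))  # indices as given by Atom.GetIdx()


def explicit_h_count(m, idx):
    """Count hydrogen atoms that are real graph neighbours of atom idx."""
    return sum(1 for nbr in m.GetAtomWithIdx(idx).GetNeighbors()
               if nbr.GetAtomicNum() == 1)


print(partial.GetNumAtoms())
print([explicit_h_count(partial, i) for i in range(6)])
print(Chem.MolToSmiles(partial))
```

Note that onlyOnAtoms adds all missing hydrogens to each listed atom; it does not let you choose how many to add per atom.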
Re: [Rdkit-discuss] adding custom number of explicit H to specified non-hydrogen atoms
I have RDKit_2015_03_01. If I have to update to the newest release to get the onlyOnAtoms option, what would be the safest way of doing it?

PS. Somehow my version-checking commands also do not work...

Janusz

From: Maciek Wójcikowski [mac...@wojcikowski.pl]
Sent: Saturday, January 21, 2017 3:46 PM
To: Janusz Petkowski
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] adding custom number of explicit H to specified non-hydrogen atoms

Which RDKit version do you have? "print rdkit.__version__"

Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl
Re: [Rdkit-discuss] adding custom number of explicit H to specified non-hydrogen atoms
Hi again,

Many thanks for the code snippet. I thought I was using it wrongly: I had previously tried to use it exactly as you wrote it, but I always got an error back. Maybe I am missing a module? I copied your snippet, tried to run it and got the same error:

    m1 = Chem.MolFromSmiles('c1c1')
    m1 = Chem.AddHs(m1, onlyOnAtoms=(2,3,4))
    print(Chem.MolToSmiles(m1))

The error is below:

    m1 = Chem.AddHs(m1, onlyOnAtoms=(2,3,4))
    Boost.Python.ArgumentError: Python argument types in
        rdkit.Chem.rdmolops.AddHs(Mol)
    did not match C++ signature:
        AddHs(class RDKit::ROMol mol, bool explicitOnly=False, bool addCoords=False)

It looks like RDKit does not recognize the onlyOnAtoms option?

Thanks again for all your help!
Janusz
Re: [Rdkit-discuss] adding custom number of explicit H to specified non-hydrogen atoms
Hi Maciek,

Thanks a lot for pointing out the "onlyOnAtoms" option. It looks like exactly what I need. If it is not too much trouble, could you give me a simple example of how to switch that option on? I am sorry if this question seems obvious, but I am not a programmer and my Python skills are not yet advanced.

Best regards,
Janusz Petkowski
[Rdkit-discuss] adding custom number of explicit H to specified non-hydrogen atoms
Dear RDKit Community,

By default H atoms are not explicit in the molecular graph, and because of that substructure matching ignores them when searching for substructures. It is possible to use Chem.AddHs(mol) to add explicit hydrogens to all atoms in the molecule and then perform substructure matching, but is it possible, in RDKit, to add explicit hydrogens only to atoms of choice instead of to all of them?

So let's say I do:

    m1 = Chem.MolFromSmiles('C=C')
    m1_H = Chem.AddHs(m1)
    print(m1_H.GetNumAtoms())
    print(Chem.MolToSmiles(m1_H))

The result is:

    6
    [H]C([H])=C([H])[H]

What if I would like to add only one (1) explicit hydrogen atom to a specific non-hydrogen atom, let's say m1.GetAtomWithIdx(0)? In that case I would want to have:

    print(m1_H.GetNumAtoms())
    print(Chem.MolToSmiles(m1_H))

    3
    [H]C=C

I tried the following method:

    m1.GetAtomWithIdx(0).SetNumExplicitHs(1)

which correctly adds an explicit H to the C=C molecule, but somehow I cannot convert the result to SMILES with this one additional explicit H, or subsequently use it for substructure matching.

In the end I would like to do substructure matching where the following query structures:

[H]C=C or [H]C=CC

match the following molecule:

[H]C(=C([H])C([H])([H])[H])C([H])([H])[H]

but at the same time these query structures:

[H]C=C([H])[H] or [H]C([H])=CC

do not match [H]C(=C([H])C([H])([H])[H])C([H])([H])[H].

PS. Of course, the structure [H]C([H])=C([H])[H] converted from C=C using Chem.AddHs(mol) will not be matched onto [H]C(=C([H])C([H])([H])[H])C([H])([H])[H], which is correct.

Thank you very much for your help,
Best regards,
Janusz Petkowski
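One way the matching behaviour asked for above might be obtained, sketched under the assumption that it is acceptable to write the queries as SMARTS (where [H] is an explicit hydrogen atom) and to add all hydrogens to the target only. The target is entered as CC=CC, which becomes the all-explicit-H molecule from the message after AddHs; the helper name matches is made up for this example.

```python
from rdkit import Chem

# 2-butene; AddHs makes it equivalent to
# [H]C(=C([H])C([H])([H])[H])C([H])([H])[H]
target = Chem.AddHs(Chem.MolFromSmiles("CC=CC"))


def matches(query_smarts):
    """True if the partially hydrogenated query is found in the target."""
    query = Chem.MolFromSmarts(query_smarts)
    return target.HasSubstructMatch(query)


for smarts in ["[H]C=C", "[H]C=CC", "[H]C=C([H])[H]", "[H]C([H])=CC"]:
    print(smarts, matches(smarts))
```

Because only the hydrogens written in the query are required, a query carbon with one [H] can still land on a target carbon that has further neighbours, while a query carbon with two [H] needs a target carbon that really has two hydrogens.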
Re: [Rdkit-discuss] MCS module - bonding and hybridization in substructure search
Dear Greg and Peter,

Thank you very much for your feedback, and I am very sorry if my examples were not clear enough. Please look at those below, given in the format Greg requested. I hope they help to explain what I mean.

Thanks a lot!
Best regards,
Janusz Petkowski

As an additional requirement on the results, the ringMatchesRingOnly and completeRingsOnly options are always applied in each case.

Example 1:
["CC=CNC", "C=CNC=CC"] ==> CC=CN

Example 2:
["CC(N)C(N)=O", "CC(N)C(=O)NC(C)C(=O)O"] ==> CC(N)C(N)=O
["CC(N)C(=O)O", "CC(N)C(=O)NC(C)C(=O)O"] ==> CC(N)C(=O)O

Example 3:
["C\C=C\N", "C\C=C\NC1CCC1"] ==> C/C=C/N
["CCCN", "CCCNC1CCC1"] ==> CCCN
["CCCN", "CCCNC1=CCC1"] ==> CCCN

Example 4:
["NC1CCC1", "C\C=C\NC1CCC1"] ==> NC1CCC1

Example 5:
["NC1=CCC1", "CCN=NC1=CCC1"] ==> C1CC=C1

Example 6:
["NC1=CCC1", "CC\C=N/C1=CCC1"] ==> C1CC=C1
["NC1=CCC1", "CC\C=N/C1CCC1"] ==> None

Example 7:
["CCC", "CC(C)=O"] ==> None
["CCC", "CC(C)O"] ==> CCC
["CCC", "CC(C)=N"] ==> None
["CCC", "CC(C)N"] ==> CCC
["CCC", "CCC=C=C"] ==> None
["C=C=C", "CCC=C=C"] ==> C=C=C

Example 8:
["NC1CCC1", "CN=C1CCC1"] ==> CCC (but if ringMatchesRingOnly and completeRingsOnly are both on at the same time ==> None)

From: Peter Shenkin [shen...@gmail.com]
Sent: Sunday, November 15, 2015 2:44 PM
To: Janusz Petkowski
Cc: Greg Landrum; rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] MCS module - bonding and hybridization in substructure search

Say, Greg,

If you understand Janusz's request, could you perhaps explain it in other words? I don't quite follow it, despite having read the two emails. I'm getting the sense that he wants to make sure that SP2 nitrogens match only SP2 nitrogens (for example). Is this right? I know OpenEye has an extension to specify hybridization, but I don't know whether RDKit has implemented something like that; if not, a recursive SMARTS ought to be able to do it.
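To compare the desired results listed above with what the stock MCS code returns, a small harness along these lines may help. It uses rdFMCS.FindMCS with the two ring options switched on and the default element and bond-order comparison; ring_aware_mcs is a made-up helper name. Note the raw string for the SMILES containing backslashes, since \N is an escape sequence in a normal Python string.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS


def ring_aware_mcs(smiles_pair):
    """Run the default MCS with both ring restrictions switched on."""
    mols = [Chem.MolFromSmiles(smi) for smi in smiles_pair]
    return rdFMCS.FindMCS(mols, ringMatchesRingOnly=True, completeRingsOnly=True)


# Example 4 from the message: default behaviour already gives the whole query
ex4 = ring_aware_mcs(["NC1CCC1", r"C\C=C\NC1CCC1"])
print(ex4.numAtoms, ex4.numBonds, ex4.smartsString)

# Example 1 from the message: the default comparison maps all five query
# atoms, whereas the desired answer (CC=CN) has only four
ex1 = ring_aware_mcs(["CC=CNC", "C=CNC=CC"])
print(ex1.numAtoms, ex1.numBonds, ex1.smartsString)
```

The difference in Example 1 is exactly the hybridization question raised in this thread: the default atom comparison looks only at the element.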
On Sun, Nov 15, 2015 at 10:55 AM, Janusz Petkowski <jjpet...@mit.edu> wrote:

Dear Greg,

Thank you very much for your reply. I will try to explain in more detail what I would like to achieve; I hope that it will clarify things a little.

Let's look at your example first and treat the first molecule (CC=CNC) in ["CC=CNC", "C=CNC=CC"] as a "query": we would like to check whether it is an EXACT match to the second molecule ("C=CNC=CC"). Your example is a case of the "solution to the Liz Wylie problem" at its best.

["CC=CNC", "C=CNC=CC"] ==> CC=CN - so 'no' - no exact match!

That is what we would expect from the current "solution to the Liz Wylie problem", and it is what I would consider "CORRECT" for my purposes.

The tables below have the following columns:

bond_type, bond_start_atom, bond_start_atom_symbol, bond_start_atom_hyb, bond_end_atom, bond_end_atom_symbol, bond_end_atom_hyb

CC=CNC

SINGLE  0  C  SP3  1  C  SP2
DOUBLE  1  C  SP2  2  C  SP2
SINGLE  2  C  SP2  3  N  SP2
SINGLE  3  N  SP2  4  C  SP3

C=CNC=CC

DOUBLE  0  C  SP2  1  C  SP2
SINGLE  1  C  SP2  2  N  SP2
SINGLE  2  N  SP2  3  C  SP2
DOUBLE  3  C  SP2  4  C  SP2
SINGLE  4  C  SP2  5  C  SP3

In your example the hybridizations of the C atoms in the CNC fragment of the two molecules do not match, and the overall result is OK. In the first, "query" molecule the first C of the CNC fragment is sp2 (it is connected to the preceding C of the "query" molecule via the DOUBLE bond), the N is sp2, but the last C is sp3 and is bonded only via SINGLE bonds. In the second molecule (C=CNC=CC) both carbons of the CNC fragment are sp2 AND both take part in DOUBLE bonds, unlike in the "query" molecule, where one takes part in a DOUBLE bond and the other only in SINGLE bonds.
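Tables in the column layout described above can be produced with a few lines of RDKit. This is only a sketch of one way to do it, not necessarily how the tables in the message were generated; bond_table is a made-up name.

```python
from rdkit import Chem


def bond_table(smiles):
    """One row per bond: type, then index/symbol/hybridization of both ends."""
    mol = Chem.MolFromSmiles(smiles)
    rows = []
    for bond in mol.GetBonds():
        begin, end = bond.GetBeginAtom(), bond.GetEndAtom()
        rows.append((
            str(bond.GetBondType()),
            begin.GetIdx(), begin.GetSymbol(), str(begin.GetHybridization()),
            end.GetIdx(), end.GetSymbol(), str(end.GetHybridization()),
        ))
    return rows


for smi in ["CC=CNC", "C=CNC=CC"]:
    print(smi)
    for row in bond_table(smi):
        print("  ".join(str(field) for field in row))
```

The hybridization values are whatever the installed RDKit version assigns during sanitization, so they are worth re-checking after an upgrade.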
What I would like to do is check whether one structure is an exact match within the other: the atoms must match, the bonds must match and the hybridization of each atom must match. The bonding is the most important criterion, and that is where the exceptions show up, because an sp2 atom can be bonded via a SINGLE bond. Let me illustrate what I mean with a couple of examples.

Example 1, Ala-Ala dipeptide case:

CC(N)C(=O)NC(C)C(=O)O

SINGLE  0  C  SP3   1  C  SP3
SINGLE  1  C  SP3   2  N  SP3
SINGLE  1  C  SP3   3  C  SP2
DOUBLE  3  C  SP2   4  O  SP2
SINGLE  3  C  SP2   5  N  SP2
SINGLE  5  N  SP2   6  C  SP3
SINGLE  6  C  SP3   7  C  SP3
SINGLE  6  C  SP3   8  C  SP2
SINGLE  8  C  SP2   9  O  SP2
DOUBLE  8  C  SP2  10  O  SP2

If I have two "query" molecules: 1) CC(N
Re: [Rdkit-discuss] MCS module - bonding and hybridization in substructure search
...importance than the hybridization match.

Example 3: The last example illustrates the hierarchical importance of matching that I need. It is an example where everything matches but the result is "INCORRECT".

CC\N=N\C1=CCC1
CCN=NC1=CCC1

SINGLE  0  C  SP3  1  C  SP3
SINGLE  1  C  SP3  2  N  SP2
DOUBLE  2  N  SP2  3  N  SP2
SINGLE  3  N  SP2  4  C  SP2
DOUBLE  4  C  SP2  5  C  SP2
SINGLE  5  C  SP2  6  C  SP3
SINGLE  6  C  SP3  7  C  SP3
SINGLE  7  C  SP3  4  C  SP2

One "query" molecule: 1) NC1=CCC1

NC1=CCC1

SINGLE  0  N  SP2  1  C  SP2
DOUBLE  1  C  SP2  2  C  SP2
SINGLE  2  C  SP2  3  C  SP3
SINGLE  3  C  SP3  4  C  SP3
SINGLE  4  C  SP3  1  C  SP2

["NC1=CCC1", "CCN=NC1=CCC1"] ==> NC1=CCC1 - so 'yes' - exact match!

But it is "INCORRECT". Why? Even though the N atoms in the "query" and in CCN=NC1=CCC1 are all sp2, both N atoms in CCN=NC1=CCC1 take part in a DOUBLE bond, while the N atom in the "query" molecule is only SINGLE bonded. So the bonding does not match, and as I mentioned earlier the bonding has a higher order of importance than the hybridization.

I hope that this clarifies what I would like to achieve. I know that it is probably a highly non-standard and unusual problem, but I would really appreciate your help with it! Of course, the examples I gave are purely for computational purposes and do not reflect the chemical stability of those molecules.

Thanks a lot once again!
Have a great Sunday!
Janusz Petkowski

From: Greg Landrum [greg.land...@gmail.com]
Sent: Saturday, November 14, 2015 11:26 PM
To: Janusz Petkowski
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] MCS module - bonding and hybridization in substructure search

Hi Janusz,

I'm not 100% sure what you're looking for, but I think it has something to do with including information about bond conjugation in the MCS procedure. To confirm, can you please give a couple of examples of what you would like to have as output from the algorithm?
Something like this, with the input molecules on the left and the desired result on the right, would help:

['CNC=CC', 'C=CNC=CC'] -> 'CNC=CC'

(I realize that specific example is not what you're looking for, it's just intended to be an example.)

Once I've seen that I can try to figure out if it is currently doable and, if not, if it's possible to modify the code to support it.

Best,
-greg

On Fri, Nov 13, 2015 at 9:17 PM, Janusz Petkowski <jjpet...@mit.edu> wrote:

Dear RDKit Community,

I am looking for a way to use the MCS module in RDKit to compare the atoms and bonding of two molecules while also taking the hybridization of each atom into consideration. A solution to a similar problem was suggested before (inspired by the RDKit-discuss thread started by Liz Wylie: http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg03676.html and see here http://sourceforge.net/p/rdkit/mailman/message/31830412/ ), but even if it is computationally correct it does not necessarily mirror some nuances of chemistry, and one may want to modify it in certain specific cases.

It works most of the time for cases like those proposed in the solution of the Liz Wylie case:

smis = ['CC(C)=C','CC(C)C']
or
smis2 = ['CC(C)=C','CC(C)=N']

If we check whether the 'CCC' substructure is present in the molecules from those two data sets after applying Greg Landrum's solution, CCC will be found only in 'CC(C)C', taking into account the atoms, the bonding and the hybridization of the atoms. It is all correct and cool!

But let's look at another example. Let's look for the N\CC\N substructure in 'C\C=C\NCCN\C=C\C', or the 'NCN' substructure in NCN-C=C or 'C=CNCNC=C'. It will not be found there even if, "structurally speaking", it is there.

The problem is as follows: an electronegative atom next to a C=C bond will pull electron density from that bond, and so the N-C bond in NCN-C=C will have a 'bit of' double-bond character, even if technically it is a single bond.
The current solution to the Liz Wylie problem does not ignore that and distinguishes between a regular N-C bond and an N-C bond next to a C=C bond (as in NCN-C=C); because of that, it will not find NCN in this structure. NCS in NCSC=C is matched because the S atom is more electropositive than N or O and so does not have that double-bond character. My question to the RDKit community is: how can Greg Landrum's solution to the Liz Wylie case be modified to successfully match the cases I mentioned above, while still retaining the hybridization check (we do want to have the hybridization match, we just want the bonding to be more important)? The problem is that the atoms that are not matched, like the N atoms above, have sp2 hybridization but technically are bonded by single bonds on all sides. Thanks a lot for your help, time and consideration. This is my first post on the RDKit forum, I am new
Re: [Rdkit-discuss] defining the size of the ring in the ringMatchesRingOnly and completeRingsOnly methods - the macrocycles case
Dear Greg, Thank you very much for addressing my macrocycles question. If this is not too much trouble for you, could you give me a short guide on how I should proceed with editing the RingInfo data structure so that it "forgets" that rings above a certain size exist? I am sorry to burden you with this, but I only started learning programming around two months ago and my Python programming skills are still quite limited. Thanks a lot for all your help! Janusz Petkowski
From: Greg Landrum [greg.land...@gmail.com] Sent: Saturday, November 14, 2015 11:37 PM To: Janusz Petkowski Cc: rdkit-discuss@lists.sourceforge.net Subject: Re: [Rdkit-discuss] defining the size of the ring in the ringMatchesRingOnly and completeRingsOnly methods - the macrocycles case
Dear Janusz, This isn't currently possible. The most straightforward way I could think to implement it (maybe someone else has a better idea?) would be to allow the molecule's RingInfo data structure to be edited so that you could, for example, tell it to "forget" that rings above a certain size exist. This would be relatively straightforward to do and I could imagine that functionality being useful in other places as well. -greg
On Sun, Nov 15, 2015 at 12:08 AM, Janusz Petkowski <jjpet...@mit.edu> wrote: One other question about MCS, in addition to my previous one on hybridization: In the RDKit documentation, in the Maximum Common Substructure (MCS) section, it is mentioned that one can restrict mapping linear fragments onto rings using two methods: ringMatchesRingOnly and completeRingsOnly. It is an extremely useful feature, but is there a possibility to restrict its execution by defining the size of the rings for which ring bonds will match only ring bonds in a given molecule, while at the same time, for rings of a certain size (let's say rings with fewer than 8 atoms), the function is still executed? I am trying to avoid the problem of not finding linear fragments in macrocycle structures.
I want linear fragments to be matched in macrocycles but not in rings of smaller ("regular") size, all within the same molecule of course. Is there a way to do that? Just as an illustration of the problem: is it possible to find the ClCC=C fragment in CC(F)C1CC\C=C(C)\C(Cl)C2C=CC(Br)CC2[C@H](C)[C@]2(O)OC3=C(C)C(=O)C(O)=C([C@@H]4O[C@H]1[C@H](C)[C@H](O)[C@H]4C)C3=C2 but at the same time avoid finding CC(Br)C=C in it, by using the ringMatchesRingOnly and completeRingsOnly methods? In the current implementation, ringMatchesRingOnly and completeRingsOnly treat all rings the same way, no matter the size. If not, how would one do that? Thanks a lot for your help! Have a great weekend! Best regards, Janusz Petkowski
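As a rough illustration of the behaviour asked about above, the sketch below does not edit RingInfo and does not use the MCS code at all. It is only a post-filter on ordinary substructure matches: a match is kept when none of the acyclic query bonds lands on a target bond that sits in a ring smaller than a chosen size. The function names, the two toy molecules and the threshold of 8 (taken from the question) are assumptions made for this example, and the ring sizes are whatever RDKit's ring perception reports, so fused systems may need more care.

```python
from rdkit import Chem


def ring_sizes_of_bond(mol, bond_idx):
    """Sizes of the perceived rings that contain the given bond (hypothetical helper)."""
    return [len(ring) for ring in mol.GetRingInfo().BondRings() if bond_idx in ring]


def matches_outside_small_rings(target, query, min_macrocycle_size=8):
    """Keep substructure matches whose acyclic query bonds avoid small target rings.

    A target bond in a ring with fewer than `min_macrocycle_size` members
    disqualifies the match; chain bonds and macrocycle bonds are accepted.
    """
    kept = []
    for match in target.GetSubstructMatches(query):
        allowed = True
        for qbond in query.GetBonds():
            if qbond.IsInRing():
                continue  # ring bonds of the query are not restricted in this sketch
            tbond = target.GetBondBetweenAtoms(
                match[qbond.GetBeginAtomIdx()], match[qbond.GetEndAtomIdx()]
            )
            sizes = ring_sizes_of_bond(target, tbond.GetIdx())
            if any(size < min_macrocycle_size for size in sizes):
                allowed = False
                break
        if allowed:
            kept.append(match)
    return kept


query = Chem.MolFromSmiles("ClCC=C")
# Toy targets (assumptions): the same Cl-C-C=C motif in a 12- and a 6-membered ring.
macrocycle = Chem.MolFromSmiles("ClC1C=C" + "C" * 8 + "C1")
cyclohexene = Chem.MolFromSmiles("ClC1C=CCCC1")

print(len(matches_outside_small_rings(macrocycle, query)))
print(len(matches_outside_small_rings(cyclohexene, query)))
```

Because the filter runs after the match, it could be applied to any list of candidate fragments, but it will not change what FindMCS itself returns.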
[Rdkit-discuss] defining the size of the ring in the ringMatchesRingOnly and completeRingsOnly methods - the macrocycles case
One other question about MCS, in addition to my previous one on hybridization: In the RDKit documentation, in the Maximum Common Substructure (MCS) section, it is mentioned that one can restrict mapping linear fragments onto rings using two methods: ringMatchesRingOnly and completeRingsOnly. It is an extremely useful feature, but is there a possibility to restrict its execution by defining the size of the rings for which ring bonds will match only ring bonds in a given molecule, while at the same time, for rings of a certain size (let's say rings with fewer than 8 atoms), the function is still executed? I am trying to avoid the problem of not finding linear fragments in macrocycle structures. I want linear fragments to be matched in macrocycles but not in rings of smaller ("regular") size, all within the same molecule of course. Is there a way to do that? Just as an illustration of the problem: is it possible to find the ClCC=C fragment in CC(F)C1CC\C=C(C)\C(Cl)C2C=CC(Br)CC2[C@H](C)[C@]2(O)OC3=C(C)C(=O)C(O)=C([C@@H]4O[C@H]1[C@H](C)[C@H](O)[C@H]4C)C3=C2 but at the same time avoid finding CC(Br)C=C in it, by using the ringMatchesRingOnly and completeRingsOnly methods? In the current implementation, ringMatchesRingOnly and completeRingsOnly treat all rings the same way, no matter the size. If not, how would one do that? Thanks a lot for your help! Have a great weekend! Best regards, Janusz Petkowski
[Rdkit-discuss] MCS module - bonding and hybridization in substructure search
Dear RDKit Community, I am looking for a way to use the MCS module in RDKit to compare the atoms and bonding of two molecules while also taking into consideration the hybridization of each atom. A solution to a similar problem was suggested before (inspired by this RDKit-discuss thread started by Liz Wylie: http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg03676.html and see here http://sourceforge.net/p/rdkit/mailman/message/31830412/ ), but even if it is computationally correct, it does not necessarily mirror some nuances of chemistry, and one may want to modify it in certain specific cases. It works most of the time for cases like those proposed in the solution to the Liz Wylie case: smis = ['CC(C)=C','CC(C)C'] or smis2 = ['CC(C)=C','CC(C)=N']. If we check whether the 'CCC' substructure is present in molecules from those two data sets using Greg Landrum's solution, CCC will be found only in 'CC(C)C', taking into account the atoms, the bonding and the hybridization of the atoms. It is all correct and cool! But let's look at another example: let's look for the N\CC\N substructure in 'C\C=C\NCCN\C=C\C', or the 'NCN' substructure in NCN-C=C or 'C=CNCNC=C'. It will not be found there even if "structurally speaking" it is there. The problem is as follows: an electronegative atom next to a C=C bond will pull electron density from that bond, and so the N-C bond in NCN-C=C will have a ‘bit of’ double bond character, even if technically it is a single bond. The current solution to the Liz Wylie problem does not ignore that and distinguishes between a regular N-C bond and an N-C bond next to a C=C bond (as in NCN-C=C); because of that, it will not find NCN in this structure. NCS in NCSC=C is matched because the S atom is more electropositive than N or O and so does not have that double-bond character.
My question to the RDKit community is: how can Greg Landrum's solution to the Liz Wylie case be modified to successfully match the cases I mentioned above, while still retaining the hybridization check (we do want to have the hybridization match, we just want the bonding to be more important)? The problem is that the atoms that are not matched, like the N atoms above, have sp2 hybridization but technically are bonded by single bonds on all sides. Thanks a lot for your help, time and consideration. This is my first post on the RDKit forum. I am new to RDKit and Python in general, so I apologize if anything is not clear. I would really appreciate your help! Best regards, Janusz Petkowski
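The priority described in this post (bonding first, hybridization second) can be sketched without RDKit, using bond tables of the kind printed in Example 3 of the follow-up message. The snippet below is only one possible reading of that rule, not Greg Landrum's solution and not an RDKit API: each atom gets a label made of its element, its hybridization and the highest bond order it takes part in, and two atoms are compared level by level. The names `atom_labels` and `first_difference` are hypothetical.

```python
from typing import Dict, NamedTuple, Optional

# Bond tables copied from Example 3 of the follow-up message in this thread.
# Columns: bond type, then (index, element, hybridization) for each end atom.
TARGET_BONDS = """
SINGLE 0 C SP3 1 C SP3
SINGLE 1 C SP3 2 N SP2
DOUBLE 2 N SP2 3 N SP2
SINGLE 3 N SP2 4 C SP2
DOUBLE 4 C SP2 5 C SP2
SINGLE 5 C SP2 6 C SP3
SINGLE 6 C SP3 7 C SP3
SINGLE 7 C SP3 4 C SP2
"""
QUERY_BONDS = """
SINGLE 0 N SP2 1 C SP2
DOUBLE 1 C SP2 2 C SP2
SINGLE 2 C SP2 3 C SP3
SINGLE 3 C SP3 4 C SP3
SINGLE 4 C SP3 1 C SP2
"""

BOND_ORDER = {"SINGLE": 1, "DOUBLE": 2, "TRIPLE": 3}


class AtomLabel(NamedTuple):
    element: str
    hybridization: str
    max_bond_order: int  # highest order among the bonds the atom takes part in


def atom_labels(table: str) -> Dict[int, AtomLabel]:
    """Build one label per atom index from a whitespace-separated bond table."""
    labels: Dict[int, AtomLabel] = {}
    for line in table.strip().splitlines():
        btype, i, elem_i, hyb_i, j, elem_j, hyb_j = line.split()
        for idx, elem, hyb in ((int(i), elem_i, hyb_i), (int(j), elem_j, hyb_j)):
            seen = labels[idx].max_bond_order if idx in labels else 0
            labels[idx] = AtomLabel(elem, hyb, max(seen, BOND_ORDER[btype]))
    return labels


def first_difference(q: AtomLabel, t: AtomLabel) -> Optional[str]:
    """Name the highest-priority level at which two atoms differ, or None."""
    if q.element != t.element:
        return "element"
    if q.max_bond_order != t.max_bond_order:
        return "bonding"  # checked before hybridization, as requested in the post
    if q.hybridization != t.hybridization:
        return "hybridization"
    return None


query = atom_labels(QUERY_BONDS)
target = atom_labels(TARGET_BONDS)

# Query N (atom 0) against both N atoms of CCN=NC1=CCC1 (atoms 2 and 3).
print(first_difference(query[0], target[2]))
print(first_difference(query[0], target[3]))
```

What to do with the reported level is a separate policy decision. For the NCN case above, for example, a caller might accept a pair that differs only at the "hybridization" level while rejecting any pair that differs at the "bonding" level.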