Re: [Rdkit-discuss] Explicit H in substructure searches

Greg Landrum Tue, 05 Nov 2019 23:28:38 -0800

Paolo's answer was completely correct, but there's an additional point
that's worth mentioning here.
Hs are often included in query molecules to restrict the possible
valence states of atoms, not because the user is actually interested in
matching Hs. In that case you can use Chem.MergeQueryHs(), which removes
the H atoms from your query molecule and adds or adjusts H-count queries
on the heavy atoms they were connected to.


Here's how that works in your example:

In [6]: params = Chem.SmilesParserParams()
   ...: params.removeHs = False
   ...: query = Chem.MolFromSmiles('c1cn[nH]c(N([H])([H]))1', params)

In [7]: m1 = Chem.MolFromSmiles('c1cn[nH]c1N')
   ...: m2 = Chem.MolFromSmiles('CNc1ccn[nH]1')
   ...: m3 = Chem.MolFromSmiles('Nc1ccnn(C)1')

In [8]: m1.HasSubstructMatch(query)
Out[8]: False

In [15]: q2 = Chem.MergeQueryHs(query)

In [16]: m1.HasSubstructMatch(q2)
Out[16]: True

In [17]: m2.HasSubstructMatch(q2)
Out[17]: False

In [18]: m3.HasSubstructMatch(q2)
Out[18]: True
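
For anyone who wants to rerun this outside an interactive session, the
steps above can be collected into one script. This is only a minimal
sketch: the names `targets`, `raw_hits`, and `merged_hits` are
illustrative and not part of RDKit, and the values noted in the comments
are the ones reported in the session above.

```python
from rdkit import Chem

# Parse the query while keeping its explicit H atoms in the molecular graph.
params = Chem.SmilesParserParams()
params.removeHs = False
query = Chem.MolFromSmiles('c1cn[nH]c(N([H])([H]))1', params)

# The three target molecules from the original question.
targets = {
    'm1': Chem.MolFromSmiles('c1cn[nH]c1N'),
    'm2': Chem.MolFromSmiles('CNc1ccn[nH]1'),
    'm3': Chem.MolFromSmiles('Nc1ccnn(C)1'),
}

# Replace the explicit query Hs with H-count queries on the attached heavy atom.
q2 = Chem.MergeQueryHs(query)

# Search with the raw query (explicit H atoms) and with the merged query.
raw_hits = {name: mol.HasSubstructMatch(query) for name, mol in targets.items()}
merged_hits = {name: mol.HasSubstructMatch(q2) for name, mol in targets.items()}

print('explicit-H query:', raw_hits)     # all False in the thread
print('merged query:    ', merged_hits)  # m1 True, m2 False, m3 True in the thread
```

The raw query fails everywhere because the targets carry their Hs
implicitly, so there are no H atoms in their graphs for the query's H
atoms to map onto.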


You can see what has happened by calling MolToSmarts:

In [19]: Chem.MolToSmarts(q2)
Out[19]: '[#6]1:[#6]:[#7]:[#7H]:[#6]:1-[#7&!H0&!H1]'


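Notice that the amine N atom now has query features attached to it:
!H0&!H1 means "neither zero nor one H", i.e. at least two Hs. That is why
m2, whose amine N carries only one H, does not match. The ring [nH] was
written with an implicit H rather than an H atom, so it received no
H-count query and still matches the N-methyl ring nitrogen in m3.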
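
As a point of comparison, and not something taken from this thread, the
same kind of H-count restriction can also be written by hand in SMARTS.
The sketch below assumes that the pattern `c1cn[nH]c1[NH2]` is an
acceptable way to express the query, and the names `pattern`, `smiles`,
and `hits` are illustrative. Unlike the merged query, this pattern also
constrains the H count on the ring nitrogen, so the results need not be
identical to the ones above.

```python
from rdkit import Chem

# Hand-written SMARTS: the H counts are stated directly on the heavy atoms.
# [nH] is an aromatic N with exactly one H, [NH2] an aliphatic N with two.
pattern = Chem.MolFromSmarts('c1cn[nH]c1[NH2]')

# Same target molecules as in the original question.
smiles = {
    'm1': 'c1cn[nH]c1N',
    'm2': 'CNc1ccn[nH]1',
    'm3': 'Nc1ccnn(C)1',
}

hits = {
    name: Chem.MolFromSmiles(smi).HasSubstructMatch(pattern)
    for name, smi in smiles.items()
}
print(hits)
```

With this pattern only m1 should match: m2 has a single H on the amine
N, and in m3 the ring nitrogen next to the amine-bearing carbon is
methylated and so has no H.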

I hope this helps,
-greg


On Tue, Nov 5, 2019 at 7:53 PM Markus Heller <mhel...@admarebio.com> wrote:

> Hi,
>
> I’m trying to understand how to properly use explicit hydrogens in
> substructure searches.  Below is an example.  I would like to find all
> molecules that contain my query with hydrogens at the nitrogens, and I
> thought I was on the right track …  Why does the first query with the
> explicit H not match m1?
>
> Thanks
>
> Markus
>
> <code>
> from rdkit import Chem
> from rdkit.Chem.Draw import IPythonConsole
> from rdkit.Chem import rdDepictor
>
> rdDepictor.SetPreferCoordGen(True)
> IPythonConsole.ipython_useSVG = True
>
> m1 = Chem.MolFromSmiles('c1cn[nH]c1N')
> m2 = Chem.MolFromSmiles('CNc1ccn[nH]1')
> m3 = Chem.MolFromSmiles('Nc1ccnn(C)1')
>
> # do not remove explicit H
> params = Chem.SmilesParserParams()
> params.removeHs=False
>
> query = Chem.MolFromSmiles('c1cn[nH]c(N([H])([H]))1', params)
>
> # first should be True, but all are False
> m1.HasSubstructMatch(query)
> m2.HasSubstructMatch(query)
> m3.HasSubstructMatch(query)
>
> # rebuild query with explicit H removed, not what I want
> query = Chem.MolFromSmiles('c1cn[nH]c(N([H])([H]))1')
>
> m1.HasSubstructMatch(query)
> m2.HasSubstructMatch(query)
> m3.HasSubstructMatch(query)
> </code>
>
> --
>
> *Markus Heller, PhD*
>
> Senior Scientist
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
