Re: [Rdkit-discuss] trouble with SMARTs interpretation of 'not hydrogen'

Andrew Dalke Wed, 16 Sep 2015 15:25:32 -0700

On Sep 16, 2015, at 9:57 PM, Bodle, Christopher R wrote:
> I am having trouble with RDKit correctly interpreting the SMARTS character 
> [!#1], which should be interpreted as "any atom not hydrogen.


I've been looking at your emails but it's difficult for me to figure out what 
you are doing. Can you generate a smaller reproducible example?

My guess is that you are looking at the RDKit depiction of a molecule generated 
from a SMARTS string, that is, a query molecule. As I recall, RDKit's depiction 
of query molecules is incomplete, and there is an open call for someone 
interested in writing a better query depiction. If that's the case, then what 
you see is the renderer's inability to display a "not". This shouldn't affect 
the ability to match a molecule.

I also don't understand this:

> My SMARTS input:
> [#6]-1(=[!#1]-[!#1]=[!#1]-[#7](-[#6]-1=[#16])-[#1])-[#6]#[#7]
> 
> Now when I do Chem.MolFromSmarts, my mol representation has hydrogens at 
> those three positions, and as such I can't do sanitization of the molecule 
> because since it has hydrogens in the !#1 positions, there is a valency 
> conflict.

It doesn't make sense to me to sanitize a molecule that came from a SMARTS 
query. 

It looks like you have tried to convert a query-based molecule into a more 
chemical molecule. That is, I can reproduce some of what you report by using:

  >>> from rdkit import Chem
  >>> mol = Chem.MolFromSmarts("[#6]-1(=[!#1]-[!#1]=[!#1]-[#7](-[#6]-1=[#16])-[#1])-[#6]#[#7]")
  >>> Chem.MolToSmiles(mol)
  '[H]N1[H]=[H][H]=C(C#N)C1=S'

This produces a nearly meaningless conversion. For example, consider:

  >>> mol = Chem.MolFromSmarts("[#92,#93][$(N=N)]")
  >>> Chem.MolToSmiles(mol)
  '[*][U]'
  >>> mol = Chem.MolFromSmarts("[#93,#92][$(N=N)]")
  >>> Chem.MolToSmiles(mol)
  '[*][Np]'

When there is a choice of atoms, it picks the first, giving 'U' and 'Np' when I 
swap the two element numbers. And it shows a recursive SMARTS as a '*'.
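
If you want to see what a query molecule actually contains, writing it back 
out as SMARTS should be more informative than writing it as SMILES. Here is a 
minimal sketch using Chem.MolToSmarts. The variable names are mine, and I 
haven't shown the exact output strings because they may differ between RDKit 
versions:

```python
from rdkit import Chem

query = Chem.MolFromSmarts("[#92,#93][$(N=N)]")

# SMILES output has to pick a single symbol for each atom, so the
# alternatives and the recursive environment are not represented.
as_smiles = Chem.MolToSmiles(query)

# SMARTS output is generated from the stored query, so the atom list
# and the recursive SMARTS are still present.
as_smarts = Chem.MolToSmarts(query)

print("as SMILES:", as_smiles)
print("as SMARTS:", as_smarts)
```

The SMARTS form is the one to paste into a bug report, since it is what the 
matcher uses.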

As far as I can tell, the "[!#1]" works correctly. Here's a case where it 
matches an 'N':

  >>> pat = Chem.MolFromSmarts("C-[!#1]-C")

  >>> mol = Chem.MolFromSmiles("CNC")
  >>> mol.HasSubstructMatch(pat)
  True

RDKit won't parse a 2-valent hydrogen by default:

  >>> mol = Chem.MolFromSmiles("C[H]C")
  [00:15:07] Explicit valence for atom # 1 H, 2, is greater than permitted

but if I disable sanitization, I can show that the pattern doesn't match this 
molecule:

  >>> mol = Chem.MolFromSmiles("C[H]C", sanitize=False)
  >>> mol.HasSubstructMatch(pat)
  False

And to double-check that the sanitize flag isn't doing something odd:

  >>> mol = Chem.MolFromSmiles("C[N]C", sanitize=False)
  >>> mol.HasSubstructMatch(pat)
  True

Since the SMARTS pattern doesn't work for you, but does seem to work for me, 
could you give a test case which is just the SMILES/SMARTS or molfile/SMARTS 
combination which gives the failure? That is, without the incomplete 
scaffolding that you showed.
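
Something along these lines would be enough for a test case. The helper name 
report_match is hypothetical, just a convenience for this email, and the 
SMILES/SMARTS pair is the one from my example above; substitute the 
combination that fails for you:

```python
from rdkit import Chem


def report_match(smiles, smarts):
    """Return the atom-index tuples where the SMARTS matches the SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    pat = Chem.MolFromSmarts(smarts)
    # Both parsers return None when they cannot parse the input.
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % (smiles,))
    if pat is None:
        raise ValueError("could not parse SMARTS: %r" % (smarts,))
    return mol.GetSubstructMatches(pat)


matches = report_match("CNC", "C-[!#1]-C")
print(matches)
```

Each tuple lists the molecule atom indices in pattern-atom order, so the 
second entry is the atom that matched [!#1].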


Cheers,

                                Andrew
                                [email protected]


