Andrew,

Thank you for the input. After you asked for a full example, I went looking
for a hit compound that had escaped a PAINS flag because of an incorrect
interpretation of !#n, and I couldn't find any. In fact, when I looked more
closely at my sanitized PAINS flags, I found that the new sanitized filter
queries were the ones flagging molecules incorrectly, for example flagging a
dimethoxybenzene moiety as a catechol.
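
A minimal sketch of that kind of false positive. Both patterns here are
hypothetical stand-ins, not the actual PAINS definitions, and it assumes the
"sanitized" query simply lost its explicit [#1] atoms:

```python
from rdkit import Chem

# Hypothetical catechol-like query with an explicit hydrogen on each oxygen.
strict = Chem.MolFromSmarts("[#6](-[#8]-[#1]):[#6]-[#8]-[#1]")
# Assumed "sanitized" form: the same query with the [#1] atoms dropped.
loose = Chem.MolFromSmarts("[#6](-[#8]):[#6]-[#8]")

# Explicit [#1] query atoms only match hydrogens that are real atoms in the
# molecular graph, so the hydrogens are added before matching.
catechol = Chem.AddHs(Chem.MolFromSmiles("Oc1ccccc1O"))
veratrole = Chem.AddHs(Chem.MolFromSmiles("COc1ccccc1OC"))  # 1,2-dimethoxybenzene

strict_catechol = catechol.HasSubstructMatch(strict)
strict_veratrole = veratrole.HasSubstructMatch(strict)
loose_veratrole = veratrole.HasSubstructMatch(loose)
print(strict_catechol, strict_veratrole, loose_veratrole)
```

The strict query should hit only catechol, while the loose one also hits
dimethoxybenzene, because nothing in it requires the O-H.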

Thank you for your help with this. I will keep in mind in the future that it
is inappropriate to try to sanitize SMARTS queries.

Thanks again


Christopher R. Bodle
PhD Candidate, University of Iowa
College of Pharmacy
Division of Medicinal and Natural Products Chemistry
115 S. Grand Avenue-Rm. S338
Iowa City, Iowa 52242
(319) 335-7845



________________________________________
From: Andrew Dalke [da...@dalkescientific.com]
Sent: Wednesday, September 16, 2015 5:23 PM
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] trouble with SMARTs interpretation of 'not 
hydrogen'

On Sep 16, 2015, at 9:57 PM, Bodle, Christopher R wrote:
> I am having trouble with RDKit correctly interpreting the SMARTS character 
> [!#1], which should be interpreted as "any atom not hydrogen".

I've been looking at your emails but it's difficult for me to figure out what 
you are doing. Can you generate a smaller reproducible example?

My guess is that you are looking at the RDKit depiction of a molecule generated 
from a SMARTS string. This is a query molecule. As I recall, the depiction of 
query molecules is incomplete, and there is an open call out for someone 
interested in generating a better query depiction. If that's the case, then 
what you see is the inability of the renderer to display a "not". This 
shouldn't affect the ability to match a molecule.

I also don't understand this:

> My SMARTS input:
> [#6]-1(=[!#1]-[!#1]=[!#1]-[#7](-[#6]-1=[#16])-[#1])-[#6]#[#7]
>
> Now when I do Chem.MolFromSmarts, my mol representation has hydrogens at 
> those three positions, and as such I can't do sanitization of the molecule 
> because since it has hydrogens in the !#1 positions, there is a valency 
> conflict.

It doesn't make sense to me to do sanitization of a molecule that came from a 
SMARTS query.

It looks like you have tried to convert a query-based molecule into a more 
chemical molecule. That is, I can reproduce some of what you report by using:

  >>> from rdkit import Chem
  >>> mol = Chem.MolFromSmarts("[#6]-1(=[!#1]-[!#1]=[!#1]-[#7](-[#6]-1=[#16])-[#1])-[#6]#[#7]")
  >>> Chem.MolToSmiles(mol)
  '[H]N1[H]=[H][H]=C(C#N)C1=S'

This produces a nearly meaningless conversion. For example, consider:

  >>> mol = Chem.MolFromSmarts("[#92,#93][$(N=N)]")
  >>> Chem.MolToSmiles(mol)
  '[*][U]'
  >>> mol = Chem.MolFromSmarts("[#93,#92][$(N=N)]")
  >>> Chem.MolToSmiles(mol)
  '[*][Np]'

When there is a choice of atoms, it picks the first, giving 'U' and 'Np' when I 
swap the two element numbers. And it shows a recursive SMARTS as a '*'.
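
If the goal is to check what RDKit parsed, writing the query back out as 
SMARTS is more informative than writing it as SMILES. A short sketch using 
Chem.MolToSmarts and Atom.GetSmarts; the exact formatting of the output may 
differ between RDKit versions, so none is shown here:

```python
from rdkit import Chem

query = Chem.MolFromSmarts(
    "[#6]-1(=[!#1]-[!#1]=[!#1]-[#7](-[#6]-1=[#16])-[#1])-[#6]#[#7]"
)

# Write the whole query back out as SMARTS; the "not hydrogen" terms survive.
round_trip = Chem.MolToSmarts(query)
print(round_trip)

# Per-atom view: each query atom can report its own SMARTS fragment.
atom_smarts = [atom.GetSmarts() for atom in query.GetAtoms()]
print(atom_smarts)
```

The three "not hydrogen" atoms should still appear as negated queries, which 
is the information the SMILES conversion throws away.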

As far as I can tell, the "[!#1]" works correctly. Here's a case where it 
matches an 'N':

  >>> pat = Chem.MolFromSmarts("C-[!#1]-C")

  >>> mol = Chem.MolFromSmiles("CNC")
  >>> mol.HasSubstructMatch(pat)
  True

RDKit won't parse a 2-valent hydrogen by default:

  >>> mol = Chem.MolFromSmiles("C[H]C")
  [00:15:07] Explicit valence for atom # 1 H, 2, is greater than permitted

but if I disable sanitization, I can show that the pattern doesn't match this 
molecule:

  >>> mol = Chem.MolFromSmiles("C[H]C", sanitize=False)
  >>> mol.HasSubstructMatch(pat)
  False

And to double-check that the sanitize flag isn't doing something odd:

  >>> mol = Chem.MolFromSmiles("C[N]C", sanitize=False)
  >>> mol.HasSubstructMatch(pat)
  True

Since the SMARTS pattern doesn't work for you, but does seem to work for me, 
could you give a test case which is just the SMILES/SMARTS or molfile/SMARTS 
combination which gives the failure? That is, without the incomplete 
scaffolding that you showed.
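
Something as small as the following sketch would be enough for a test case. 
The helper name smarts_hits and its add_hs option are placeholders made up for 
illustration, not part of RDKit:

```python
from rdkit import Chem

def smarts_hits(smiles, smarts, add_hs=False):
    """Return the substructure matches of a SMARTS pattern in a SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    pat = Chem.MolFromSmarts(smarts)
    # Both parsers return None on failure, so check before matching.
    if mol is None or pat is None:
        raise ValueError("could not parse the SMILES or the SMARTS")
    if add_hs:
        # Needed when the pattern contains explicit [#1] atoms.
        mol = Chem.AddHs(mol)
    return mol.GetSubstructMatches(pat)

hits = smarts_hits("CNC", "C-[!#1]-C")
no_hits = smarts_hits("CC", "C-[!#1]-C")
print(hits, no_hits)
```

Reporting the SMILES, the SMARTS, and the returned tuple is all that is 
needed to reproduce a mismatch.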


Cheers,

                                Andrew
                                da...@dalkescientific.com



------------------------------------------------------------------------------
Monitor Your Dynamic Infrastructure at Any Scale With Datadog!
Get real-time metrics from all of your servers, apps and tools
in one place.
SourceForge users - Click here to start your Free Trial of Datadog now!
http://pubads.g.doubleclick.net/gampad/clk?id=241902991&iu=/4140
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
