On Sep 16, 2015, at 9:57 PM, Bodle, Christopher R wrote:
> I am having trouble with RDKit correctly interpreting the SMARTS character
> [!#1], which should be interpreted as "any atom not hydrogen".
I've been looking at your emails but it's difficult for me to figure out what
you are doing. Can you generate a smaller reproducible example?
My guess is that you are looking at the RDKit depiction of a molecule generated
from a SMARTS string. This is a query molecule. As I recall, this is
incomplete, and there is an open call out for someone interested in generating
a better query depiction. If that's the case, then what you see is the inability of
the renderer to display a "not". This shouldn't affect the ability to match a
molecule.
I also don't understand this:
> My SMARTS input:
> [#6]-1(=[!#1]-[!#1]=[!#1]-[#7](-[#6]-1=[#16])-[#1])-[#6]#[#7]
>
> Now when I do Chem.MolFromSmarts, my mol representation has hydrogens at
> those three positions, and as such I can't do sanitization of the molecule
> because since it has hydrogens in the !#1 positions, there is a valency
> conflict.
It doesn't make sense to me to do sanitization of a molecule that came from a
SMARTS query.
It looks like you have tried to convert a query-based molecule into a more
chemical molecule. That is, I can reproduce some of what you report by using:
>>> from rdkit import Chem
>>> mol = Chem.MolFromSmarts("[#6]-1(=[!#1]-[!#1]=[!#1]-[#7](-[#6]-1=[#16])-[#1])-[#6]#[#7]")
>>> Chem.MolToSmiles(mol)
'[H]N1[H]=[H][H]=C(C#N)C1=S'
This produces a nearly meaningless conversion. For example, consider:
>>> mol = Chem.MolFromSmarts("[#92,#93][$(N=N)]")
>>> Chem.MolToSmiles(mol)
'[*][U]'
>>> mol = Chem.MolFromSmarts("[#93,#92][$(N=N)]")
>>> Chem.MolToSmiles(mol)
'[*][Np]'
When there is a choice of atoms, it picks the first, giving 'U' and 'Np' when I
swap the two element numbers. And it shows a recursive SMARTS as a '*'.
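If the goal is to see what the query actually contains, writing it back out as SMARTS is likely to be more informative than writing it as SMILES. Here is a minimal sketch using Atom.GetSmarts() and Chem.MolToSmarts(); the variable names are my own, and the exact text RDKit writes may differ from the input string:

```python
from rdkit import Chem

smarts = "[#6]-1(=[!#1]-[!#1]=[!#1]-[#7](-[#6]-1=[#16])-[#1])-[#6]#[#7]"
query = Chem.MolFromSmarts(smarts)

# Each query atom keeps its own SMARTS description, including the "not".
atom_smarts = [atom.GetSmarts() for atom in query.GetAtoms()]
print(atom_smarts)

# Writing the whole query back out as SMARTS keeps the query information
# that a SMILES conversion throws away.
round_trip = Chem.MolToSmarts(query)
print(round_trip)
```

The three "[!#1]" atoms should still show up as negated terms here, even though the SMILES output displays them as hydrogens.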
As far as I can tell, the "[!#1]" works correctly. Here's a case where it
matches an 'N':
>>> pat = Chem.MolFromSmarts("C-[!#1]-C")
>>> mol = Chem.MolFromSmiles("CNC")
>>> mol.HasSubstructMatch(pat)
True
RDKit won't parse a 2-valent hydrogen by default:
>>> mol = Chem.MolFromSmiles("C[H]C")
[00:15:07] Explicit valence for atom # 1 H, 2, is greater than permitted
but if I disable sanitization, I can show that the pattern doesn't match this
molecule:
>>> mol = Chem.MolFromSmiles("C[H]C", sanitize=False)
>>> mol.HasSubstructMatch(pat)
False
And to double-check that the sanitize flag isn't doing something odd:
>>> mol = Chem.MolFromSmiles("C[N]C", sanitize=False)
>>> mol.HasSubstructMatch(pat)
True
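Another way to check this, without turning off sanitization, is to use molecules with explicit hydrogens. This is only a sketch; the two small patterns and the ethane/methane test molecules are my own choices, not something from your code:

```python
from rdkit import Chem

# A carbon bonded to a non-hydrogen, and a carbon bonded to a hydrogen.
not_h = Chem.MolFromSmarts("[#6]-[!#1]")
is_h = Chem.MolFromSmarts("[#6]-[#1]")

# AddHs() turns the implicit hydrogens into real atoms in the graph,
# so the patterns can see them.
ethane = Chem.AddHs(Chem.MolFromSmiles("CC"))
methane = Chem.AddHs(Chem.MolFromSmiles("C"))

heavy_matches = ethane.GetSubstructMatches(not_h)
h_matches = ethane.GetSubstructMatches(is_h)
print(len(heavy_matches), len(h_matches))

# Methane has only C-H bonds, so "[#6]-[!#1]" has nothing to match.
print(methane.HasSubstructMatch(not_h))
```

In ethane only the C-C bond should satisfy "[#6]-[!#1]", while all six C-H bonds satisfy "[#6]-[#1]".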
Since the SMARTS pattern doesn't work for you, but does seem to work for me,
could you give a test case which is just the SMILES/SMARTS or molfile/SMARTS
combination which gives the failure? That is, without the incomplete
scaffolding that you showed.
Cheers,
Andrew
[email protected]