Andrew,

Thank you for the input. After you asked for a full example, I looked for a hit compound that was not flagged as a PAINS compound because of incorrect interpretation of !#n, and I couldn't find any. In fact, when I looked more closely at my sanitized PAINS flags, I found that the new sanitized filter queries were themselves incorrectly flagging molecules, for example flagging a dimethoxybenzene moiety as a catechol.
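A minimal sketch of this kind of false positive, assuming two illustrative SMARTS patterns (neither is the actual PAINS catechol definition, and the variable names are hypothetical): once the hydrogen requirement on the oxygens is lost, the query can no longer tell a catechol from a dimethoxybenzene.

```python
from rdkit import Chem

# Illustrative patterns only; these are NOT the published PAINS queries.
# Strict form: each oxygen must carry exactly one hydrogen.
strict_catechol = Chem.MolFromSmarts("c([OH])c([OH])")
# Loose form: the hydrogen requirement has been dropped.
loose_catechol = Chem.MolFromSmarts("c(O)c(O)")

catechol = Chem.MolFromSmiles("Oc1ccccc1O")
dimethoxybenzene = Chem.MolFromSmiles("COc1ccccc1OC")

for name, mol in [("catechol", catechol), ("1,2-dimethoxybenzene", dimethoxybenzene)]:
    print(name,
          "strict:", mol.HasSubstructMatch(strict_catechol),
          "loose:", mol.HasSubstructMatch(loose_catechol))
```

The strict pattern should match only the catechol, while the loose one also matches the dimethoxybenzene.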
Thank you for your help in this, and I will keep in mind in the future that it is inappropriate to try and sanitize SMARTS queries.

Thanks again

Christopher R. Bodle
PhD Candidate, University of Iowa
College of Pharmacy
Division of Medicinal and Natural Products Chemistry
115 S. Grand Avenue-Rm. S338
Iowa City, Iowa 52242
(319) 335-7845
________________________________________
From: Andrew Dalke [da...@dalkescientific.com]
Sent: Wednesday, September 16, 2015 5:23 PM
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] trouble with SMARTs interpretation of 'not hydrogen'

On Sep 16, 2015, at 9:57 PM, Bodle, Christopher R wrote:
> I am having trouble with RDKit correctly interpreting the SMARTS character
> [!#1], which should be interpreted as "any atom not hydrogen.

I've been looking at your emails but it's difficult for me to figure out what you are doing. Can you generate a smaller reproducible?

My guess is that you are looking at the RDKit depiction of a molecule generated from a SMARTS string. This is a query molecule. As I recall, this is incomplete, and there is an open call out for someone interested in generating a better query depiction.

If that's the case, then what you see is inability of the renderer to display a "not". This shouldn't affect the ability to match a molecule.

I also don't understand this:

> My SMARTS input:
> [#6]-1(=[!#1]-[!#1]=[!#1]-[#7](-[#6]-1=[#16])-[#1])-[#6]#[#7]
>
> Now when I do Chem.MolFromSmarts, my mol representation has hydrogens at
> those three positions, and as such I can't do sanitization of the molecule
> because since it has hydrogens in the !#1 positions, there is a valency
> conflict.

It doesn't make sense to me to do sanitization of molecule that came from a SMARTS query. It looks like you have tried to convert a query-based molecule into a more chemical molecule.
That is, I can reproduce some of what you report by using:

>>> from rdkit import Chem
>>> mol = Chem.MolFromSmarts("[#6]-1(=[!#1]-[!#1]=[!#1]-[#7](-[#6]-1=[#16])-[#1])-[#6]#[#7]")
>>> Chem.MolToSmiles(mol)
'[H]N1[H]=[H][H]=C(C#N)C1=S'

This produces a nearly meaningless conversion. For example, consider:

>>> mol = Chem.MolFromSmarts("[#92,#93][$(N=N)]")
>>> Chem.MolToSmiles(mol)
'[*][U]'
>>> mol = Chem.MolFromSmarts("[#93,#92][$(N=N)]")
>>> Chem.MolToSmiles(mol)
'[*][Np]'

When there is a choice of atoms, it picks the first, giving 'U' and 'Np' when I swap the two element numbers. And it shows a recursive SMARTS as a '*'.

As far as I can tell, the "[!#1]" works correctly. Here's a case where it matches an 'N':

>>> pat = Chem.MolFromSmarts("C-[!#1]-C")
>>> mol = Chem.MolFromSmiles("CNC")
>>> mol.HasSubstructMatch(pat)
True

RDKit won't parse a 2-valent hydrogen by default:

>>> mol = Chem.MolFromSmiles("C[H]C")
[00:15:07] Explicit valence for atom # 1 H, 2, is greater than permitted

but if I disable sanitization, I can show that the pattern doesn't match this molecule:

>>> mol = Chem.MolFromSmiles("C[H]C", sanitize=False)
>>> mol.HasSubstructMatch(pat)
False

And to double-check that the sanitize flag isn't doing something odd:

>>> mol = Chem.MolFromSmiles("C[N]C", sanitize=False)
>>> mol.HasSubstructMatch(pat)
True

Since the SMARTS pattern doesn't work for you, but does seem to work for me, could you give a test case which is just the SMILES/SMARTS or molfile/SMARTS combination which gives the failure? That is, without the incomplete scaffolding that you showed.

Cheers,

Andrew
da...@dalkescientific.com
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss