Hi John! Thanks for the research on this. It would have taken me a lot of time to find this out...
So [C]1[C][C][C][C]1 is perceived as aromatic... This is in accordance with the different behavior I see when I run the same code with six-membered rings instead of five-membered rings. For six-membered rings there's no problem, presumably because the ring is not perceived as aromatic.

So what I do is first clone the original atom container (so that the implicit H-counts of the original are not updated), and then run the atom typing and hydrogen adding on each of the IRings:

    IRingSet ringSet = new SSSRFinder(m.clone()).findSSSR(); // find SSSR rings
    for (IAtomContainer ring : ringSet.atomContainers()) {
        AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(ring);
        CDKHydrogenAdder.getInstance(blr).addImplicitHydrogens(ring);
        boolean found = sqt.matches(ring); // true
    }

Regards,
Nick

________________________________________
From: John May [john...@ebi.ac.uk]
Sent: Friday, February 07, 2014 6:48 PM
To: Nick Vandewiele
Cc: cdk-user@lists.sourceforge.net
Subject: Re: [Cdk-user] SMARTS matching after implicit to explict hydrogen conversion and SSSRing finder

Okay now I’ve actually tracked it down - the issue is to do with aromaticity (kind of) and the SSSR providing a container for the ring atoms/bonds.

With implicit hydrogens the substructure from the SSSRFinder looks like this…

    [CH]1CCCC1

Note that C1(O)CCCC1 is really [CH]1([OH])[CH2][CH2][CH2][CH2]1. In the CDK removing atoms doesn’t update neighbour hydrogen counts, hence the first carbon keeps an implicit hydrogen count of 1. When all hydrogens are explicit we get

    [C]1[C][C][C][C]1

For some reason the aromaticity algorithm finds it to be aromatic. I can fix that but for now you can update the valences (i.e. AtomType/AddHydrogens) - but consider this. The atoms in the IRing are the same as the molecule - so adjusting the hydrogen count for the ring atoms would also affect the parent molecule. You can even run it and you’ll get..

    [CH2]1([CH2](O[H])([CH2]([CH2]([CH2]1([H])[H])([H])[H])([H])[H])[H])([H])[H]

It would be even worse when there are multiple rings.
I’ve never liked IRing anyway - much better to refer to rings by index without creating a new container.

Cheers,
J

On 7 Feb 2014, at 17:12, John May <john...@ebi.ac.uk> wrote:

Doh - of course. So the SMARTS has a quirk that ‘C1’ matches the ‘CDKConstants.ISINRING’ flag. We can fix this without a patch - just add this before you match. The SMARTSQueryTool should be doing it already - not sure why it isn’t though…. (that’s the bug)

    SmartsMatchers.prepare(ring, true);

On 7 Feb 2014, at 17:03, John May <john...@ebi.ac.uk> wrote:

No problem, master, but nothing should have changed…

J

On 7 Feb 2014, at 16:47, Nick Vandewiele <nick.vandewi...@ugent.be> wrote:

John,

Thanks for the fast response! However, adding or removing dashes in the SMARTS string doesn’t change the outcome when I try it. Also, using your proposed alternative, e.g.:

    Pattern pattern = Ullmann.findSubstructure(SMARTSParser.parse(smarts, blr));
    for (IAtomContainer ring : ringSet.atomContainers()) {
        System.out.println(pattern.matches(ring));
    }

does not change the outcome (i.e. false) for me either. Are you using 1.5.4 or the master branch?

Regards,
Nick

From: John May [mailto:john...@ebi.ac.uk]
Sent: Friday, February 07, 2014 5:28 PM
To: Nick Vandewiele
Cc: cdk-user@lists.sourceforge.net
Subject: Re: [Cdk-user] SMARTS matching after implicit to explict hydrogen conversion and SSSRing finder

Okay it’s the bond matching… C-1-C-C-C-C1 works but C1-C-C-C-C1 doesn’t. Should be an easy fix.

J

On 7 Feb 2014, at 16:03, Nick Vandewiele <nick.vandewi...@ugent.be> wrote:

Hi,

I am using CDK 1.5.4 and detected some behavior of the SMARTS matcher that I didn’t quite understand.
When I search for a SMARTS pattern in one of the rings detected using the SSSRFinder algorithm, the success of finding the pattern in the ring depends on whether implicit hydrogens were converted to explicit ones, or not. If explicit hydrogens are present, the pattern is not found. If only implicit hydrogens are present, the pattern IS found.

This code was used:

    String smiles = "C1C(O)CCC1";
    IChemObjectBuilder blr = SilentChemObjectBuilder.getInstance();
    SmilesParser smipar = new SmilesParser(blr);
    IAtomContainer m = smipar.parseSmiles(smiles);
    String smarts = "C1-C-C-C-C1";
    SMARTSQueryTool sqt = new SMARTSQueryTool(smarts, blr);
    AtomContainerManipulator.convertImplicitToExplicitHydrogens(m);
    IRingSet ringSet = new SSSRFinder(m).findSSSR(); // find SSSR rings
    for (IAtomContainer ring : ringSet.atomContainers()) {
        boolean found = sqt.matches(ring); // false (should be true)
    }

Although the release notes of 1.5.4 are very informative, I couldn’t find an answer explaining this behavior.

So my question is two-fold:
1) How do I ensure that the pattern is found, even when explicit hydrogens are used in the atom container?
2) What is happening underneath the hood here? Is this behavior normal?

Regards,
Nick

------------------------------------------------------------------------------
Managing the Performance of Cloud-Based Applications
Take advantage of what the Cloud has to offer - Avoid Common Pitfalls.
Read the Whitepaper.
http://pubads.g.doubleclick.net/gampad/clk?id=121051231&iu=/4140/ostg.clktrk
_______________________________________________
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user
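John's remark that the atoms in an IRing are the same objects as in the parent molecule, and that rings are better referred to by index, can be sketched in plain Java. This is only an illustration under stated assumptions: ToyAtom, SharedAtomsSketch and every method below are hypothetical stand-ins and not CDK classes, and the "recompute hydrogens" step is a deliberately crude placeholder (every ring carbon is given two hydrogens) for what re-running atom typing and hydrogen adding on an isolated carbon ring would roughly do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an atom; NOT a CDK class.
final class ToyAtom {
    final String symbol;
    int implicitH;

    ToyAtom(String symbol, int implicitH) {
        this.symbol = symbol;
        this.implicitH = implicitH;
    }

    ToyAtom copy() {
        return new ToyAtom(symbol, implicitH);
    }
}

final class SharedAtomsSketch {

    // Ring membership of C1C(O)CCC1 expressed as indices into the atom list.
    static final int[] RING = {0, 1, 3, 4, 5};

    // Atoms of C1C(O)CCC1 with implicit hydrogen counts: CH2, CH, OH, CH2, CH2, CH2.
    static List<ToyAtom> cyclopentanol() {
        List<ToyAtom> atoms = new ArrayList<>();
        atoms.add(new ToyAtom("C", 2));
        atoms.add(new ToyAtom("C", 1)); // carbon carrying the OH group
        atoms.add(new ToyAtom("O", 1));
        atoms.add(new ToyAtom("C", 2));
        atoms.add(new ToyAtom("C", 2));
        atoms.add(new ToyAtom("C", 2));
        return atoms;
    }

    // Ring container that reuses the parent's atom objects (the situation John describes).
    static List<ToyAtom> ringSharingAtoms(List<ToyAtom> molecule, int[] ringIdx) {
        List<ToyAtom> ring = new ArrayList<>();
        for (int i : ringIdx) {
            ring.add(molecule.get(i));
        }
        return ring;
    }

    // Ring container built from copies, analogous to working on a cloned molecule.
    static List<ToyAtom> ringWithCopiedAtoms(List<ToyAtom> molecule, int[] ringIdx) {
        List<ToyAtom> ring = new ArrayList<>();
        for (int i : ringIdx) {
            ring.add(molecule.get(i).copy());
        }
        return ring;
    }

    // Crude placeholder for re-deriving hydrogens on an isolated carbon ring:
    // each ring carbon only sees its two ring neighbours, so it ends up with two hydrogens.
    static void recomputeRingHydrogens(List<ToyAtom> ring) {
        for (ToyAtom atom : ring) {
            atom.implicitH = 2;
        }
    }

    public static void main(String[] args) {
        List<ToyAtom> shared = cyclopentanol();
        recomputeRingHydrogens(ringSharingAtoms(shared, RING));
        // The parent molecule was modified through the ring: prints 2 instead of 1.
        System.out.println("shared atoms, H on C-OH: " + shared.get(1).implicitH);

        List<ToyAtom> copied = cyclopentanol();
        recomputeRingHydrogens(ringWithCopiedAtoms(copied, RING));
        // The parent molecule is untouched: prints 1.
        System.out.println("copied atoms, H on C-OH: " + copied.get(1).implicitH);
    }
}
```

In the shared case the hydrogen count of the carbon bearing the OH group changes in the parent as well, while in the copied case it does not, which is why cloning the container before re-typing the rings leaves the original molecule untouched. Keeping the ring as a plain index array (RING above) avoids creating a second container for the same atoms in the first place.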