Hi Andrew,

*Default vs Silent*
From an end user perspective there is little difference between Default and Silent. Silent is what you want, and in CDK v3.0 it will become the new default/standard (Silent used to be called NoNotify). Internally, Default allows you to add listeners which are notified of every update to the molecule and/or its atoms. This was useful for JChemPaint, I believe (but it actually doesn't use it any more). Even if you don't add any listeners there is an overhead from dispatching the edit events, so it is better to avoid it.

*Molecule Standard Form*

We (CDK) try to impose very little automation/sanitisation by default. Rather than Daylight's dt_mod on/off or RDKit's sanitization, it is more similar to OEChem in that a molecule comes out of the readers as it was described in the input. We go a little further and don't even do ring perception (is in ring: true/false). Most common formats (SMILES/MOLfile/InChI/CML) will set the hydrogen counts for you, but some older formats (PDB/XYZ) will not.

Since standard SMARTS has expressions which require ring flags (true/false) and aromaticity to function correctly, we err on the side of caution and set these automatically unless asked not to (the molecule is prepared for matching). For a single pattern it is a bit smarter: it inspects the expressions and works out whether ring flags or aromaticity are needed. Something like *[#6]~Cl* does not need these prepared, for example. If you have a whole bunch of patterns to run, preparing the molecule again for every pattern is obviously inefficient, so it is better to prepare the molecule once and then match each pattern. That is where the static function SmartsPattern.prepare() comes in; it is just a convenience utility which does ring finding + Daylight aromaticity.

The SmartsPattern <https://cdk.github.io/cdk/latest/docs/api/org/openscience/cdk/smarts/SmartsPattern.html> is the higher level API, which is why it does these things automatically. You can also load a SMARTS pattern and use a normal substructure matcher.

*A Pattern <https://cdk.github.io/cdk/latest/docs/api/org/openscience/cdk/isomorphism/Pattern.html> for matching a single SMARTS query against multiple target compounds. The class can be used for efficiently matching many queries against a single target if setPrepare(boolean) <https://cdk.github.io/cdk/latest/docs/api/org/openscience/cdk/smarts/SmartsPattern.html#setPrepare(boolean)> is disabled (prepare(IAtomContainer) <https://cdk.github.io/cdk/latest/docs/api/org/openscience/cdk/smarts/SmartsPattern.html#prepare(org.openscience.cdk.interfaces.IAtomContainer)>) should be called manually once for each molecule.*

I will update this documentation to make it more explicit when it is/isn't needed and why it is done.

Here is an example of using the lower level APIs, which require you to prepare the molecule manually before matching the pattern.

    import org.openscience.cdk.interfaces.IAtomContainer;
    import org.openscience.cdk.isomorphism.Pattern;
    import org.openscience.cdk.silent.SilentChemObjectBuilder;
    import org.openscience.cdk.smarts.Smarts;
    import org.openscience.cdk.smarts.SmartsPattern;

    // parse the SMARTS into a query container
    IAtomContainer query = SilentChemObjectBuilder.getInstance().newAtomContainer();
    if (!Smarts.parse(query, "C=C-C=N")) {
      // bad pattern
    }
    Pattern pat = Pattern.findSubstructure(query);

    IAtomContainer mol = ...;
    SmartsPattern.prepare(mol); // ring flags + Daylight aromaticity
    if (pat.matches(mol)) {
      // is a match
    }

*Standard Workflow*

If you have multiple patterns to match, what you want to do is something like this:

0. patterns <- load SMARTS/prepare patterns, set prepare false
1. Read Molecule (mol)
2. Set ring flags
3. Set aromaticity
4. for pat in patterns: pat.matches(mol)

Steps 2/3 can be replaced with prepare. If you have pre-calculated and stored aromaticity (e.g. in SMILES) then you can skip step 3, as the input aromaticity flags will be preserved.

> I'm not sure how you got that output:
>
> Because I was confused when I wrote the code in the first place?

Sorry, I meant: do you know the steps to reproduce it, or which aromaticity model you used? The standard Daylight model used by the SMARTS matcher would find the external porphyrin ring aromatic, hence I'm not sure how you would get that output unless you used a different aromaticity model (e.g. a tighter ring set) before writing to SMILES.
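
Coming back to the standard workflow for a moment, here is a minimal sketch of steps 0-4 using the higher level API. It assumes a recent CDK 2.x release with the SMILES and SMARTS modules on the classpath; the class name PrepareOnceExample, the helper methods, and the example SMILES/SMARTS are made up for illustration and are not part of CDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.silent.SilentChemObjectBuilder;
import org.openscience.cdk.smarts.SmartsPattern;
import org.openscience.cdk.smiles.SmilesParser;

// Hypothetical example class; only the CDK calls themselves come from the thread.
public class PrepareOnceExample {

    // Step 0: parse each SMARTS once and turn off the automatic preparation.
    static List<SmartsPattern> loadPatterns(List<String> smartsList) {
        List<SmartsPattern> patterns = new ArrayList<>();
        for (String smarts : smartsList) {
            SmartsPattern pat = SmartsPattern.create(smarts);
            pat.setPrepare(false); // the molecule is prepared once, by the caller
            patterns.add(pat);
        }
        return patterns;
    }

    // Steps 2-4: prepare the molecule once, then run every pattern against it.
    static int countMatches(IAtomContainer mol, List<SmartsPattern> patterns) {
        SmartsPattern.prepare(mol); // ring flags + Daylight aromaticity
        int count = 0;
        for (SmartsPattern pat : patterns) {
            if (pat.matches(mol)) {
                count++;
            }
        }
        return count;
    }

    // Step 1 plus steps 2-4, for a molecule given as SMILES.
    static int countMatches(String smiles, List<String> smartsList) {
        try {
            SmilesParser parser = new SmilesParser(SilentChemObjectBuilder.getInstance());
            IAtomContainer mol = parser.parseSmiles(smiles);
            return countMatches(mol, loadPatterns(smartsList));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process: " + smiles, e);
        }
    }

    public static void main(String[] args) {
        List<String> smarts = Arrays.asList("c1ccccc1", "[#6]~Cl", "C=C-C=N");
        // Kekulé benzene as input; the aromatic pattern relies on the prepare step
        System.out.println(countMatches("C1=CC=CC=C1", smarts));
    }
}
```

In a real run you would call loadPatterns once up front and reuse the list for every molecule; the String overload above reloads the patterns on each call only to keep the demo short.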

Hopefully that covers everything, but let me know if you have any more questions/thoughts. It's always a tough balance between doing too much/little automatically, and in this case we want simple inputs (e.g. Kekulé benzene) to be handled correctly for novice users. The side effect is that there are obviously molecules where the aromaticity is a matter of debate/opinion, and it can be confusing since the input in your case wasn't the same as what was actually matched on. Fortunately these are relatively rare.

P.S. I am considering moving the ring flag setting to the IO readers for CDK v3.0, which is more akin to what OEChem does. This is only now possible since it's much faster than it used to be.

Best,
John

On Tue, 24 Jun 2025 at 22:56, Andrew Dalke <da...@dalkescientific.com> wrote:

> Thank you John and Jonas for your answers.
>
> One big issue is I still don't have a good grasp of how CDK does things.
> The second is that I'm doing it through Python and the Pype bridge.
>
> The third is that I last looked at this part of the code over a year ago,
> and wrote most of the code about 4 years ago.
>
>
> On Jun 24, 2025, at 17:34, John Mayfield <john.wilkinson...@gmail.com> wrote:
> > First off for the SMARTS matcher you can turn off the "prepare" or use
> > the lower level APIs and work on the input aromaticity.
> >
> > IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
>
> Is there a reason for using SilentChemObjectBuilder instead of what I use,
> which is:
>
>   cdk.DefaultChemObjectBuilder.getInstance()
>
> ? I see cinfony does what you suggest, as does Jonas:
>
> > SmartsPattern pat = SmartsPattern.create("C=CC=N");
> > pat.setPrepare(false); // turn off auto ring+arom perception
>
> Jonas also mentioned the prepare method:
>
>   //prevent the SMARTS pattern from perceiving aromaticity
>   pattern.setPrepare(false);
>
> I've never used this method.
>
> With the default of true, does each SMARTS match re-perceive aromaticity
> each time?
>
> John:
>
>   Cycles.markRingAtomsAndBonds(mol);
>   Aromaticity.apply(Aromaticity.Model.Daylight, mol);
>
> Hmmm. It looks like I don't understand who is supposed to be in charge of
> doing perception, or what the processing steps are to get a fully prepared
> structure.
>
> What I've been doing is using SmilesParser(_default_builder).parseSmiles()
> and assuming the molecule was in the right state.
>
> I then use one of the fingerprinters, or do the SMARTS matches for a
> couple of my own fingerprint types.
>
> Am I always supposed to perceive rings and aromaticity if I use
> SmilesParser? Is there any reason to not use the same aromaticity perception
> steps in CDK Depict, using Daylight aromaticity?
>
> What about if I use MDLV2000Reader/MDLV3000Reader? Or IteratingSDFReader
> or IteratingSMILESReader with hasNext()/next() to get the molecules? Do I
> need to perceive those too?
>
> Also, I'm looking at SubstructureFingerprinter.java and see:
>
>   SmartsPattern.prepare(atomContainer)
>
> Do I need this too? Jonas wrote "SmartsPattern.matchAll() is called in the
> web app, which internally calls SmartsPattern.prepare", so I don't think I
> need it.
>
> John:
>
>   I'm not sure how you got that output:
>
> Because I was confused when I wrote the code in the first place?
>
> I can spend some time pulling the CDK-specific code out of chemfp to get a
> stand-alone reproducible, but it's probably a better use of my time to just
> get the processing steps done correctly.
>
> Andrew
> da...@dalkescientific.com
>
> _______________________________________________
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user