Hi Andrew,

*Default vs Silent*

From an end-user perspective there is little difference between Default and
Silent. Silent is what you want, and in CDK v3.0 it will become the new
default/standard (Silent used to be called NoNotify). Internally, Default
allows you to add listeners which are notified of every update to the
molecule and/or its atoms. I believe this was useful for JChemPaint, although
it doesn't actually use it any more. Even if you don't add any listeners
there is an overhead to dispatching the edit events, so it is better to
avoid this.
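
As a minimal sketch (the class name, helper method and SMILES string are
only for illustration), switching to Silent is just a matter of passing
the silent builder wherever a builder is needed:

```java
import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.interfaces.IChemObjectBuilder;
import org.openscience.cdk.silent.SilentChemObjectBuilder;
import org.openscience.cdk.smiles.SmilesParser;

public class SilentBuilderExample {

    // Hypothetical helper: parse a SMILES string using the silent builder,
    // so no change events are dispatched while the molecule is built.
    static IAtomContainer parse(String smiles) throws Exception {
        IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
        SmilesParser sp = new SmilesParser(bldr);
        return sp.parseSmiles(smiles);
    }

    public static void main(String[] args) throws Exception {
        IAtomContainer mol = parse("c1ccccc1");
        System.out.println(mol.getAtomCount() + " atoms");
    }
}
```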

*Molecule Standard Form*

We (CDK) try to impose very little automation/sanitisation by default.
Rather than Daylight's dt_mod on/off or RDKit's sanitization, it is more
similar to OEChem in that molecules come out of the readers as they
were described in the input. We go a little further and don't even do ring
perception (is in ring: true/false). The most common formats
(SMILES/MOLfile/InChI/CML) will set the hydrogen counts for you but some
older formats (PDB/XYZ) will not.
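
To make that concrete, here is a rough sketch (the class and method names
are made up for the example) that parses a SMILES string, relies on the
hydrogen counts set by the parser, and then sets the ring flags
explicitly with Cycles.markRingAtomsAndBonds():

```java
import org.openscience.cdk.graph.Cycles;
import org.openscience.cdk.interfaces.IAtom;
import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.silent.SilentChemObjectBuilder;
import org.openscience.cdk.smiles.SmilesParser;

public class RingFlagsExample {

    // Hypothetical helper: parse the SMILES, then run ring perception
    // ourselves since it is not assumed to have been done by the reader.
    static IAtomContainer parseAndMarkRings(String smiles) throws Exception {
        SmilesParser sp = new SmilesParser(SilentChemObjectBuilder.getInstance());
        IAtomContainer mol = sp.parseSmiles(smiles);
        Cycles.markRingAtomsAndBonds(mol); // sets the "is in ring" flags
        return mol;
    }

    public static void main(String[] args) throws Exception {
        // cyclohexanol: six ring carbons plus one hydroxyl oxygen
        IAtomContainer mol = parseAndMarkRings("C1CCCCC1O");
        for (IAtom atom : mol.atoms()) {
            System.out.println(atom.getSymbol()
                    + " implicitH=" + atom.getImplicitHydrogenCount()
                    + " inRing=" + atom.isInRing());
        }
    }
}
```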

Since standard SMARTS has expressions which require ring flags (true/false)
and aromaticity to function correctly, we err on the side of caution and
set these automatically unless asked not to (i.e. the molecule is prepared
for matching). For a single pattern the matcher is a bit smarter: it
inspects the expressions and works out whether ring flags or aromaticity
are needed. Something like *[#6]~Cl* does not need either, for example.
If you have a whole bunch of patterns to run, preparing the molecule for
every pattern is obviously inefficient, so it is better to prepare the
molecule once and then match each pattern. That is where the static
function SmartsPattern.prepare() comes in: it is just a convenience
utility which does ring finding + Daylight aromaticity.

The SmartsPattern
<https://cdk.github.io/cdk/latest/docs/api/org/openscience/cdk/smarts/SmartsPattern.html>
is the higher-level API, which is why it does these things automatically.
You can also load a SMARTS pattern and use a normal substructure matcher.

*A Pattern
<https://cdk.github.io/cdk/latest/docs/api/org/openscience/cdk/isomorphism/Pattern.html>
for matching a single SMARTS query against multiple target compounds. The
class can be used for efficiently matching many queries against a single
target if setPrepare(boolean)
<https://cdk.github.io/cdk/latest/docs/api/org/openscience/cdk/smarts/SmartsPattern.html#setPrepare(boolean)>
is disabled (prepare(IAtomContainer)
<https://cdk.github.io/cdk/latest/docs/api/org/openscience/cdk/smarts/SmartsPattern.html#prepare(org.openscience.cdk.interfaces.IAtomContainer)>)
should be called manually once for each molecule.*


I will update this documentation to make it more explicit when it is/isn't
needed and why it is done.

Here is an example of using the lower level APIs which require manually
preparing for the pattern.

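import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.isomorphism.Pattern;
import org.openscience.cdk.silent.SilentChemObjectBuilder;
import org.openscience.cdk.smarts.Smarts;
import org.openscience.cdk.smarts.SmartsPattern;

// parse the SMARTS into a query container
IAtomContainer query = SilentChemObjectBuilder.getInstance().newAtomContainer();
if (!Smarts.parse(query, "C=C-C=N")) {
    // bad pattern
}
Pattern pat = Pattern.findSubstructure(query);

IAtomContainer mol = ...;
SmartsPattern.prepare(mol); // ring flags + Daylight aromaticity, done once
if (pat.matches(mol)) {
    // is a match
}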

*Standard Workflow*

If you have multiple patterns to match what you want to do is something
like this:

0. patterns <- load SMARTS/prepare patterns, set prepare false
1. Read Molecule (mol)
2. Set ring flags
3. Set aromaticity
4. for pat in patterns: pat.match(mol)

Steps 2/3 can be replaced with prepare. If you have pre-calculated and
stored aromaticity (e.g. in SMILES) then you can skip step 3, as the input
aromaticity flags will be preserved.
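
The steps above can be sketched roughly as follows, using the
higher-level SmartsPattern API with prepare turned off and a single
SmartsPattern.prepare() call standing in for steps 2/3. The class name,
helper methods and example patterns are just for illustration:

```java
import java.util.ArrayList;
import java.util.List;

import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.silent.SilentChemObjectBuilder;
import org.openscience.cdk.smarts.SmartsPattern;
import org.openscience.cdk.smiles.SmilesParser;

public class MultiPatternWorkflow {

    // Step 0: load the patterns once and turn off the automatic
    // ring/aromaticity preparation on each of them.
    static List<SmartsPattern> loadPatterns(String... smarts) throws Exception {
        List<SmartsPattern> patterns = new ArrayList<>();
        for (String sma : smarts) {
            SmartsPattern pat = SmartsPattern.create(sma);
            pat.setPrepare(false);
            patterns.add(pat);
        }
        return patterns;
    }

    // Steps 2-4: prepare the molecule once, then match every pattern.
    static int countMatches(List<SmartsPattern> patterns, IAtomContainer mol) {
        SmartsPattern.prepare(mol); // ring flags + Daylight aromaticity
        int hits = 0;
        for (SmartsPattern pat : patterns) {
            if (pat.matches(mol)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        List<SmartsPattern> patterns = loadPatterns("c1ccccc1", "[#6]~Cl", "[R]");
        // Step 1: read the molecule (Kekulé benzene here)
        SmilesParser sp = new SmilesParser(SilentChemObjectBuilder.getInstance());
        IAtomContainer mol = sp.parseSmiles("C1=CC=CC=C1");
        System.out.println(countMatches(patterns, mol) + " patterns matched");
    }
}
```

If the input already carries the aromaticity you want to match on, the
prepare() call could be swapped for Cycles.markRingAtomsAndBonds(mol)
alone, as described above.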

> > I'm not sure how you got that output:
>
> Because I was confused when I wrote the code in the first place?


Sorry, I meant to ask whether you knew the steps to reproduce it, or which
aromaticity model you used. The standard Daylight model used by the SMARTS
matcher would find the external porphyrin ring aromatic, so I'm not sure
how you would get that output unless you used a different aromaticity model
(e.g. a tighter ring set) before writing to SMILES.

Hopefully that covers everything, but let me know if you have any more
questions/thoughts. It's always a tough balance between doing too
much/little automatically, and in this case we want simple inputs (e.g.
Kekulé benzene) to be handled correctly for novice users. The side effect
is that there are obviously molecules where the aromaticity is a matter of
debate/opinion, and it can be confusing since the input in your case wasn't
the same as what was actually matched on. Fortunately these are relatively
rare.

P.S. I am considering moving the ring flag setting to the IO readers for
CDK v3.0, which is more akin to what OEChem does. This is only possible
now because ring perception is much faster than it used to be.

Best,
John

On Tue, 24 Jun 2025 at 22:56, Andrew Dalke <da...@dalkescientific.com>
wrote:

> Thank you John and Jonas for your answers.
>
> One big issue is I still don't have a good grasp of how CDK does things.
> The second is that I'm doing it through Python and the JPype bridge.
>
> The third is that I last looked at this part of the code over a year ago,
> and wrote most of the code about 4 years ago.
>
> > On Jun 24, 2025, at 17:34, John Mayfield <john.wilkinson...@gmail.com> wrote:
> > First off for the SMARTS matcher you can turn off the "prepare" or use
> > the lower level APIs and work on the input aromaticity.
> >
> > IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
>
> Is there a reason for using SilentChemObjectBuilder instead of what I use,
> which is:
>
>   cdk.DefaultChemObjectBuilder.getInstance()
>
> ? I see cinfony does what you suggest, as does Jonas:
>
> > SmartsPattern pat = SmartsPattern.create("C=CC=N");
> > pat.setPrepare(false); // turn off auto ring+arom perception
>
> Jonas also mentioned the prepare method:
> > //prevent the SMARTS pattern from perceiving aromaticity
> > pattern.setPrepare(false);
>
> I've never used this method.
>
> With the default of true, does each SMARTS match re-perceive aromaticity
> each time?
>
>
> John:
> > Cycles.markRingAtomsAndBonds(mol);
> > Aromaticity.apply(Aromaticity.Model.Daylight, mol);
>
> Hmmm. It looks like I don't understand who is supposed to be in charge of
> doing perception, or what the processing steps are to get a fully prepared
> structure.
>
> What I've been doing is using SmilesParser(_default_builder).parseSmiles()
> and assuming the molecule was in the right state.
>
> I then use one of the fingerprinters, or do the SMARTS matches for a
> couple of my own fingerprint types.
>
> Am I always supposed to perceive rings and aromaticity if I use
> SmilesParser? Is there any reason to not use the same aromaticity perception
> steps in CDK Depict, using Daylight aromaticity?
>
> What about if I use MDLV2000Reader/MDLV3000Reader? Or IteratingSDFReader
> or IteratingSMILESReader with hasNext()/next() to get the molecules? Do I
> need to perceive those too?
>
> Also, I'm looking at SubstructureFingerprinter.java and see:
>
>   SmartsPattern.prepare(atomContainer)
>
> Do I need this too? Jonas wrote "SmartPattern.matchAll() is called in the
> web app, which internally calls SmartsPattern.prepare", so I don't think I
> need it.
>
> John:
> > I'm not sure how you got that output:
>
> Because I was confused when I wrote the code in the first place?
>
> I can spend some time pulling the CDK-specific code out of chemfp to get a
> stand-alone reproducible, but it's probably a better use of my time to just
> get the processing steps done correctly.
>
>                                 Andrew
>                                 da...@dalkescientific.com
>
>
>
>
>
> _______________________________________________
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>