Here are some additional comments:
I have now changed the "reading code" to:
MDLV2000Reader molReader = new MDLV2000Reader(stream);
Molecule mol = (Molecule) molReader.read((ChemObject) new Molecule());
AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(mol);
CDKHueckelAromaticityDetector.detectAromaticity(mol);
Now I get the same number of results with my arbitrary query, regardless of
whether I use Fingerprinter or ExtendedFingerprinter.
But there is still a difference when using just benzene as the query, which
suggests that the normal Fingerprinter is not usable if your dataset contains
aromatic compounds
(18k hits compared to 38k with ExtendedFingerprinter; commercial software gets
about 100 more than ExtendedFingerprinter).
> The correct number of results is obtained by doing a subgraph
> isomorphism directly without any intervening fingerprint screen
I will try that out and see what I get.
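
For reference, this is roughly what I understand a fingerprint screen to do: a
target is only passed on to the subgraph isomorphism check if every bit set in
the query fingerprint is also set in the target fingerprint. The sketch below
uses only java.util.BitSet; the class and method names are my own (hypothetical,
not CDK API). If query and target fingerprints are not generated consistently
(for example with different aromaticity perception), genuine matches could be
dropped at this step, which might explain the lower hit count.

```java
import java.util.BitSet;

/** Hypothetical helper, not part of CDK: illustrates a fingerprint pre-screen. */
final class FingerprintScreen {

    /**
     * Returns true if every bit set in the query fingerprint is also set in
     * the target fingerprint, i.e. the target survives the screen and should
     * be handed to the (expensive) subgraph isomorphism test.
     */
    static boolean isCandidate(BitSet query, BitSet target) {
        BitSet missing = (BitSet) query.clone(); // leave the caller's BitSet untouched
        missing.andNot(target);                  // keep only query bits absent from target
        return missing.isEmpty();
    }
}
```

Skipping this screen and running the isomorphism test on every structure is
slower, but it removes the fingerprint as a possible source of missed hits.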
> Do you have any profiling results? Keeping lots of IAtomContainer
> objects in memory can lead to high memory consumption - these objects
> are pretty heavyweight
I try to limit this as far as possible, for example by reading structures in
smaller batches when creating fingerprints rather than loading all of them at
once (certain JDBC drivers, such as the one for hsqldb, return all results at
once and ignore fetchSize). But sure, there can still be several thousand in
memory at once.
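
The batching looks roughly like the following sketch. It is a simplified,
self-contained version using only the standard library; BatchProcessor and
forEachBatch are illustrative names, not my actual code and not CDK API. It
only bounds memory if the source iterator is itself lazy. With a driver that
ignores fetchSize, the paging would presumably have to happen in the query
itself.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: hands items to a handler in fixed-size batches. */
final class BatchProcessor {

    /**
     * Reads items from the source and passes them to the handler in lists of
     * at most batchSize elements. Returns the number of batches delivered.
     */
    static <T> int forEachBatch(Iterator<T> source, int batchSize,
                                Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                handler.accept(batch);
                // Start a fresh list so the processed batch can be garbage collected.
                batch = new ArrayList<>(batchSize);
                batches++;
            }
        }
        if (!batch.isEmpty()) { // deliver the final, possibly smaller, batch
            handler.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

The handler would be the place where the fingerprints for one batch are
computed and written back, so that only one batch of molecules is alive at a
time.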
Regards,
Thomas
_______________________________________________
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user