Here are some additional comments:

I have now changed the "reading code" to:

// Read the structure from an MDL V2000 molfile stream.
MDLV2000Reader molReader = new MDLV2000Reader(stream);
Molecule mol = (Molecule) molReader.read((ChemObject) new Molecule());
// Perceive atom types first, then flag aromatic atoms and bonds.
AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(mol);
CDKHueckelAromaticityDetector.detectAromaticity(mol);

Now I get the same number of results with my arbitrary query, whether I use 
Fingerprinter or ExtendedFingerprinter.

But there is still a difference when using just benzene as the query, which 
suggests that the normal Fingerprinter is not usable if your dataset contains 
aromatic compounds (18k hits compared to 38k with ExtendedFingerprinter; 
commercial software gets about 100 more than ExtendedFingerprinter).

> The correct number of results is obtained by doing a subgraph
> isomorphism directly without any intervening fingerprint screen

I will try that and see what I get.
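
For reference, the screen itself is only a bit-subset test: a target can pass 
only if every bit set in the query fingerprint is also set in the target 
fingerprint. Below is a minimal sketch with plain java.util.BitSet. The class 
FingerprintScreen and the bit positions are hypothetical and are not CDK 
output. If query and targets end up with different bits for the same ring 
(for example because aromaticity was perceived differently), real hits would 
be dropped before the isomorphism test runs, which might explain the lower 
count.

```java
import java.util.BitSet;

// Hypothetical helper, not part of CDK: shows the fingerprint prefilter only.
final class FingerprintScreen {
    private FingerprintScreen() {}

    /** Returns true if every bit set in query is also set in target. */
    static boolean mayContain(BitSet query, BitSet target) {
        BitSet missing = (BitSet) query.clone(); // copy, so the caller's query is untouched
        missing.andNot(target);                  // keep only query bits absent from target
        return missing.isEmpty();                // nothing missing: target passes the screen
    }

    /** Convenience factory for a BitSet with the given positions set. */
    static BitSet bits(int... positions) {
        BitSet set = new BitSet();
        for (int p : positions) {
            set.set(p);
        }
        return set;
    }
}
```

Only targets for which mayContain returns true need the expensive subgraph 
isomorphism check; a target that fails the screen is never tested at all.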

> Do you have any profiling results? Keeping lots of IAtomContainer
> objects in memory can lead to high memory consumption - these objects
> are pretty heavyweight

I try to limit it as much as possible. For example, when creating fingerprints 
I read the molecules in smaller batches rather than all at once (certain JDBC 
drivers, such as the one for hsqldb, return all results at once and ignore 
fetchSize). But sure, there can be several thousand in memory at once.
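
A minimal sketch of the batching idea, assuming the rows can be consumed 
through a plain Iterator. The class name Batcher and the batch size are 
illustrative only; this is not the actual JDBC code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: hands items to a handler in fixed-size batches.
final class Batcher {
    private Batcher() {}

    /** Processes source in batches of at most batchSize; returns the number of batches. */
    static <T> int forEachBatch(Iterator<T> source, int batchSize, Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                handler.accept(batch);
                batches++;
                // Start a fresh list so the processed batch can be garbage collected.
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) { // final, possibly shorter batch
            handler.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

With this shape, at most batchSize heavyweight objects are referenced by the 
loop at any time, provided the handler does not keep the list around.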


Regards,

Thomas
                                          
_______________________________________________
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user
