Andrew,

On Mon, Feb 9, 2009 at 11:26 AM, Andrew Dalke <[email protected]> wrote:
> Greg:
>> I must admit that I find the use of branched paths somehow more
>> pleasing. If you don't include either branching or more detail about
>> atom identity in the hashing, then it seems like you'd get 100%
>> similarity between CCC and CC(C)C.
>
> I quite agree. I'm looking at this for substructure filtering,
> and it feels like the additional topology information should
> be better. Though the code is a bit more complex. With linear
> branching did a hard-coded set of for-loops so I wouldn't need
> to use recursion or need a dynamic data structure.

For substructure filtering, it might be worth taking a look at the
(newish) "layered fingerprints", also in Fingerprints.h. Those were
introduced with the idea of providing a more efficient (and potentially
better) fingerprint for this purpose. Things worked well in some
preliminary testing, but they need a bit more validation.

>> Of course, I doubt that generating
>> these fingerprints normally lies on the critical path.
>
> Though of course while I'm thinking about that for speed reasons,
> as you say, that's not on the critical path.

If you do find yourself cursing the speed of the fingerprint generation,
it might be worth taking a look at the alternate RNG that is used in
the layered fingerprint code (line 229 of Fingerprints.cpp). Some
profiling I did while implementing those fps showed that I was spending
a disproportionate (and unnecessary) amount of time in the RNG seeding
process. The adjusted params for the layered fingerprint RNG seemed to
solve that problem.

-greg
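
To make the CCC vs. CC(C)C point from earlier in the thread concrete, here is a minimal, self-contained sketch in plain Python. It is not the RDKit implementation: the function names (linear_path_features, tanimoto), the graph encoding, and the use_degree flag are all made up for illustration, and it uses recursion rather than the hard-coded for-loops Andrew describes. It enumerates linear paths only and compares the resulting feature sets, once with bare element symbols and once with the atom's degree added as one possible form of "more detail about atom identity".

```python
def linear_path_features(atoms, bonds, max_bonds=7, use_degree=False):
    """Collect linear-path descriptors from a small molecular graph.

    atoms: list of element symbols, e.g. ["C", "C", "C"]
    bonds: list of (i, j) index pairs (bond orders are ignored here)
    use_degree: hypothetical switch that adds each atom's heavy-atom
                degree to its label.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    def label(i):
        return (atoms[i], len(neighbors[i])) if use_degree else (atoms[i], 0)

    features = set()

    def extend(path):
        labels = tuple(label(i) for i in path)
        # A path and its reverse describe the same feature.
        features.add(min(labels, labels[::-1]))
        if len(path) - 1 == max_bonds:
            return
        for nxt in neighbors[path[-1]]:
            if nxt not in path:  # simple paths only, no branching
                extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return features


def tanimoto(a, b):
    """Tanimoto similarity of two feature sets."""
    return len(a & b) / len(a | b)


# CCC (propane) and CC(C)C (isobutane), heavy atoms only.
propane = (["C"] * 3, [(0, 1), (1, 2)])
isobutane = (["C"] * 4, [(0, 1), (1, 2), (1, 3)])

sim_plain = tanimoto(linear_path_features(*propane),
                     linear_path_features(*isobutane))
sim_degree = tanimoto(linear_path_features(*propane, use_degree=True),
                      linear_path_features(*isobutane, use_degree=True))
print(sim_plain, sim_degree)
```

With element symbols alone, both molecules yield the same three paths (C, CC, CCC), so the similarity comes out as 1.0. Adding the degree to the atom label is enough to separate them in this toy setup; branched paths would be another way to get the same effect.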

