Greg:
> I must admit that I find the use of branched paths somehow more
> pleasing. If you don't include either branching or more detail about
> atom identity in the hashing, then it seems like you'd get 100%
> similarity between CCC and CC(C)C.
I quite agree. I'm looking at this for substructure filtering,
and it feels like the additional topology information should
be better, though the code is a bit more complex. With linear
paths I used a hard-coded set of for-loops so I wouldn't need
to use recursion or a dynamic data structure.
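
To make the CCC vs. CC(C)C point concrete, here is a rough sketch in
plain Python. It is a toy, not the code from either of our
implementations: the molecules are hand-written adjacency lists, every
atom is treated as a plain "C", features are kept as a set of strings
rather than hashed into bits, and the function names and the single
"star" feature are made up for illustration.

```python
# Hydrogen-suppressed graphs as adjacency lists; every atom is carbon.
PROPANE = {0: [1], 1: [0, 2], 2: [1]}                # CCC
ISOBUTANE = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}   # CC(C)C


def linear_features(adj, symbol="C"):
    """Linear paths of 1 to 3 atoms, using fixed nested loops (no recursion)."""
    feats = set()
    for a in adj:
        feats.add(symbol)                  # 1-atom path
        for b in adj[a]:
            feats.add(symbol * 2)          # 2-atom path a-b
            for c in adj[b]:
                if c != a:                 # do not walk straight back
                    feats.add(symbol * 3)  # 3-atom path a-b-c
    return feats


def branched_features(adj, symbol="C"):
    """Linear features plus one 'star' feature for atoms with 3+ neighbors."""
    feats = linear_features(adj, symbol)
    for a in adj:
        if len(adj[a]) >= 3:
            feats.add(f"{symbol}({symbol})({symbol}){symbol}")
    return feats


def tanimoto(x, y):
    """Tanimoto similarity of two feature sets."""
    return len(x & y) / len(x | y)


print(tanimoto(linear_features(PROPANE), linear_features(ISOBUTANE)))      # 1.0
print(tanimoto(branched_features(PROPANE), branched_features(ISOBUTANE)))  # 0.75
```

With linear paths only, both molecules give the same feature set
{C, CC, CCC}, because the longest unbranched path in CC(C)C is still
three atoms. Adding the one branched feature is enough to tell them
apart.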
> Of course, I doubt that generating
> these fingerprints normally lies on the critical path.
I've been thinking about this mostly for speed reasons though,
as you say, fingerprint generation isn't normally on the critical path.
Andrew