[Rdkit-discuss] question about fingerprint generation

Andrew Dalke Mon, 09 Feb 2009 00:54:29 +0000

Hello,

I'm trying to understand RDKit's hash fingerprint code. The functions in Fingerprints.cpp call Subgraphs.cpp:findAllSubgraphsOfLengthsMtoN. The code in Subgraphs makes a distinction between subgraphs (which can have branches) and paths (which are linear).
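To make the subgraph/path distinction concrete, here is a minimal sketch in plain Python. It is not RDKit code: the helper names (connected, is_unbranched, subgraphs) and the bond-list representation are assumptions made for illustration only. It enumerates connected sets of bonds by brute force and then checks which of them are unbranched.

```python
from itertools import combinations


def connected(bond_subset):
    """Return True if the given bonds form one connected piece."""
    bonds = list(bond_subset)
    seen = set(bonds[0])          # atoms reached so far
    remaining = bonds[1:]
    changed = True
    while changed:
        changed = False
        for bond in list(remaining):
            if bond[0] in seen or bond[1] in seen:
                seen.update(bond)
                remaining.remove(bond)
                changed = True
    return not remaining


def is_unbranched(bond_subset):
    """Return True if no atom has more than two bonds inside the fragment."""
    degree = {}
    for a, b in bond_subset:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return max(degree.values()) <= 2


def subgraphs(bonds, n):
    """All connected subsets of exactly n bonds (branched or not)."""
    return [s for s in combinations(bonds, n) if connected(s)]


# Heavy-atom skeletons as bond lists (atom indices are arbitrary labels).
isobutane = [(0, 1), (0, 2), (0, 3)]   # atom 0 is the branch point
butane = [(0, 1), (1, 2), (2, 3)]      # a straight chain

iso_subgraphs = subgraphs(isobutane, 3)
iso_paths = [s for s in iso_subgraphs if is_unbranched(s)]
butane_subgraphs = subgraphs(butane, 3)
butane_paths = [s for s in butane_subgraphs if is_unbranched(s)]
```

For isobutane the single 3-bond subgraph is the branched star, so a path-only enumeration finds nothing of that size, while for butane the one 3-bond subgraph is also a linear path. (This sketch does not treat rings specially.)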

The implementation for Subgraphs.cpp:findAllSubgraphsOfLengthsMtoN appears to do what it says - return subgraphs - and the comment

- to make few things clear it might be useful to typdef a "subgraphListType" even if it is exactly same as the "pathListType", just to not confuse between
      path vs. subgraph definitions

reinforces that. The code then uses Balaban's J index for the subgraph to seed the PRNG to generate the fingerprint bits.
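The general idea of seeding a PRNG from a per-fragment value can be sketched as follows. This is only an illustration under stated assumptions, not RDKit's implementation: the integer "invariant" stands in for whatever value is computed per subgraph (it is not Balaban's J), and fp_size and bits_per_fragment are hypothetical parameters.

```python
import random


def fragment_bits(invariant, fp_size=2048, bits_per_fragment=4):
    """Map one fragment invariant to a few bit positions.

    The invariant seeds a private PRNG, so the same fragment always
    yields the same bit positions.
    """
    rng = random.Random(invariant)
    return [rng.randrange(fp_size) for _ in range(bits_per_fragment)]


def fingerprint(invariants, fp_size=2048, bits_per_fragment=4):
    """Union of the bit positions set by every fragment invariant."""
    bits = set()
    for invariant in invariants:
        bits.update(fragment_bits(invariant, fp_size, bits_per_fragment))
    return bits


# Hypothetical invariants for three fragments of some molecule.
fp = fingerprint([17, 42, 99])
```

Because each fragment's bits depend only on its own invariant, a molecule containing a subset of another's fragments sets a subset of its bits, which is what makes this kind of fingerprint usable as a substructure screen.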

This surprises me because I thought the Daylight approach was to only consider linear fragments. Implementations such as


OpenBabel - finger2.cpp:1077
/// \brief Fingerprint based on linear fragments up to 7 atoms ID="FP2"

the CDK - PathTools.java:552-553

     * This method returns a set of paths. Each path is a <code>List</code> of atoms that
     * make up the path (ie they are sequentially connected).

and a private code base I have access to only use linear paths.

But upon closer reading of
  http://www.daylight.com/dayhtml/doc/theory/theory.finger.html
I see that it doesn't say that paths are linear, only text like

  "atoms and bonds connected by paths up to 3 bonds long"

I admit though that I'm a bit surprised. I thought I knew how this sort of code works.

Hmm, I haven't asked a question yet.... Greg? How did RDKit manage to be the only one I found which uses branching in the fingerprint generation code? Or do I have things wrong?


                                Andrew
                                [email protected]
