On Fri, Apr 20, 2012 at 10:13 AM, Gonzalo Colmenarejo-Sanchez
<gonzalo.2.colmenar...@gsk.com> wrote:
>
> I have performed a similarity matrix calculation of 4176 X 4016 molecules
> with a program using the RDKit and it took 401 seconds. The same program
> with the same sets of molecules and using the Daylight toolkit took 19
> seconds.
>
> Has anybody observed similar results? The main difference in time seems to
> come from the Tanimoto similarity calculation (although the fingerprint
> generation is also slower). I’m concerned about the impact on, e.g.,
> clustering algorithms with large datasets.

As Andrew already pointed out, the RDKit Tanimoto similarity
calculation is not particularly fast. His chemfp package is a good
place to look if you really need high-performance similarity
calculations.

One of the things on my ToDo list is to revisit the fingerprint
representation in the RDKit with an eye to speeding up similarity (and
other) calculations. This will make a difference, probably a big one,
but it's not going to reach the level of performance that Andrew has
achieved in his package.
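
To illustrate why the fingerprint representation matters here, below is a
minimal sketch of a Tanimoto calculation on fingerprints packed into plain
Python integers, one bit per feature. This is not the RDKit's or chemfp's
actual implementation. The function names (bits_to_int, tanimoto,
similarity_matrix) are made up for this example, and storing a fingerprint
as a single int is an assumption chosen only to keep the sketch
self-contained. The point is that each pair costs one AND, one OR, and two
bit counts, and the matrix in the quoted message needs that for
4176 x 4016 (about 16.8 million) pairs.

```python
def bits_to_int(on_bits):
    """Pack an iterable of 'on' bit positions into one integer (hypothetical helper)."""
    fp = 0
    for i in on_bits:
        fp |= 1 << i
    return fp


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity: bits set in both / bits set in either."""
    common = bin(fp_a & fp_b).count("1")  # popcount of the intersection
    total = bin(fp_a | fp_b).count("1")   # popcount of the union
    # Two empty fingerprints have no defined ratio; return 0.0 by convention here.
    return common / total if total else 0.0


def similarity_matrix(rows, cols):
    """Full len(rows) x len(cols) matrix, one tanimoto() call per pair."""
    return [[tanimoto(a, b) for b in cols] for a in rows]


if __name__ == "__main__":
    a = bits_to_int([1, 5, 9])
    b = bits_to_int([1, 5, 20])
    print(similarity_matrix([a], [a, b]))
```

Because the inner loop is nothing but bit operations, how the bits are laid
out in memory and how quickly they can be counted largely determines the
total run time.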

-greg

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
