Greg, Nils, Andrew,

Thanks for all that info. Gives me plenty to work on!
Tim

On 19/05/2017 09:27, Andrew Dalke wrote:
> On May 19, 2017, at 08:33, Greg Landrum <greg.land...@gmail.com> wrote:
>> The best solution to this is to use chemfp. It's a remarkable piece of
>> software.
> Thanks, Greg.
>
>> If you aren't willing to license that, the RDKit's brute-force
>> fingerprint search capabilities aren't too bad for in-memory fingerprints.
> To clarify, chemfp 1.1 is available at no cost from chemfp.com or PyPI,
> while later versions (chemfp 3.0 now supports Python 3) cost money. Both are
> distributed under the MIT license.
>
> Within a month or so I'll be making a new release of the no-cost version.
> I'll update the fingerprint type names to reflect a change in the recent
> RDKit release.
>
> On Thu, May 18, 2017 at 11:15 PM, Tim Dudgeon <tdudgeon...@gmail.com> wrote:
>> I think I recall Greg mentioning that RDKit can be used for very fast
>> similarity search (e.g. all vs. all comparisons or searches against
>> multi-million sized datasets).
> Greg's slides include a few timing numbers for single-query searches. To
> add some all vs. all numbers, chemfp takes about 40 minutes to cluster the
> 1.58 million fingerprints from ChEMBL 21, at a threshold of 0.8, using 4
> threads and 2048-bit RDKit fingerprints.
>
> This functionality is available in the no-cost version. Depending on the
> fingerprint size and your CPU type, the newest version (3.0) is between 5%
> and 35% faster than chemfp 1.1.
>
> Feel free to contact me if you have questions about chemfp.
>
> Andrew
> da...@dalkescientific.com

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
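For anyone following the thread, here is a minimal pure-Python sketch of what a brute-force in-memory Tanimoto search and an all vs. all threshold search compute. It is only an illustration under stated assumptions: fingerprints are plain Python ints (bit i set means feature i is present), and the function names (`tanimoto`, `threshold_search`, `all_vs_all`) are hypothetical. This is not chemfp's or the RDKit's API, and it is far slower than either. In the RDKit itself you would normally use its own similarity functions, e.g. `DataStructs.BulkTanimotoSimilarity`.

```python
def popcount(x: int) -> int:
    """Number of set bits in an int-encoded fingerprint."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity |A & B| / |A | B| for int-encoded fingerprints."""
    union = popcount(a | b)
    if union == 0:
        # Assumption of this sketch: two empty fingerprints score 0.0.
        return 0.0
    return popcount(a & b) / union


def threshold_search(query: int, targets: list[int], threshold: float) -> list[tuple[int, float]]:
    """Brute-force scan: (index, score) for every target scoring >= threshold,
    sorted by descending score, ties broken by index."""
    hits = []
    for i, fp in enumerate(targets):
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((i, score))
    hits.sort(key=lambda h: (-h[1], h[0]))
    return hits


def all_vs_all(fps: list[int], threshold: float) -> list[list[int]]:
    """Neighbor list for each fingerprint: indices of the other fingerprints
    with similarity >= threshold. Each pair is compared once (O(N^2) work)."""
    neighbors: list[list[int]] = [[] for _ in fps]
    for i in range(len(fps)):
        for j in range(i + 1, len(fps)):
            if tanimoto(fps[i], fps[j]) >= threshold:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


if __name__ == "__main__":
    # Tiny made-up 4-bit fingerprints, purely for illustration.
    fps = [0b1111, 0b1110, 0b0001, 0b1111]
    print(threshold_search(0b1111, fps, 0.7))
    print(all_vs_all(fps, 0.7))
```

The quadratic `all_vs_all` loop is exactly why the all vs. all numbers quoted above are interesting at ChEMBL scale (1.58 million fingerprints): dedicated tools rely on fast popcounts, pruning by bit counts, and multiple threads instead of a naive double loop like this one.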