Following up on myself,

On Jun 6, 2017, at 04:00, Andrew Dalke <da...@dalkescientific.com> wrote:
> I've fleshed out that algorithm so it's a command-line program that can be
> used for benchmarking purposes. It's available from
> http://dalkescientific.com/writings/taylor_butina.py .
>
> If anyone uses it for benchmarking, or has improvements, let me know. If I
> get useful feedback about it, I'll include it in an upcoming chemfp 1.3
> release.

Based on the discussion, I've decided to add a way to convert a chemfp
SearchResults into a scipy sparse row matrix. This would be part of the
upcoming 1.3 release (the no-cost version) and in 3.1 (the commercial
version).

I need feedback because I have no experience with scipy or the clustering
tools in scikit-learn.

I have put a prototype version of the code, which works with chemfp-1.1, at

  http://dalkescientific.com/chemfp_to_scipy_csr.py

In theory (see previous disclaimer), when run as a command-line tool it
will use DBSCAN to cluster the specified fingerprint file.

Here's the command-line --help:

usage: chemfp_to_scipy_csr.py [-h] [-t FLOAT] [--eps FLOAT]
                              [--min-samples INT] [--num-jobs INT]
                              FILENAME

test prototype adapter between chemfp and scipy.cluster using DBSCAN

positional arguments:
  FILENAME

optional arguments:
  -h, --help            show this help message and exit
  -t FLOAT, --threshold FLOAT
                        minimum similarity threshold (default: 0.8)
  --eps FLOAT           The maximum distance between two samples for them to
                        be considered as in the same neighborhood. (default:
                        0.1)
  --min-samples INT     The number of samples (or total weight) in a
                        neighborhood for a point to be considered as a core
                        point. This includes the point itself. (default: 5)
  --num-jobs INT, -j INT
                        The number of parallel jobs to run. If -1, then the
                        number of jobs is set to the number of CPU cores.
                        (default: 1)

This is off-topic for the RDKit list so please follow up with me via email.

                                Andrew
                                da...@dalkescientific.com
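
P.S. For anyone unfamiliar with the CSR layout, here is a minimal sketch of the
kind of conversion involved. It is not the code in chemfp_to_scipy_csr.py, and it
does not use the chemfp API: the "rows" input is a hypothetical stand-in for
threshold search results, with one list of (column index, similarity) pairs per
query fingerprint. The function name hits_to_csr_arrays is also made up for
illustration. It assumes that distance is taken as 1 - similarity.

```python
def hits_to_csr_arrays(rows):
    """Flatten per-row (column, similarity) hits into CSR arrays of distances.

    `rows` is a hypothetical stand-in for similarity search results:
    rows[i] is a list of (column_index, similarity) pairs for row i.
    """
    data = []      # stored distance values, row by row
    indices = []   # column index of each stored value
    indptr = [0]   # indptr[i]:indptr[i+1] is the slice belonging to row i
    for hits in rows:
        # Sort by column index so each row's entries are in canonical order.
        for col, similarity in sorted(hits):
            indices.append(col)
            data.append(1.0 - similarity)  # assumed distance definition
        indptr.append(len(indices))
    return data, indices, indptr


# Made-up results for three fingerprints: 0 and 1 are neighbors, 2 is alone.
rows = [
    [(0, 1.0), (1, 0.75)],
    [(1, 1.0), (0, 0.75)],
    [(2, 1.0)],
]
data, indices, indptr = hits_to_csr_arrays(rows)
```

With scipy installed, these three lists can be passed as
scipy.sparse.csr_matrix((data, indices, indptr), shape=(3, 3)), and
scikit-learn's DBSCAN accepts a sparse distance matrix like that when
constructed with metric="precomputed".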