Following up on my own message:

On Jun 6, 2017, at 04:00, Andrew Dalke <da...@dalkescientific.com> wrote:
> I've fleshed out that algorithm so it's a command-line program that can be 
> used for benchmarking purposes. It's available from 
> http://dalkescientific.com/writings/taylor_butina.py .
> 
> If anyone uses it for benchmarking, or has improvements, let me know. If I 
> get useful feedback about it, I'll include it in an upcoming chemfp 1.3 
> release.

Based on the discussion, I've decided to add a way to convert a chemfp 
SearchResults object into a SciPy compressed sparse row (CSR) matrix. This 
would be part of the upcoming 1.3 release (the no-cost version) and of 3.1 
(the commercial version).

I need feedback because I have no experience with scipy or the clustering tools 
in scikit-learn. I have put a prototype version of the code, which works with 
chemfp-1.1, at

  http://dalkescientific.com/chemfp_to_scipy_csr.py 

In theory (see the disclaimer above about my lack of scipy and scikit-learn 
experience), when run as a command-line tool it uses DBSCAN to cluster the 
specified fingerprint file.
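
A minimal sketch of the kind of conversion described above, not the prototype's actual code. It assumes the search results are already available as plain per-row lists of (column index, similarity) pairs, a hypothetical stand-in for a chemfp SearchResults, and that distance is taken as 1 - similarity, which is also an assumption. The function name hits_to_csr_arrays is made up for illustration.

```python
def hits_to_csr_arrays(rows):
    """Build the three CSR arrays (data, indices, indptr) from per-row hits.

    rows[i] is a list of (column_index, similarity) pairs for row i.
    Assumption: distance = 1 - similarity; pairs below the similarity
    threshold are simply absent, so the matrix stays sparse.
    """
    data, indices, indptr = [], [], [0]
    for hits in rows:
        # Sort by column index so each row's indices are in ascending order.
        for col, score in sorted(hits):
            indices.append(col)
            data.append(1.0 - score)
        # indptr[i + 1] marks where row i ends in data/indices.
        indptr.append(len(indices))
    return data, indices, indptr


# Hypothetical results for 3 fingerprints with a 0.8 similarity threshold.
rows = [
    [(1, 0.875)],
    [(2, 0.8125), (0, 0.875)],
    [(1, 0.8125)],
]
data, indices, indptr = hits_to_csr_arrays(rows)
```

If SciPy is installed, these arrays can be passed as scipy.sparse.csr_matrix((data, indices, indptr), shape=(3, 3)). Under the 1 - similarity assumption, a 0.8 similarity threshold keeps only pairs within distance 0.2, and an eps of 0.1 would correspond to a similarity of at least 0.9.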

Here's the command-line --help:

usage: chemfp_to_scipy_csr.py [-h] [-t FLOAT] [--eps FLOAT]
                              [--min-samples INT] [--num-jobs INT]
                              FILENAME

test prototype adapter between chemfp and scipy.cluster using DBSCAN

positional arguments:
  FILENAME

optional arguments:
  -h, --help            show this help message and exit
  -t FLOAT, --threshold FLOAT
                        minimum similarity threshold (default: 0.8)
  --eps FLOAT           The maximum distance between two samples for them to
                        be considered as in the same neighborhood. (default:
                        0.1)
  --min-samples INT     The number of samples (or total weight) in a
                        neighborhood for a point to be considered as a core
                        point. This includes the point itself. (default: 5)
  --num-jobs INT, -j INT
                        The number of parallel jobs to run. If -1, then the
                        number of jobs is set to the number of CPU cores.
                        (default: 1)
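
The options listed above map directly onto Python's argparse module. The following is a sketch of a parser with the same flags and defaults, written from the help text alone; it is not taken from the script itself, and the help strings are shortened. The filename example.fps is hypothetical.

```python
import argparse


def build_parser():
    # ArgumentDefaultsHelpFormatter appends "(default: ...)" to each help string.
    parser = argparse.ArgumentParser(
        prog="chemfp_to_scipy_csr.py",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("filename", metavar="FILENAME")
    parser.add_argument("-t", "--threshold", type=float, default=0.8,
                        metavar="FLOAT", help="minimum similarity threshold")
    parser.add_argument("--eps", type=float, default=0.1, metavar="FLOAT",
                        help="maximum distance between two neighboring samples")
    parser.add_argument("--min-samples", type=int, default=5, metavar="INT",
                        help="samples in a neighborhood needed for a core point")
    parser.add_argument("--num-jobs", "-j", type=int, default=1, metavar="INT",
                        help="number of parallel jobs; -1 means all CPU cores")
    return parser


# Parse a sample command line; only the threshold is overridden.
args = build_parser().parse_args(["-t", "0.7", "example.fps"])
```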

This is off-topic for the RDKit list so please follow up with me via email.


                                Andrew
                                da...@dalkescientific.com


