Hi everyone,

I have released chemfp 4.2. The new "simarray" functionality computes the full comparison matrix as a NumPy array, e.g., for use in some clustering algorithms. It has built-in support for Tanimoto, Dice, cosine, and Hamming comparisons, plus an option to get the individual "a", "b", "c", and "d" components should you need a specialized metric. It processes roughly 100M comparisons per second on my laptop, which means that with 30 TB of free disk space you could generate the NxN comparisons for ChEMBL in about a day. (I'm curious if someone will do this!)
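To make the "a", "b", "c", "d" idea concrete, here is a minimal plain-NumPy sketch of how a specialized metric can be built from those components. It is not chemfp's implementation or API, and the function names (popcount_rows, abcd_components, tanimoto) are hypothetical. It assumes one common convention: a = on-bits in common, b = on-bits only in the row fingerprint, c = on-bits only in the column fingerprint, d = bits off in both. chemfp's own definitions may differ, so check its documentation before relying on this.

```python
import numpy as np

# Lookup table: number of set bits for every possible byte value.
_POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint16)


def popcount_rows(byte_array):
    """Count the on-bits along the last axis of a uint8 array."""
    return _POPCOUNT[byte_array].sum(axis=-1).astype(np.int64)


def abcd_components(fps, num_bits):
    """Return NxN integer arrays (a, b, c, d) for packed fingerprints.

    fps is an (N, num_bytes) uint8 array, one packed fingerprint per row.
    ASSUMED convention (may not match chemfp's):
      a = on-bits in common, b = only in row fp,
      c = only in column fp, d = off in both.
    """
    counts = popcount_rows(fps)  # shape (N,)
    # Broadcasting builds an (N, N, num_bytes) temporary, so this
    # sketch is only suitable for small N.
    common = popcount_rows(fps[:, None, :] & fps[None, :, :])  # shape (N, N)
    a = common
    b = counts[:, None] - common
    c = counts[None, :] - common
    d = num_bits - (a + b + c)
    return a, b, c, d


def tanimoto(a, b, c):
    """Tanimoto = a / (a + b + c); this sketch defines 0/0 as 0.0."""
    denom = a + b + c
    out = np.zeros(a.shape, dtype=np.float64)
    np.divide(a, denom, out=out, where=denom > 0)
    return out


if __name__ == "__main__":
    # Three made-up 16-bit fingerprints, two bytes each.
    fps = np.array(
        [[0b00001111, 0], [0b00111100, 0], [0, 0b11111111]], dtype=np.uint8
    )
    a, b, c, d = abcd_components(fps, num_bits=16)
    print(tanimoto(a, b, c))  # similarity matrix
    print(b + c)              # Hamming distance matrix
```

With the four arrays in hand, other metrics are simple array expressions under the same assumed convention, for example Dice as 2*a / (2*a + b + c).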
Chemfp supports the CDK, RDKit, Open Babel, and OpenEye toolkits. Some of the specific improvements for the chemfp/CDK interface are:

 - new "hydrogens" options for the SMILES and SDF readers ("as-is", "make-explicit", "make-implicit", and "make-nonchiral-implicit") to change between implicit and explicit hydrogens
 - added support for the CDK 2.9 PubChem fingerprint improvements
 - added support for jCompoundMapper fingerprints

The jCompoundMapper support and the "hydrogens" options were added after I read “Effectiveness of molecular fingerprints for exploring the chemical space of natural products” by Boldini, Ballabio, Consonni, Todeschini, Grisoni, and Sieber, J. Cheminform. (2024) 16:35, https://doi.org/10.1186/s13321-024-00830-3 , and realized there were a few rough edges chemfp could help smooth out.

For a full description of what's new in this release, see https://chemfp.com/docs/whats_new_in_42.html .

Chemfp may be the package you’ve been looking for if you work with binary cheminformatics fingerprints. Chemfp is perhaps best known for its high-performance fingerprint similarity search. Its Taylor/Butina clustering, MaxMin diversity selection, and sphere exclusion (including directed sphere exclusion) are equally world-class. Or, if you simply need a 100K by 100K distance array to pass into scikit-learn, chemfp’s simarray can generate that in less than a minute.

The chemfp homepage is https://chemfp.com/ .

To install a pre-compiled chemfp for Linux-based OSes:

  python -m pip install chemfp -i https://chemfp.com/packages/

The default installation limits or disables a few chemfp features, as described in the base license agreement at https://chemfp.com/BaseLicense.txt . To request a license key, which is free for academic use, see https://chemfp.com/license/ .

Best regards,

Andrew Dalke
da...@dalkescientific.com