Hi everyone,

I have released chemfp 4.2. The new "simarray" functionality computes the full 
comparison matrix as a NumPy array, e.g., for use in some clustering algorithms. 
It has built-in support for Tanimoto, Dice, cosine, and Hamming comparisons, 
plus an option to get the individual "a", "b", "c", and "d" components should 
you need a specialized metric. It processes roughly 100M comparisons per second 
on my laptop, which means that, if you had 30 TB of free disk space, you could 
generate the NxN comparisons for ChEMBL in about a day. (I'm curious whether someone 
will do this!)
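The four built-in comparisons can all be derived from a few per-pair bit counts. The following plain NumPy sketch shows that relationship. It is not chemfp's API and not how simarray is implemented. The meaning of a, b, c and d here is an assumption: a and b are the popcounts of the two fingerprints, c is the number of bits set in both, and d is the number of bits set in neither. Labelling conventions differ between packages, so check the chemfp documentation before porting a specialized metric. Fingerprints are unpacked boolean arrays, which use far more memory than chemfp's packed bytes. This sketch also returns 0.0 whenever a denominator is zero (for example, two empty fingerprints), which is a choice of the sketch, not necessarily chemfp's.

```python
import numpy as np


def pairwise_counts(fps_a: np.ndarray, fps_b: np.ndarray):
    """Return (a, b, c, d) count matrices for two boolean fingerprint arrays.

    Assumed convention for row i of fps_a and row j of fps_b:
      a[i, j] = bits set in fps_a[i]
      b[i, j] = bits set in fps_b[j]
      c[i, j] = bits set in both
      d[i, j] = bits set in neither
    """
    A = fps_a.astype(np.int64)
    B = fps_b.astype(np.int64)
    num_bits = A.shape[1]
    c = A @ B.T                                            # intersection popcounts
    a = np.broadcast_to(A.sum(axis=1)[:, None], c.shape)   # popcount of each query
    b = np.broadcast_to(B.sum(axis=1)[None, :], c.shape)   # popcount of each target
    d = num_bits - (a + b - c)                             # bits off in both
    return a, b, c, d


def similarity_matrices(fps_a: np.ndarray, fps_b: np.ndarray) -> dict:
    """Compute Tanimoto, Dice, cosine and Hamming matrices from the counts."""
    a, b, c, d = pairwise_counts(fps_a, fps_b)
    with np.errstate(divide="ignore", invalid="ignore"):
        union = a + b - c
        tanimoto = np.where(union > 0, c / union, 0.0)
        dice = np.where(a + b > 0, 2 * c / (a + b), 0.0)
        cosine = np.where(a * b > 0, c / np.sqrt(a * b), 0.0)
    hamming = a + b - 2 * c                                # number of differing bits
    return {"tanimoto": tanimoto, "dice": dice, "cosine": cosine, "hamming": hamming}


# Usage: three 4-bit fingerprints, the last one empty.
fps = np.array([[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 0]], dtype=bool)
sims = similarity_matrices(fps, fps)
print(sims["tanimoto"][0, 1])  # 1 shared bit out of 3 in the union -> 0.333...
```

A specialized metric would use the same four matrices. For example, a Sokal-Michener-style "simple matching" score is (c + d) / num_bits under this convention.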

Chemfp supports the CDK, RDKit, Open Babel, and OpenEye toolkits. Some of the 
specific improvements for the chemfp/CDK interface are:

- new "hydrogens" options for the SMILES and SDF readers ("as-is", 
"make-explicit", "make-implicit", and "make-nonchiral-implicit") to change 
between implicit and explicit hydrogens.

- added support for the CDK 2.9 PubChem fingerprint improvements.

- added support for jCompoundMapper fingerprints.

The jCompoundMapper support and the "hydrogens" options were added after I read 
“Effectiveness of molecular fingerprints for exploring the chemical space of 
natural products” by Boldini, Ballabio, Consonni, Todeschini, Grisoni, and 
Sieber, J. Cheminform. (2024) 16:35 https://doi.org/10.1186/s13321-024-00830-3 
and realized there were a few rough edges chemfp could help smooth out.

For a full description of what's new in this release, see 
https://chemfp.com/docs/whats_new_in_42.html .

If you work with binary cheminformatics fingerprints, chemfp may be the package 
you’ve been looking for. Chemfp is perhaps best known for its high-performance 
fingerprint similarity search. Its Taylor/Butina clustering, MaxMin diversity 
selection, and sphere exclusion (including directed sphere exclusion) are 
equally world-class. Or, if you simply need a 100K by 100K 
distance array to pass into scikit-learn, chemfp’s simarray can generate that 
in less than a minute.
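A full similarity array is enough input for simple clustering methods. The following NumPy sketch shows Taylor/Butina clustering run directly on a precomputed NxN similarity matrix. It is an illustration of the technique, not chemfp's implementation. It assumes a symmetric matrix, such as Tanimoto similarities, and a user-chosen threshold. Neighbor counts are computed once, and ties go to the lowest index; other implementations may break ties or handle singletons differently. Estimators in scikit-learn that accept metric="precomputed", such as DBSCAN, expect distances rather than similarities, commonly 1 - Tanimoto.

```python
import numpy as np


def butina_from_matrix(sim: np.ndarray, threshold: float) -> list[list[int]]:
    """Taylor/Butina clustering of a precomputed NxN similarity matrix.

    Returns a list of clusters; each cluster lists its centroid index first.
    """
    n = sim.shape[0]
    neighbors = sim >= threshold                    # boolean NxN neighbor table
    counts = neighbors.sum(axis=1)                  # neighbor counts, computed once
    order = np.argsort(-counts, kind="stable")      # most neighbors first; ties by index
    assigned = np.zeros(n, dtype=bool)
    clusters = []
    for centroid in order:
        if assigned[centroid]:
            continue                                # already absorbed by an earlier cluster
        members = np.flatnonzero(neighbors[centroid] & ~assigned)
        assigned[members] = True
        assigned[centroid] = True                   # covers an empty fingerprint's 0.0 self-score
        clusters.append([int(centroid)] + [int(m) for m in members if m != centroid])
    return clusters


# Usage with a tiny hand-made similarity matrix.
sim = np.array([
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.2, 0.1],
    [0.8, 0.2, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
])
print(butina_from_matrix(sim, 0.7))  # [[0, 1, 2], [3]]
```

Building the full matrix costs O(N^2) memory. That is fine at 100K by 100K with float32 values (about 40 GB), but a neighbor-list approach scales further.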

The chemfp homepage is https://chemfp.com/ . To install a pre-compiled chemfp 
for Linux-based OSes:

  python -m pip install chemfp -i https://chemfp.com/packages/

The default installation limits or disables a few chemfp features as described 
in the base license agreement at https://chemfp.com/BaseLicense.txt . To 
request a license key, which is free for academic use, see 
https://chemfp.com/license/ .

Best regards,

                                Andrew Dalke
                                da...@dalkescientific.com



_______________________________________________
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user
