Re: [Rdkit-discuss] Clustering

2022-05-01 Thread Tristan Camilleri
0:46 AM Rajarshi Guha > wrote:
>
>> You could consider using FAISS. An example of clustering 2.1M cmpds is described at
>> http://practicalcheminformatics.blogspot.com/2019/04/clustering-21-million-compounds-for-5.html
>>
>> On Sun, May 1, 2022

[Rdkit-discuss] Clustering

2022-05-01 Thread Tristan Camilleri
Hi, I am attempting to cluster a database of circa 4M small molecules and I have hit several snags. Using BulkTanimoto is not possible because of the resources it requires. I am now working with fpsim2 and chemfp to get a sparse distance matrix. However, I am finding it very challenging to
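To make the "sparse matrix" idea in the message above concrete, here is a minimal pure-Python sketch. It is not the fpsim2 or chemfp API; the function names, the set-of-on-bits fingerprint representation, and the 0.5 threshold are all assumptions chosen for illustration. It keeps only the pairs whose Tanimoto similarity reaches a threshold, which is what makes the result sparse.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def sparse_similarity(fps, threshold):
    """Return {(i, j): similarity} for every pair i < j at or above the threshold.

    Hypothetical helper: brute force over all pairs, so it only illustrates
    the shape of the output and would not scale to millions of molecules.
    """
    pairs = {}
    for (i, a), (j, b) in combinations(enumerate(fps), 2):
        sim = tanimoto(a, b)
        if sim >= threshold:
            pairs[(i, j)] = sim
    return pairs


# Toy fingerprints (made-up bit indices, not real molecules).
fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({10, 11, 12}),
    frozenset({10, 11, 13}),
]
sparse = sparse_similarity(fps, threshold=0.5)
print(sparse)
```

Only two of the six possible pairs survive the threshold here. A dedicated similarity-search tool would be needed to produce the same kind of output at the 4M scale.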

Re: [Rdkit-discuss] Clustering

2022-05-01 Thread Tristan Camilleri
/09/similarity-search-and-some-cool-pandas.html
>
> Pat
>
> On Sun, May 1, 2022 at 12:12 PM Tristan Camilleri <tristan.camilleri...@um.edu.mt> wrote:
>
>> Thank you both for the feedback.
>>
>> My primary aim is to run an LBVS exper

Re: [Rdkit-discuss] Clustering

2022-05-02 Thread Tristan Camilleri
> a dense (4M)^2 matrix, which I assume you cannot make or store.
>
> From this (which is essentially an *adjacency* matrix - telling you which molecules are or are not ‘linked’) you can do graph representations and even clustering, e.g. using igraph (again, I can only suggest an R
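The adjacency idea quoted above can be sketched in Python rather than R. This is a minimal illustration and not the igraph workflow the poster had in mind. It assumes the thresholded similarity search has already produced a list of linked index pairs, and it uses connected components (via a small union-find) as the simplest possible graph clustering. The function name and the toy edge list are hypothetical.

```python
def connected_components(n, edges):
    """Group nodes 0..n-1 into clusters, treating each edge as a 'linked' pair.

    Hypothetical helper: a union-find over the adjacency list, so only the
    linked pairs are stored, never a dense n x n matrix.
    """
    parent = list(range(n))

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    clusters = {}
    for node in range(n):
        clusters.setdefault(find(node), []).append(node)
    return sorted(clusters.values())


# Toy adjacency: molecules 0-1-2 are chained, 3-4 are linked, 5 is a singleton.
edges = [(0, 1), (1, 2), (3, 4)]
clusters = connected_components(6, edges)
print(clusters)
```

Connected components tend to chain loosely related molecules into one large cluster, so a community-detection method from a graph library may give tighter groups. The sketch only shows that clustering can proceed from the sparse list of links alone.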