Re: [Rdkit-discuss] chemical database management software

2022-05-01 Thread Zhenting Gao via Rdkit-discuss
DataWarrior is my favorite.

[Rdkit-discuss] chemical database management software

2022-05-01 Thread Marawan Hussien via Rdkit-discuss
Hello. I am wondering if someone is familiar with chemical database management software (open source preferred) to keep records of chemical structures, assay records as well as other metadata. Ideally with some SAR capabilities. I am looking around and only a few solutions exist, and the proprietary

Re: [Rdkit-discuss] Clustering

2022-05-01 Thread Patrick Walters
Similarity search on a database of 4 million is pretty quick with chemfp or FPSim2. Do you need to do the clustering? Here are a couple of relevant blog posts. http://practicalcheminformatics.blogspot.com/2020/10/what-do-molecules-that-look-like-this.html

Re: [Rdkit-discuss] Clustering

2022-05-01 Thread Tristan Camilleri
Thank you both for the feedback. My primary aim is to run an LBVS experiment (similarity search) using a set of actives and the dataset of cluster representatives. On Sun, 1 May 2022, 17:09 Patrick Walters, wrote: > For me, a lot of this depends on what you intend to do with the >
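A minimal sketch of the similarity-search step described above, assuming each fingerprint is stored as a Python integer used as a bit set. It ranks every library entry (for example, the cluster representatives) by its best Tanimoto similarity to any active. The function names and toy fingerprints are hypothetical; a real run on 4M compounds would use the search routines in chemfp or FPSim2 rather than this loop.

```python
def popcount(x: int) -> int:
    """Count the set bits in a fingerprint stored as an int."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity: shared on-bits divided by on-bits in either."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def rank_by_max_similarity(actives, library):
    """Return (library_index, score) pairs, best score first.

    The score of a library entry is its highest Tanimoto similarity
    to any of the actives (a simple "max fusion" rule).
    """
    scored = []
    for idx, fp in enumerate(library):
        best = max(tanimoto(fp, active) for active in actives)
        scored.append((best, idx))
    # Sort by descending score; break ties by library index.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(idx, score) for score, idx in scored]


# Toy 8-bit fingerprints (hypothetical data, not real molecules).
toy_actives = [0b11110000, 0b00111100]
toy_representatives = [0b11100000, 0b00001111, 0b00111000]

ranking = rank_by_max_similarity(toy_actives, toy_representatives)
for idx, score in ranking:
    print(idx, round(score, 3))
```

Taking the maximum over the actives is only one possible fusion rule; averaging the similarities is another common choice and would change the ranking.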

Re: [Rdkit-discuss] Clustering

2022-05-01 Thread Rajarshi Guha
You could consider using FAISS. An example of clustering 2.1M cmpds is described at http://practicalcheminformatics.blogspot.com/2019/04/clustering-21-million-compounds-for-5.html On Sun, May 1, 2022 at 9:23 AM Tristan Camilleri < tristan.camilleri...@um.edu.mt> wrote: > Hi, > > I am attempting

Re: [Rdkit-discuss] chemical database management software

2022-05-01 Thread John Clos
I use Advanced Chemistry Development software, which is commercial, but I've always been very pleased with their support. It has the added benefit that you can view raw analytical data (NMR, LC-MS, etc.) from most any vendor and combine structures with the raw data. John

[Rdkit-discuss] Clustering

2022-05-01 Thread Tristan Camilleri
Hi, I am attempting to cluster a database of circa 4M small molecules and I have hit several snags. Using BulkTanimoto is not possible due to the resources that are required. I am now working with FPSim2 and chemfp to get a distance matrix (sparse matrix). However, I am finding it very challenging to
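To illustrate why a sparse matrix helps here, the following is a minimal sketch, assuming fingerprints are stored as Python integers used as bit sets. It keeps only the pairs whose Tanimoto similarity reaches a threshold, so memory grows with the number of similar pairs rather than with N squared. The function names and toy data are hypothetical. The double loop is still quadratic in time, so this shows the storage idea only and is not a substitute for the optimized searches in chemfp or FPSim2.

```python
from itertools import combinations


def popcount(x: int) -> int:
    """Count the set bits in a fingerprint stored as an int."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity: shared on-bits divided by on-bits in either."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def sparse_similarities(fps, threshold):
    """Return {(i, j): similarity} for i < j with similarity >= threshold.

    Pairs below the threshold are never stored, which is what keeps
    the matrix sparse.
    """
    sparse = {}
    for i, j in combinations(range(len(fps)), 2):
        sim = tanimoto(fps[i], fps[j])
        if sim >= threshold:
            sparse[(i, j)] = sim
    return sparse


# Toy 8-bit fingerprints (hypothetical data, not real molecules).
toy_fps = [0b11110000, 0b11100000, 0b00001111, 0b00000111]

pairs = sparse_similarities(toy_fps, threshold=0.7)
print(pairs)  # only 2 of the 6 possible pairs are kept
```

The matching sparse distance is simply `1 - similarity` for each stored pair; any pair that is absent is treated as "farther than the cutoff".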

Re: [Rdkit-discuss] Clustering

2022-05-01 Thread Patrick Walters
For me, a lot of this depends on what you intend to do with the clustering. If you want to pick a "representative" subset from a larger dataset, k-means may do the trick. As Rajarshi mentioned, Practical Cheminformatics has a k-means implementation that runs with FAISS. Depending on your goal,
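The "representative subset" idea can be sketched as follows: run k-means on the fingerprint bit vectors, then keep the member closest to each centroid. This is a plain-Python toy with a naive, deterministic initialization (the first k points), not the FAISS-based implementation from Practical Cheminformatics; the function names and toy vectors are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, n_iter=10):
    """Naive k-means; returns (centroids, labels).

    Assumption: the first k points seed the centroids. Real
    implementations use random or k-means++ initialization.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def representatives(points, centroids, labels):
    """Map each cluster id to the index of the member nearest its centroid."""
    reps = {}
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            reps[c] = min(members, key=lambda i: squared_distance(points[i], centroid))
    return reps


# Toy 4-bit fingerprints as 0/1 vectors (hypothetical data).
toy_points = [
    [1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 0],
    [0, 1, 1, 1], [1, 0, 0, 0], [0, 0, 1, 0],
]

centroids, labels = kmeans(toy_points, k=2)
reps = representatives(toy_points, centroids, labels)
print(labels, reps)
```

Note that k-means works on Euclidean distance over the bit vectors, not on Tanimoto distance, so the clusters it produces can differ from those of a Tanimoto-based method.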

Re: [Rdkit-discuss] Clustering

2022-05-01 Thread Tristan Camilleri
Thanks for the feedback. Rather than an explicit need to perform clustering, it is more for me to learn how to do it. Any pointers to this effect would be greatly appreciated. Tristan On Sun, 1 May 2022 at 18:18, Patrick Walters wrote: > Similarity search on a database of 4 million is pretty