Hello everyone,

Thank you for all your helpful suggestions.

I've taken careful note of them, and they have been extremely helpful in
guiding my work.
3D-QSAR is also new to me, and your insights and expertise have been
incredibly valuable.

Thank you once again for your generous assistance.

Best Regards,

Ariadna Llop

On Tue, 30 Apr 2024 at 22:45, Andrew Dalke <da...@dalkescientific.com>
wrote:

> Hi Ariadna,
>
>   In general the MACCS keys are not that good for comparing similarity.
> They still exist for historical reasons. Back in the 1970s the company
> Molecular Design Limited developed a program called "Molecular Access
> System" (MACCS) for structure registration, substructure search, and the
> like.
>
> Substructure search is slow, so MACCS includes a set of keys which would
> act as fast filters - if the query contained a key but the database entry
> did not, then the query could not match that entry.
>
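As a rough illustration of the key-based screening described above (not the MACCS implementation itself): each structure gets a bitmask of the keys it contains, and a database entry can only match if it has every key set in the query. The key names and bit positions below are made up for the example.

```python
# Hypothetical key set for illustration; real MACCS keys are defined differently.
KEY_BITS = {"has_N": 0, "has_O": 1, "has_ring": 2, "has_halogen": 3}


def make_mask(keys):
    """Pack a collection of key names into an integer bitmask."""
    mask = 0
    for key in keys:
        mask |= 1 << KEY_BITS[key]
    return mask


def may_match(query_mask, entry_mask):
    """Return False when the entry lacks a key that the query contains.

    True only means the entry survives the screen; a full substructure
    search is still needed to confirm an actual match.
    """
    return (query_mask & ~entry_mask) == 0


database = {
    "entry_a": make_mask(["has_N", "has_ring"]),
    "entry_b": make_mask(["has_O"]),
    "entry_c": make_mask(["has_N", "has_O", "has_ring"]),
}
query = make_mask(["has_N", "has_ring"])

# Only entries that pass the screen go on to the slow substructure search.
candidates = sorted(name for name, mask in database.items() if may_match(query, mask))
```

Here "entry_b" is rejected without any graph matching, because it lacks keys that the query has.
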
> In the 1980s when fingerprint similarity search first became popular -
> this is before the term "fingerprint" was even coined - people used the
> MACCS keys because they were already computed and sitting there, on the
> computer system they were already using.
>
> Over time people developed other types of fingerprints, and different ways
> to compare them, and a more complete understanding of how they are coupled
> to the types of system being studied.
>
> For example, in "Comparing structural fingerprints using a
> literature-based similarity benchmark" by Sayle and O'Boyle,
> "Extended-connectivity fingerprints of diameter 4 and 6 are among the best
> performing fingerprints when ranking diverse structures by similarity, as
> is the topological torsion fingerprint. However, when ranking very close
> analogues, the atom pair fingerprint outperforms the others tested."
>
> They found the MACCS fingerprints to be one of the worst performers, which
> you might expect now that you know the happenstance which made them popular.
>
> Since you are doing 3D QSAR, you should familiarize yourself with the
> fingerprints used in that area. I have no experience with 3D QSAR and
> cannot provide advice on what is appropriate.
>
> The first paper I found using Google Scholar to search for "3d qsar
> fingerprints" is "Docking, Interaction Fingerprint, and Three-Dimensional
> Quantitative Structure–Activity Relationship (3D-QSAR) of Sigma1 Receptor
> Ligands, Analogs of the Neuroprotective Agent RC-33" at
> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6637851/ which uses
> Interaction fingerprints.
>
> The second is "Novel TOPP descriptors in 3D-QSAR analysis of apoptosis
> inducing 4-aryl-4H-chromenes: Comparison versus other 2D- and
> 3D-descriptors" at
> https://www.sciencedirect.com/science/article/pii/S0968089607005834 which
> I mention because it summarizes 7 different descriptor-based approaches,
> and places the MACCS keys in last place, far below the second worst
> ("TOPP > GRIND > BCI 4096 = ECFP > FCFP > GRID-GOLPE ≫ DRAGON ⋙ MDL 166").
>
> No doubt there are many others for you to read through and try out.
>
>
> > # Generate fingerprint descriptor database
> > fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2) for m in mols]
>
> What I can suggest is you try my chemfp package, specifically the 4.2b1 I
> just released (bear in mind that it is beta!)
>
> You can install it with:
>
>    python -m pip install chemfp==4.2b1 -i https://chemfp.com/packages/
>
> To generate Morgan fingerprints of radius 2, I suggest you compute them
> once and store them in a file, like this command-line example:
>
>   rdkit2fps --morgan2 dataset.smi -o dataset.fps
>
> (use "--maccs" to generate MACCS keys, "--pair" for atom pairs; and use
> "--help" to see what other options are available.)
>
> To "Calculate pairwise Tanimoto similarity between fingerprints" as a
> distance, you can use another command-line tool to generate the matrix as a
> NumPy "npy" file, like this:
>
>   chemfp simarray dataset.fps --as-distance -o dataset.npy
>
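For reference, the quantity being tabulated can be sketched in plain Python. This is a toy version that treats each fingerprint as a set of on-bit indices. It is not how chemfp computes it, and treating two empty fingerprints as distance 0 is a choice made only for this sketch.

```python
def tanimoto_distance(a, b):
    """Return 1 - |A & B| / |A | B| for two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption for this sketch: two empty fingerprints are identical
    return 1.0 - len(a & b) / union


def distance_matrix(fps):
    """Build the full symmetric matrix of pairwise Tanimoto distances."""
    n = len(fps)
    dists = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = tanimoto_distance(fps[i], fps[j])
            dists[i][j] = dists[j][i] = d
    return dists


# Three made-up fingerprints: the first two share two of their four distinct bits.
toy_fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
toy_dists = distance_matrix(toy_fps)
```

The double loop is quadratic in the number of fingerprints, which is one reason to compute the matrix once and store it.
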
> To load this in Python:
>
>   import numpy as np
>   dists = np.load("dataset.npy")
>
> If you also need the identifiers:
>
>   with open("dataset.npy", "rb") as f:
>     dists = np.load(f)
>     metadata = np.load(f)
>     ids = np.load(f)
>
> This should make it easier to iterate over the different clustering
> methods available, since you only generate the fingerprints and distance
> matrix once.
>
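Once the matrix exists, trying a clustering method is only a few lines. The sketch below is a greedy sphere-exclusion scheme in the spirit of Taylor-Butina clustering. That method is not named in this thread and is only one option among many. The function name, the cutoff, and the toy matrix are all illustrative.

```python
def sphere_exclusion_clusters(dists, cutoff):
    """Greedy clustering on a square distance matrix.

    Returns a list of clusters; each is a list of row indices whose
    first element is the cluster centre.
    """
    n = len(dists)
    # Neighbours of each item within the cutoff, excluding the item itself.
    neighbours = [
        {j for j in range(n) if j != i and dists[i][j] <= cutoff}
        for i in range(n)
    ]
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # Pick the item with the most still-unassigned neighbours;
        # ties go to the lowest index so the result is deterministic.
        centre = max(sorted(unassigned),
                     key=lambda i: len(neighbours[i] & unassigned))
        members = [centre] + sorted(neighbours[centre] & unassigned)
        clusters.append(members)
        unassigned -= set(members)
    return clusters


# Made-up distances: items 0, 1 and 2 are close together, item 3 is far away.
toy = [
    [0.00, 0.20, 0.30, 0.90],
    [0.20, 0.00, 0.25, 0.80],
    [0.30, 0.25, 0.00, 0.85],
    [0.90, 0.80, 0.85, 0.00],
]
clusters = sphere_exclusion_clusters(toy, cutoff=0.35)
```

The function only uses len(dists) and dists[i][j], so a 2D NumPy array loaded with np.load should work in place of the nested list.
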
> If you decide to use interaction fingerprints, or some other fingerprint
> type that is not in the RDKit, you can still generate the fingerprints in
> FPS format (a simple text format) and use chemfp to generate your matrix
> for you, either on the command-line or through its Python API.
>
> > However, I'm not satisfied with the results and would like to experiment
> > with MACCS Keys to see if they yield better clustering outcomes. Does
> > anyone know how to cluster compounds using MACCS fingerprints? Any insights
> > on the best approach to calculate similarities and cluster using these
> > fingerprints would be highly appreciated.
>
> In case I was not clear enough before, MACCS keys make poor fingerprints.
> There is no reason to expect they will yield better clustering outcomes,
> and multiple papers suggest they will yield worse ones.
>
> Best regards,
>
>                                 Andrew
>                                 da...@dalkescientific.com
>
>
>
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
