> On Mar 13, 2021, at 20:29, Marawan Hussien via Rdkit-discuss 
> <rdkit-discuss@lists.sourceforge.net> wrote:
> My question is whether this is a valid approach to comparison, particularly 
> if the class sizes vary widely and the average similarity will inevitably be 
> affected by the size of each item in each pair. As a check, the diagonal 
> appears to have the highest inter-class similarity overall, which is 
> expected anyway.
> 
> I am also wondering if a size-weighted normalization approach could handle 
> this situation?

What about a Z-score? That is:

    zscore = (score - background_score) / background_standard_deviation

rather than using the mean score.
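
To make that concrete, here is a minimal sketch in plain Python. It is only an illustration under some assumptions of mine: fingerprints are modelled as Python sets of "on" bits, the class score is the mean pairwise Tanimoto, and the background is built by drawing random pairs of sets of the same two sizes from a pooled collection. The names tanimoto, mean_similarity, background_scores and zscore are made up for this example and are not RDKit or chemfp functions.

```python
import random
import statistics


def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mean_similarity(class_a, class_b):
    """Mean pairwise similarity between every item of class_a and class_b."""
    return statistics.fmean(tanimoto(x, y) for x in class_a for y in class_b)


def background_scores(pool, size_a, size_b, n_trials=200, seed=0):
    """Mean similarities for random, disjoint set pairs of the given sizes.

    Assumption: random draws from a pooled collection are a fair background.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        sample = rng.sample(pool, size_a + size_b)
        scores.append(mean_similarity(sample[:size_a], sample[size_a:]))
    return scores


def zscore(score, background):
    """(score - background mean) / background standard deviation."""
    mean = statistics.fmean(background)
    sd = statistics.stdev(background)
    if sd == 0:
        raise ValueError("background has zero variance")
    return (score - mean) / sd
```

Because the background is regenerated for each pair of class sizes, a large class and a small class are each compared against random sets of their own size, which is one way to take the size effect into account. Whether random sampling from the pool is the right background for your data is a separate question.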

I worked out something like that a few years ago, using chemfp:

  http://www.dalkescientific.com/writings/diary/archive/2017/03/27/chembl_target_sets_association_network.html

If that's a reasonable approach, then it could all be done in RDKit, if you 
don't want to use chemfp.

Best regards,


                                Andrew
                                da...@dalkescientific.com




_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss