Re: [Rdkit-discuss] Some basic questions about binary fingerprints

2021-01-09 Thread Greg Landrum
Hi Jan, I did a long(ish) post on collisions and fingerprints a while ago that I think might be helpful here: https://sourceforge.net/p/rdkit/mailman/message/36438523/. I won't repost the whole thing here, but maybe you can take a look to see if it helps explain what you're observing (and why it

Re: [Rdkit-discuss] Some basic questions about binary fingerprints

2021-01-09 Thread Nils Weskamp
Dear Jan, you are probably right. If you have about 2/3 of your 10k bits set to one, doesn't that imply that the probability of a collision for any new fragment is roughly 2/3 (which fits the 5 of 7 you observe in your example)? Concerning your second question: just as with any other descriptor,
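The back-of-the-envelope argument above can be checked with a few lines of arithmetic. The sketch below assumes that each new fragment hashes to a uniformly random bit position, which is only an approximation of how RDKit actually assigns bits. Under that assumption, the number of collisions among n new fragments is Binomial(n, p), where p is the fraction of bits already set. All function names are hypothetical and used only for illustration.

```python
import math


def expected_collisions(n_new_fragments: int, fraction_set: float) -> float:
    """Expected number of new fragments that land on an already-set bit,
    assuming uniform, independent bit assignment (an approximation)."""
    return n_new_fragments * fraction_set


def prob_at_least_k_collisions(n: int, k: int, p: float) -> float:
    """Binomial upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def expected_fraction_set(n_distinct_features: int, n_bits: int) -> float:
    """Expected fraction of bits set after hashing n distinct features
    uniformly into n_bits positions: 1 - (1 - 1/n_bits)**n."""
    return 1.0 - (1.0 - 1.0 / n_bits) ** n_distinct_features


if __name__ == "__main__":
    p = 2 / 3  # fraction of the 10k bits that are on in the training set
    print(expected_collisions(7, p))            # about 4.67 of 7 fragments
    print(prob_at_least_k_collisions(7, 5, p))  # about 0.57
    print(expected_fraction_set(10986, 10000))  # about 0.667
```

Under these assumptions, seeing 5 of 7 fragments collide is unremarkable: about 4.7 collisions are expected, and 5 or more happen with probability of roughly 0.57. The last line also suggests that about 11,000 distinct features, hashed uniformly, would be enough to switch on about 2/3 of a 10,000-bit fingerprint.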

[Rdkit-discuss] Some basic questions about binary fingerprints

2021-01-09 Thread Jan Halborg Jensen
I am trying to relate the reliability of ML models trained using binary fingerprints to the presence of on-bits, i.e. comparing the on-bits in a molecule in the test set to the on-bits in the training set. But I am getting some strange results. The code is here, so I will just summarise.
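One way to set up the comparison described above is to take the union of on-bits over the training set and check which on-bits of a test molecule fall outside it. The sketch below is not Jan's linked code. It assumes that each fingerprint has already been reduced to a set of on-bit indices (for an RDKit bit vector, `GetOnBits()` returns these indices), and the helper names are hypothetical.

```python
from typing import Iterable, List, Set


def training_on_bits(training_fps: Iterable[Iterable[int]]) -> Set[int]:
    """Union of on-bit indices over all training fingerprints."""
    seen: Set[int] = set()
    for fp in training_fps:
        seen.update(fp)
    return seen


def unseen_bits(test_fp: Iterable[int], seen: Set[int]) -> List[int]:
    """Sorted on-bits of a test molecule that never occur in the training set."""
    return sorted(set(test_fp) - seen)


def on_bit_coverage(test_fp: Iterable[int], seen: Set[int]) -> float:
    """Fraction of the test molecule's on-bits that also occur in training.
    A molecule with no on-bits is treated as fully covered."""
    bits = set(test_fp)
    if not bits:
        return 1.0
    return len(bits & seen) / len(bits)


if __name__ == "__main__":
    train = [{1, 5, 9}, {2, 5, 7}]
    seen = training_on_bits(train)
    test = {1, 2, 3, 9}
    print(unseen_bits(test, seen))      # [3]
    print(on_bit_coverage(test, seen))  # 0.75
```

A bit counted as "seen" here only means that some training fragment set the same bit position. When a large fraction of the bits is already on, many test fragments will appear covered purely through collisions with unrelated fragments, so a high coverage value can overstate how familiar a test molecule really is.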