Hi Jan,
I did a long(ish) post on collisions and fingerprints a while ago that I
think might be helpful here:
https://sourceforge.net/p/rdkit/mailman/message/36438523/
I won't repost the whole thing here, but maybe you can take a look to see
if it helps explain what you're observing (and why it
Dear Jan,
you are probably right. If about 2/3 of your 10k bits are set to
one, doesn't that imply that the probability of a collision for any new
fragment is roughly 2/3 (which is consistent with the 5 of 7 you observe in
your example)?
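
A rough way to check that estimate, as a minimal sketch: it assumes each new fragment is hashed uniformly at random over the bit vector, takes "10k" to mean 10,000 bits, and takes 6,667 of them to be set. The function names are made up for illustration and are not RDKit API.

```python
import random


def collision_probability(n_bits_set: int, n_bits: int) -> float:
    """Chance that one new fragment, hashed uniformly, lands on an already-set bit."""
    return n_bits_set / n_bits


def expected_collisions(n_new: int, n_bits_set: int, n_bits: int) -> float:
    """Expected number of colliding fragments among n_new new fragments."""
    return n_new * collision_probability(n_bits_set, n_bits)


def simulate(n_new: int, n_bits_set: int, n_bits: int,
             trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean number of collisions per trial.

    Without loss of generality the set bits are taken to be indices
    0 .. n_bits_set - 1; a new fragment collides if its random bit
    index falls in that range.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += sum(rng.randrange(n_bits) < n_bits_set for _ in range(n_new))
    return total / trials


if __name__ == "__main__":
    # Assumed numbers: 10,000 bits, about 2/3 of them set, 7 new fragments.
    print(collision_probability(6667, 10000))
    print(expected_collisions(7, 6667, 10000))
    print(simulate(7, 6667, 10000))
```

Under these assumptions the expected count for 7 new fragments is a little under 5, so observing 5 of 7 would not be surprising.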
Concerning your second question: Just as any other descriptor,
I am trying to relate the reliability of ML models trained using binary
fingerprints to the presence of on-bits, i.e. by comparing the on-bits of a
molecule in the test set to the on-bits in the training set. But I am getting
some strange results
The code is here so I will just summarise.
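
For readers without access to that code, the comparison described above can be sketched roughly as follows. This is not the code the post refers to: it is a hypothetical illustration in which each fingerprint is a plain Python set of on-bit indices, and `onbit_coverage` is an invented name. With RDKit bit vectors, such index sets could presumably be built from `GetOnBits()`.

```python
from typing import Iterable, Set


def onbit_coverage(test_bits: Iterable[int],
                   training_fps: Iterable[Set[int]]) -> float:
    """Fraction of a test molecule's on-bits that are also on in at
    least one training-set fingerprint.

    Returns 0.0 for a test fingerprint with no on-bits (a convention
    chosen for this sketch).
    """
    # Union of all bits that are on anywhere in the training set.
    seen = set().union(*training_fps)
    test = set(test_bits)
    if not test:
        return 0.0
    return len(test & seen) / len(test)


if __name__ == "__main__":
    training = [{1, 2, 3}, {3, 4}]   # toy training fingerprints
    print(onbit_coverage({2, 4, 9}, training))  # bits 2 and 4 are covered, 9 is not
```

If the training set already switches on a large share of all bits, a high coverage value may partly reflect hash collisions rather than shared chemistry, which could be one source of odd-looking results.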