Matija Grabnar writes:
> That is going to lead to trouble. Some years ago I had occasion to
> calculate checksums of a very large number of files (looking to
> remove duplicates). I discovered, to my dismay, that
> a) I was getting collisions (same checksum) on files which were
> obviously different (because they were different sizes).
With checksums, collisions are to be expected. The purpose of a
checksum is to ensure that a file hasn't been damaged after being
transmitted through a network or copied onto a medium, for example.
It's not meant to identify duplicates among a large number of files.
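To illustrate the point, here is a minimal sketch of what a checksum is actually for, using Python's zlib.crc32 (the payloads are invented for the example; any checksum algorithm would serve the same role):

```python
import zlib

def crc32_of(data: bytes) -> int:
    # CRC32 detects accidental corruption in transit or on disk.
    # Its 32-bit output space is far too small for deduplicating
    # many files: collisions between unrelated files are expected.
    return zlib.crc32(data) & 0xFFFFFFFF

original = b"some payload transmitted over the network"
received = b"some payload transmitted over the network"
corrupted = b"some payload transmitted over the netwprk"

assert crc32_of(received) == crc32_of(original)   # transfer intact
assert crc32_of(corrupted) != crc32_of(original)  # damage detected
```

A matching checksum tells you the copy probably wasn't damaged; it was never designed to prove two arbitrary files are identical.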
But SHA1 is not a checksum, it's a cryptographic hash with a 160-bit
output. I'm not an expert in the field, but from reading what is
available on the web, I gather that the probability of two specific
files "accidentally" sharing the same SHA1 hash is about 1/2^160, and
that you would need on the order of 2^80 files before an accidental
collision among them becomes likely (the birthday bound).
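A sketch of duplicate detection along these lines, using hashlib.sha1: group files by size first (files of different sizes can never be duplicates, which sidesteps the collision problem Matija hit), then hash only the candidates that share a size. The file names and contents here are illustrative, not from the original post:

```python
import hashlib
from collections import defaultdict

def sha1_of(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def find_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    # First pass: bucket by size, a cheap and exact discriminator.
    by_size = defaultdict(list)
    for name, data in files.items():
        by_size[len(data)].append(name)
    # Second pass: SHA1 only the files that share a size.
    dups = []
    for names in by_size.values():
        if len(names) < 2:
            continue
        by_hash = defaultdict(list)
        for name in names:
            by_hash[sha1_of(files[name])].append(name)
        dups.extend(group for group in by_hash.values() if len(group) > 1)
    return dups

files = {
    "a.txt": b"hello world",
    "b.txt": b"hello world",   # duplicate of a.txt
    "c.txt": b"hello there",   # same size, different content
    "d.txt": b"short",
}
assert find_duplicates(files) == [["a.txt", "b.txt"]]
```

The size check is free and exact, so SHA1 only has to distinguish among same-size candidates, where an accidental collision is astronomically unlikely.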
--
Daniel
PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org
_______________________________________________
DBmail mailing list
[email protected]
https://mailman.fastxs.nl/mailman/listinfo/dbmail