Matija Grabnar writes:

That is going to lead to trouble. Some years ago I had occasion to calculate checksums
of a very large number of files (looking to remove duplicates).
I discovered, to my dismay, that
a) I was getting collisions (same checksum) on files which were obviously different (because they were of different sizes).

With checksums, collisions are to be expected. The purpose of a checksum is to ensure that a file hasn't been damaged after being transmitted through a network or copied onto a medium, for example. It's not meant to identify duplicates between a large number of files.
But SHA1 is not a checksum, it's a cryptographic hash.
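
To make the duplicate-detection case concrete, here is a minimal sketch in Python, assuming the files are readable from local disk. The function names (sha1_of_file, find_duplicates) are made up for illustration and are not taken from DBmail or any other project. It groups files by size and SHA1 digest together, so files of different sizes can never be reported as duplicates.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths that share both size and SHA1 digest (hypothetical helper)."""
    groups = defaultdict(list)
    for path in paths:
        path = Path(path)
        # Using (size, digest) as the key rules out matches between
        # files of different sizes.
        key = (path.stat().st_size, sha1_of_file(path))
        groups[key].append(str(path))
    # Keep only the groups that actually contain more than one file.
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Calling find_duplicates on a list of paths returns a list of groups, each group being the paths that share the same size and digest.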
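I'm not an expert in the field, but from reading what is available on the web, I gather that the probability of two given different files "accidentally" sharing the same SHA1 hash is believed to be about 1/2^160, and that you would need on the order of 2^80 files before an accidental collision among them becomes likely.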
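
The 2^80 figure can be put in context with the usual birthday approximation. What follows is a rough sketch, assuming the hash output behaves like a uniformly random 160-bit value; the function name collision_probability is made up for illustration. It estimates the chance that at least two out of n files share a digest as 1 - exp(-n(n-1) / (2 * 2^bits)).

```python
import math


def collision_probability(n_files, hash_bits=160):
    """Approximate chance that any two of n_files share a hash value.

    Assumes hash outputs are uniformly random over 2**hash_bits values
    (an idealisation, not a statement about SHA1's actual security).
    """
    exponent = -n_files * (n_files - 1) / (2.0 * 2.0 ** hash_bits)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


# Two files, 160-bit hash: about 1 in 2^160.
print(collision_probability(2))
# A billion files, 160-bit hash: still negligible.
print(collision_probability(10 ** 9))
# 2^80 files, 160-bit hash: the point where a collision becomes likely.
print(collision_probability(2 ** 80))
# 100,000 files with a 32-bit checksum: a collision is more likely than not.
print(collision_probability(100_000, hash_bits=32))
```

Under this model a 32-bit checksum runs into collisions with only tens of thousands of files, while a 160-bit hash stays far below any practical risk for accidental collisions.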

--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
_______________________________________________
DBmail mailing list
https://mailman.fastxs.nl/mailman/listinfo/dbmail