On Thu, 23 Sep 2010, Arms, Mike wrote:
>
> I want to second this recommendation. I wrote a script that
> recursively descends and writes out the MD5, SHA1, file length, and
> file path. Using those first three parameters *in combination* is darn
> close to 100% for determining file uniqueness. I have never come
> across two files that differ but still have the same
>
> $MD5 . $SHA1 . $LENGTH
>
> (had to throw in some Perl :-)
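(Roughly this, I assume -- File::Find plus the core Digest::MD5 and Digest::SHA modules. This is only a sketch of the kind of script you describe, not your actual code:)

#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;
use Digest::SHA;

my $root = shift @ARGV or die "Usage: $0 <directory>\n";

find(sub {
    return unless -f $_;                 # plain files only
    my $path = $File::Find::name;
    my $size = -s $_;

    open my $fh, '<:raw', $_
        or do { warn "cannot open $path: $!\n"; return };
    my $md5  = Digest::MD5->new->addfile($fh)->hexdigest;
    seek $fh, 0, 0;                      # rewind to hash the same data again
    my $sha1 = Digest::SHA->new(1)->addfile($fh)->hexdigest;
    close $fh;

    # one record per file: MD5, SHA1, length, path
    print join("\t", $md5, $sha1, $size, $path), "\n";
}, $root);

Redirect the output to a file and you have a manifest you can sort or load into a database.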
I do wonder why you needed to combine all three.

A collision on MD5 by itself is extremely unlikely unless someone deliberately constructed a file with the same MD5 as another one (that is a known MD5 weakness, and you should switch to one of the SHA algorithms if you have to worry about it). For files that were not crafted to collide, the odds are vanishingly small: MD5 is a 128-bit digest, so by the birthday bound you would need on the order of 2^64 (about 1.8 x 10^19) files before an accidental collision becomes likely, and at several billion files per second that is roughly a century of continuous hashing.

Concatenating multiple digests mostly just makes your database searches slower, because the index fields get longer, without buying you much actual benefit.

So I would be really surprised if you actually had two different files with the same MD5 on your disk. If you did, how many files did you have in total?

Cheers,
-Jan
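P.S. In case it is useful, this is the kind of minimal duplicate finder I have in mind, keyed on a single MD5 per file. It only needs the core File::Find and Digest::MD5 modules; treat it as a sketch rather than production code.

#!/usr/bin/perl
# Duplicate detection keyed on MD5 alone -- for files that were not
# deliberately crafted to collide, one 128-bit digest is plenty.
use strict;
use warnings;
use File::Find;
use Digest::MD5;

my $root = shift @ARGV or die "Usage: $0 <directory>\n";
my %seen;    # md5 hex digest => list of paths with that digest

find(sub {
    return unless -f $_;
    open my $fh, '<:raw', $_
        or do { warn "skipping $File::Find::name: $!\n"; return };
    my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    push @{ $seen{$md5} }, $File::Find::name;
}, $root);

# Report only digests shared by more than one path.
for my $md5 (sort keys %seen) {
    my @paths = @{ $seen{$md5} };
    next if @paths < 2;
    print "$md5\n", map { "  $_\n" } @paths;
}

Record the file length alongside the digest if you want it in the report; the point is only that a single digest is all the index needs.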