On Thu, 23 Sep 2010, Arms, Mike wrote:
>
> I want to second this recommendation. I wrote a script that
> recursively descends and writes out the MD5, SHA1, file length, and
> file path. Using those first three parameters *in combination* is darn
> close to 100% for determining file uniqueness. I have never come
> across two files that differ but still have the same
>
> $MD5 . $SHA1 . $LENGTH
>
> (had to throw in some Perl :-)
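(Roughly this, I assume -- File::Find plus the core Digest::MD5 and Digest::SHA modules. This is only a sketch of the kind of script you describe, not your actual code:)

#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;
use Digest::SHA;

my $root = shift @ARGV or die "Usage: $0 <directory>\n";

find(sub {
    return unless -f $_;                 # plain files only
    my $path = $File::Find::name;
    my $size = -s $_;

    open my $fh, '<:raw', $_
        or do { warn "cannot open $path: $!\n"; return };
    my $md5  = Digest::MD5->new->addfile($fh)->hexdigest;
    seek $fh, 0, 0;                      # rewind to hash the same data again
    my $sha1 = Digest::SHA->new(1)->addfile($fh)->hexdigest;
    close $fh;

    # one record per file: MD5, SHA1, length, path
    print join("\t", $md5, $sha1, $size, $path), "\n";
}, $root);

Redirect the output to a file and you have a manifest you can sort or load into a database.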
I do wonder why you needed to combine all three.

A collision on MD5 by itself is extremely unlikely unless someone deliberately constructed a file with the same MD5 as another one (that is a known MD5 weakness, and you should switch to one of the SHA algorithms if you have to worry about it). For files that were not crafted to collide, the odds are vanishingly small: MD5 is a 128-bit digest, so by the birthday bound you would need on the order of 2^64 (about 1.8 x 10^19) files before an accidental collision becomes likely, and at several billion files per second that is roughly a century of continuous hashing.

Concatenating multiple digests mostly just makes your database searches slower, because the index fields get longer, without buying you much actual benefit.

So I would be really surprised if you actually had two different files with the same MD5 on your disk. If you did, how many files did you have in total?

Cheers,
-Jan
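P.S. In case it is useful, this is the kind of minimal duplicate finder I have in mind, keyed on a single MD5 per file. It only needs the core File::Find and Digest::MD5 modules; treat it as a sketch rather than production code.

#!/usr/bin/perl
# Duplicate detection keyed on MD5 alone -- for files that were not
# deliberately crafted to collide, one 128-bit digest is plenty.
use strict;
use warnings;
use File::Find;
use Digest::MD5;

my $root = shift @ARGV or die "Usage: $0 <directory>\n";
my %seen;    # md5 hex digest => list of paths with that digest

find(sub {
    return unless -f $_;
    open my $fh, '<:raw', $_
        or do { warn "skipping $File::Find::name: $!\n"; return };
    my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    push @{ $seen{$md5} }, $File::Find::name;
}, $root);

# Report only digests shared by more than one path.
for my $md5 (sort keys %seen) {
    my @paths = @{ $seen{$md5} };
    next if @paths < 2;
    print "$md5\n", map { "  $_\n" } @paths;
}

Record the file length alongside the digest if you want it in the report; the point is only that a single digest is all the index needs.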