Date: Tue, 18 Apr 2000 09:23:10 -0400
    From: "Noel L Yap" <[EMAIL PROTECTED]>

    [EMAIL PROTECTED] on 04/17/2000 09:02:16 PM
    >Compression methods are reversible:  You have to be able
    >to get out what you put in.  Digests such as MD5 have no such
    >requirement, so please don't compare the two.

    I think Larry's point is that, since there is no one-to-one mapping between
    information and digests, it is (theoretically) possible to get two different
    files to spit out the same MD5.

But the fundamental -practical- point here is that 2^128 is a -very-
large number, and the chance of an -accidental- collision (even
allowing for birthday-paradox events, which will be more frequent
than 1 in 2^128) is still probably lower than the chance that your
hardware makes a bit-error all by itself.  (After all, the SECDED and/or
parity used in RAM, and even the Reed-Solomon codes used to validate
data being read from a disk, have -many- fewer bits and are -much- more
likely to let an error creep by undetected; just check the "undetected
read errors" figure of merit for your disk drives, and factor in any
other error-sources in the interface, the CPU, the memory, ...)
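
To put a rough number on the birthday-paradox point, here is a minimal
sketch in Python.  It uses the standard union-bound estimate
p <= n(n-1) / 2^(b+1) for n files and a b-bit digest, and it assumes the
digest behaves like a uniformly random function.  The function name and
the figure of a billion files are made up for illustration only.

```python
from fractions import Fraction


def collision_probability(n_files, bits):
    """Upper bound on the chance that any two of n_files share a digest.

    Union bound over all n(n-1)/2 pairs, each colliding with
    probability 1/2^bits (assumes a uniformly random digest).
    """
    return Fraction(n_files * (n_files - 1), 2 ** (bits + 1))


# Hypothetical repository of a billion distinct files.
n = 10**9
print("128-bit digest:", float(collision_probability(n, 128)))  # roughly 1.5e-21
print("160-bit digest:", float(collision_probability(n, 160)))
```

Even with a billion files the bound stays around 10^-21 for a 128-bit
digest, which is the sense in which 2^128 is "very large" here.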

Remember also that you could use SHA-1 and get 160 bits of hash.
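
As a quick check of the digest widths, a small sketch using Python's
standard hashlib module (the choice of Python is mine, not anything from
this thread):

```python
import hashlib

# digest_size is reported in bytes; multiply by 8 to get bits.
md5_bits = hashlib.md5().digest_size * 8
sha1_bits = hashlib.sha1().digest_size * 8

print("MD5:  ", md5_bits, "bits")
print("SHA-1:", sha1_bits, "bits")
```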

Of course, it's entirely likely that there are wider-than-32-bit CRCs
which are much faster to compute than either MD5 or SHA-1.  Such CRCs
presumably only require -reading- the entire file, but not storing
-all- of it in memory at any given time.  For an excellent tutorial on
CRCs, see http://www.repairfaq.org/filipg/LINK/F_crc_v31.html or do a
Google search for "crc_v3" and grab a cached copy, since both
ftp://ftp.rocksoft.com/clients/rocksoft/papers/crc_v3.txt and the URL
listed in the paper, ftp.adelaide.edu.au/pub/rocksoft/crc_v3.txt, seem
to have vanished.  (I'd recommend one of the flat-text cached copies,
actually, and not the repairfaq HTML, which splits up the document
into several pieces.)  The downside of such wide CRCs is that they
may not have already been efficiently coded for you by someone else,
at which point you're probably better off going with something that's
been extensively vetted and tuned; both MD5 and SHA-1 are
excellent candidates.
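
For what it's worth, here is a minimal sketch of the "read it all, but
never hold it all in memory" idea, again in Python.  Note the
assumptions: it uses the ordinary 32-bit CRC from the zlib module (not
one of the wider CRCs speculated about above) next to MD5 and SHA-1
from hashlib, and the function name and chunk size are made up for
illustration.

```python
import hashlib
import zlib


def checksum_stream(stream, chunk_size=64 * 1024):
    """Compute CRC-32, MD5 and SHA-1 of a binary stream in one pass.

    Only chunk_size bytes are held in memory at any given time.
    """
    crc = 0
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        crc = zlib.crc32(chunk, crc)  # feed the running CRC back in
        md5.update(chunk)
        sha1.update(chunk)
    return crc & 0xFFFFFFFF, md5.hexdigest(), sha1.hexdigest()
```

You would call it with a file opened in binary mode, e.g.
checksum_stream(open(path, "rb")), where path is whatever file you
want to check.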
