Brian Huddleston wrote:
|| > MD5 *must* duplicate. It may never duplicate in practice; it may never
|| > duplicate over the life of a single project. But if you are designing
|| > aircraft software, you must be able to say 'we need to check every byte
|| > for changes'.
||
|| Well, it's a little worse than that. From the PGP O'Reilly book:
|| "So why does MD5 seem so secure? Because 128 bits allows you to have
|| 2^128 = 340,282,366,920,938,463,374,607,431,768,211,456 different possible
|| MD5 codes. That is a number that is billions of times larger than the
|| total number of documents that will ever be created by the human race
|| for the next thousands of years."
||
|| So while it is possible that MD5 could give you an erroneous result,
|| the odds are statistically so close to zero as to be almost impossible.
||
|| (You might check out http://www.rsasecurity.com/rsalabs/faq/3-6-6.html
|| (which is cool in that it has links to the actual papers in addition to
|| being a high-level overview). The PGP O'Reilly book also has a pretty
|| good high-level overview, but I would recommend Bruce Scheiner's Applied
|| Cryptography, if anyone in the audience is interested in how they work
|| and how you use them.)
Schneier (note the spelling) has reservations about MD5 - there is an
attack that makes it possible to create string pairs that have the
same hash code - but that is for cryptographic usage and it is by no
means clear that a colliding hash value can be found for an
arbitrary text.
You're still talking universe lifetimes as the collision frequency when
there is no deliberate attempt to cause a collision.
|| Of course, if you're just ultra-paranoid you could use SHA-1 as your
|| digest algorithm. It uses 160 bits and a better algorithm.
That would be safer, but it is probably of the "more infinitesimal"
flavour rather than critical.
|| Compare the above to a timestamp which can fail if:
||
|| 1) The edits are within the granularity of the time stamp.
|| 2) The sys-admin (or any bozo with sudo shell access) diddles the system
|| clock.
|| 3) Daylight savings switchover (in most parts of the US).
Only on non-Unix systems. Unix time-stamps are seconds since 00:00:00
Jan 1 1970. It is only when that value is converted to display format
that timezone and daylight saving have any effect, and that doesn't
happen for timestamp comparisons.
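A quick way to see this (a sketch using Python's standard library; the
temporary file is only there for illustration):

```python
import os
import tempfile
import time

# Sketch: a Unix mtime is a raw count of seconds since the 1970 epoch.
# Timezone and DST rules are applied only when formatting for display,
# never when two timestamps are compared (as make(1) or a build tool does).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"example")
    path = f.name

mtime = os.stat(path).st_mtime                  # plain epoch seconds
shown = time.strftime("%Y-%m-%d %H:%M:%S %Z",   # TZ/DST enter only here
                      time.localtime(mtime))
is_newer = mtime > os.stat(path).st_mtime - 1   # comparison uses raw values
os.remove(path)
```

Changing the TZ environment variable would change `shown` but leave `mtime`,
and hence any comparison, untouched.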
|| 4) Automatic NTP correction of the system time (pretty common in the
|| Unix server world). (Under Windows 2000 it is possible for all the
|| machines in a given domain to periodically sync their clocks with the
|| PDC.)
This is usually done by stretching/contracting the clock (updating
the seconds counter every 99 or 101 ticks of the hundredths-of-a-second
interrupt for a while, until the desired shift has occurred).
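That slewing behaviour can be sketched as follows (a toy model of what
adjtime(2)-style correction does, not the real implementation; the tick and
rate values are illustrative assumptions):

```python
def slew(offset, tick=0.01, rate=0.005):
    """Absorb a clock `offset` (seconds) by stretching or contracting each
    `tick` by a fraction `rate`, never stepping the clock.
    Returns (real_seconds_elapsed, clock_seconds_elapsed). Toy model."""
    real = clock = 0.0
    sign = 1 if offset >= 0 else -1
    remaining = abs(offset)
    step = rate * tick                 # correction absorbed per tick
    while remaining > step:
        real += tick                   # true time advances one tick
        clock += tick + sign * step    # clock tick stretched/contracted
        remaining -= step
    return real, clock

# Absorbing a +0.5 s offset at a 0.5% slew takes about 100 s of real time,
# and the clock never jumps or runs backwards.
real, clock = slew(0.5)
```

The point for timestamp comparisons is that a slewed clock stays monotonic,
so files can never appear to travel backwards in time the way a stepped
clock change allows.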
|| 5) touch -r (although that's a bit of a pathological case)
|| 6) Getting completely scrambled by a misbehaving Samba server. (Heh...no
|| flames please. Just something I've seen happen.)
7) Getting the file content changed by a disk hardware error.
|| How often do these happen? I'd be willing to bet $50 that it is less
|| often than a 128-bit or 160-bit digest routine duplicates. ;-)
Even without items 3 and 4, you're on very safe ground here. You
could even provide odds. Anyone worried about hash collisions had
better be desperately concerned about 7.
--
Anyone who can't laugh at himself is not | John Macdonald
taking life seriously enough -- Larry Wall | [EMAIL PROTECTED]