That's the difference between CS researchers and sysadmins.
Sysadmins realize that there are an infinite number of files that map to the same hash value and plan accordingly (because we KNOW we will run across them eventually), and don't see it as a big deal when we finally do.
CS researchers quote statistics showing how hard it is to intentionally create two files with the same hash, and insist it just doesn't happen -- until presented with the proof, at which point it is a big deal.
A difference in viewpoints.
On Sat, 16 Apr 2005, C. Scott Ananian wrote:
Date: Sat, 16 Apr 2005 10:58:15 -0400 (EDT)
From: C. Scott Ananian <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: David Lang <[EMAIL PROTECTED]>, Ingo Molnar <[EMAIL PROTECTED]>, email@example.com
Subject: Re: SHA1 hash safety
On Sat, 16 Apr 2005, Brian O'Mahoney wrote:
(1) I _have_ seen real-life collisions with MD5, in the context of Document management systems containing ~10^6 ms-WORD documents.
Dude! You could have been *famous*! Why the aitch-ee-double-hockey-sticks didn't you publish this when you found it?
Even given the known weaknesses in MD5, it would take far more than a million documents to find MD5 collisions. I can only conclude that the hash was being used incorrectly; most likely truncated (my wild-ass guess would be to 32 bits; a collision is likely with > 50% probability in a million-document store for any hash shorter than 40 bits).
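The arithmetic behind that guess is the standard birthday bound. A minimal sketch (my numbers, not part of the original claim; `collision_probability` is an illustrative helper, and the approximation assumes a uniform hash):

```python
import math

def collision_probability(n, bits):
    """Birthday-bound approximation: P(collision) ~ 1 - exp(-n(n-1) / 2^(bits+1))
    for n items hashed uniformly into a bits-bit space."""
    return 1.0 - math.exp(-n * (n - 1) / 2.0 ** (bits + 1))

n = 10**6  # roughly the document store size described above
for bits in (32, 39, 40, 64, 128):
    print(f"{bits:3d}-bit hash: P(collision) ~ {collision_probability(n, bits):.6f}")
```

At 32 bits a collision among a million documents is essentially certain, at 39 bits it crosses the 50% mark, and at the full 128 bits of MD5 it is negligible -- which is why a truncated hash is the plausible explanation.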
I know the current state of the art here. It's going to take more than just hearsay to convince me that full 128-bit MD5 collisions are likely. I believe there are only two or so known to exist so far, and those were found by a research team in China (which, yes, is fairly famous among the cryptographic community now after publishing a paper consisting of little apart from the two collisions themselves).
Please, let's talk about hash collisions responsibly. I posted earlier about the *actual computed probability* of finding two files with an SHA-1 collision before the sun goes supernova. It's 10^28 to 1 against.
Recent cryptographic work has shown that there are certain situations where a substantial amount of computation (2^69 operations) can produce two sequences with the same hash, but these sequences are not freely chosen; they have a very specific structure. This attack does not apply to the (effectively) random files sitting in an SCM.
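To put that 2^69 figure on a human scale, here is a back-of-the-envelope sketch. The throughput figure (10^12 hash operations per second) is an illustrative assumption of mine, not a claim about any real attacker's hardware:

```python
# Compare the 2^69-operation shortcut attack against the generic 2^80
# birthday attack on SHA-1, at an assumed (hypothetical) hash rate.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_at(ops, ops_per_second=1e12):
    """Wall-clock years to perform `ops` operations at the given rate."""
    return ops / ops_per_second / SECONDS_PER_YEAR

print(f"2^69 shortcut attack: ~{years_at(2**69):.0f} years at 10^12 hashes/s")
print(f"2^80 generic attack:  ~{years_at(2**80):.0f} years at 10^12 hashes/s")
```

The shortcut is 2^11 (about 2000x) cheaper than brute force, which is why it worries cryptographers even though it is still enormous -- and why it only matters to an attacker willing to devote serious resources.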
That said, Linux's widespread use means that it may not be unimaginable for an attacker to devote this amount of resources to an attack, which would probably involve first committing some specially structured file to the SCM (but would Linus accept it?) and then silently corrupting said file via a SHA1 collision to toggle some bits (which would presumably Do Evil). Thus hashes other than SHA1 really ought to be considered...
...but the cryptographic community has not yet come to a conclusion on what the replacement ought to be. These attacks are so new that we don't really understand what it is about the structure of SHA1 which makes them possible, which makes it hard to determine which other hashes are similarly vulnerable. It will take time.
I believe Linus has already stated on this list that his plan is to eventually provide a tool for bulk migration of an existing SHA1 git repository to a new hash type. Basically munging through the repository in bulk, replacing all the hashes. This seems a perfectly adequate strategy at the moment.
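That bulk-migration idea can be sketched abstractly: walk the objects children-first, translate every embedded reference to its new hash, and keep an old-to-new mapping. This is a toy model of a content-addressed store (plain dicts, SHA-256 as a stand-in replacement hash), not git's actual object format:

```python
import hashlib

def migrate(store, new_hash=lambda b: hashlib.sha256(b).hexdigest()):
    """Toy bulk migration of a content-addressed store to a new hash.

    `store` maps old-hash -> (payload bytes, list of old-hash references).
    Objects are visited children-first so that every referenced hash is
    already translated before the referencing object is rehashed."""
    old_to_new = {}
    new_store = {}

    def visit(old_id):
        if old_id in old_to_new:
            return old_to_new[old_id]
        payload, refs = store[old_id]
        new_refs = [visit(r) for r in refs]          # translate children first
        body = payload + b"".join(r.encode() for r in new_refs)
        new_id = new_hash(body)                       # rehash with new algorithm
        old_to_new[old_id] = new_id
        new_store[new_id] = (payload, new_refs)
        return new_id

    for oid in store:
        visit(oid)
    return new_store, old_to_new
```

The key point the sketch makes concrete: because objects reference each other by hash, every hash in the repository changes, so the migration has to rewrite the whole object graph in one pass rather than converting files independently.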
WASHTUB Panama Minister Moscow explosives KUGOWN hack Marxist LPMEDLEY genetic immediate radar SCRANTON COBRA JANE KGB Shoal Bay atomic Bejing
( http://cscott.net/ )
--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. -- C.A.R. Hoare