| Bear writes: | > One interesting idea which I came up with and haven't seen a way | > past yet is to XOR each block with a value computed from its | > sequence number, then compute the hash function on the blocks in | > a nonsequential order based on the plaintext of the blocks.... | > in their new order. | | This is an interesting idea, but it does not fully prevent the Joux | attack. This is seen most obviously in the two block case, where | your method can at most increase the attacker's work by a factor of 2. | Joux shows how to find a 4-collision using 2*2^(n/2) work, while it should | take 2^(3n/4) work. In the case of a 160 bit hash this is the difference | between 2^81 and 2^120. Doubling the work to find the 4-collision will | not do much to address this discrepency. | | Even in the more general case, where Joux puts k blocks together to | generate 2^k multicollisions, I can see a way to attack your method, | with an increase in the work by only a factor of k^2.... There's a more general problem with the algorithm: It isn't on-line. An implementation requires either an unbounded internal state, or the ability to re-read the input stream. The first is obviously impossible, the second a problem for many common uses of hash functions (in communications, as opposed to storage).

However ... *any* on-line algorithm falls to a Joux-style attack. An algorithm with fixed internal memory that can only read its input linearly, exactly once, can be modeled as an FSM. A Joux-style attack then is: Find a pair of inputs M1 and M1' that, starting from the fixed initial state, both leave the FSM in state S. Now find a pair of inputs M2 and M2' that, starting from S, leave the FSM in state S'. Then for the cost of finding these two collisions, we have *four* distinct collisions as before. More generally, by just continuing from state S' to S'' and so on, for the cost of k single collision searches, we get 2^k collisions. The nominal security of the FSM is its number of states; for k large enough, we certainly get "too many" collisions compared to a random function. What this shows is that our vague notion that a hash function should behave like a random function can't work. On-line-ness always kills that. (In fact, even given the function random access to its input doesn't help: An FSM can only specify a finite number of random "places" to look in the input. If the FSM can only specify an absolute location, it can't work on arbitrarily long inputs. If it can specify a location relative to the current position, you can re-write the machine as a linear one that gets repeated copies of blocks of a fixed size.) This is really just the prefix property writ large. Define an on-line function f:{0,1}*->{0,1}^k as one with the property that there exist infinitely many pairs of strings Si, Si' with the property that, for any S in {0,1}*, f(Si||S) = f(Si'||S). (In particular, this is true for S the empty string, so f(Si) = f(Si').) Then the most we can expect of an (on-line) hash function is that it act like a function picked at random from the set of on-line functions. (This, of course, ignores all the other desireable properties - we want to choose from a subset of the on-line functions.) Getting a security parameter in there is a bit tricky. An obvious first cut is to say an on-line function has minimum length n if the shortest Si has length n. Actual hash functions don't follow this model exactly. For example, MD5 needs 512+128+64 bits of internal storage* - the first for the input buffer, all of whose bits are needed together, the second for the running hash state, the third for the length. So there are 2^704 states in the MD5 FSM, but the output only has 2^128 values - only 128 bits of the "state number" are used. That in and of itself isn't an issue - it doesn't hurt, in this particular analysis, to throw bits away in the *final* result. What does limit it is that every 512 input bits, the FSM itself logically "mods out" 512 bits of the state (the input buffer), and ends up in one of "only" 2^192 possible states (and if we work with "short" strings, then only a small fraction of the 2^64 states of the length field are actually accessible). Because of this, MD5 can at best have been chosen from on-line functions of minimum length 128, rather than those of a minimal length up to 704. That's why keeping more internal state *might* help - though you have to be careful, as we've seen, about how you do it. -- Jerry * Actually, you need 3 more bits because the MD5 length is in octets, not bits. This memory is hidden in any real implementation on byte-oriented hardware. --------------------------------------------------------------------- The Cryptography Mailing List Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]