On 10/14/2010 10:09 PM, Zooko O'Whielacronx wrote:

> This part is a technical mistake, see
> http://cr.yp.to/hash/collisioncost-20090517.pdf .
>
> I wouldn't be
> surprised if you are right that this is how NIST came up with the
> requirement for a 512-bit hash function though.

NIST may not really need a reason to standardize a 512-bit function. If somebody somewhere is going to use 512-bit hashes, regardless of whether or not this size is justified, it's probably better for everyone's sake that they're at least standardized.

Apparently SHA-2-384 is what they use for TOP SECRET material:
http://www.nsa.gov/ia/programs/suiteb_cryptography/index.shtml
Of course, they don't say what they use for the even more secret stuff.

SHA-2-512 is derived from SHA-2-256 mostly by doubling the internal word size from 32 to 64 bits, and SHA-2-384 is a truncation of that. So even if the actual need were only for a 384-bit hash, it would have been harder _not_ to end up with SHA-2-512 as a side effect.
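A quick check of that relationship with Python's hashlib (note the truncation applies to the output width; SHA-384 also starts from different initial hash values, so its digest is not simply a prefix of the SHA-512 digest):

```python
import hashlib

msg = b"abc"
d512 = hashlib.sha512(msg).digest()
d384 = hashlib.sha384(msg).digest()

print(len(d512) * 8, len(d384) * 8)  # 512 384
# SHA-384 shares SHA-512's 64-bit internals but uses different
# initial values, so its digest is not a prefix of SHA-512's:
print(d384 == d512[:48])  # False
```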

Obviously, SHA-3 can't go without a 512-bit version if the older SHA-2 has one.

This is maybe a simpler explanation for why we have 512-bit hash functions. But admittedly, it's not as interesting as the idea of a captured space alien machine out at Area 51 generating exhaustive rainbow tables for SHA-2-256.

> I think if you want a hash function to be much less likely to fail at
> pre-image resistance (without being much more likely to fail at
> collision-resistance)

Do you know a way to improve one without improving the other?

> then you need to increase the amount of
> computation performed (while keeping the state size the same or
> larger).

There's an old saying among engineers: "Any engineer can make a bridge that stands. It takes a great engineer to make a bridge that stands just barely." I.e., using a minimum cost of materials.

So let's turn the question around: say you're an engineer designing a hash function suitable for use in applications such as RFID tags. Every picowatt-microsecond of power you consume is going to take away from the RF transponder power and limit the data rate. Needless to say, you have a limited budget for your state size and the number of computations you can perform.

So how do you design a hash function that provides the best security given limited resources? What's the most efficient conversion from N bits of state and M CPU cycles to security? Shouldn't there be something in the SHA-3 family for this engineer?

I don't think SHA-3 is being developed because of a real weakness in SHA-2. It's probably because we can do so much better in terms of overall efficiency that environments for which SHA-2 is way too expensive will be able to move off of MD5, MD4, or whatever hacked-up LFSR they're using now.

> For example, you could take SHA1 and double the number of
> rounds. This would take twice as much CPU and it would be *much* less
> likely to fail at pre-image resistance than SHA1.

It really depends.

The rounds of SHA-1 are not all the same, as they are in SHA-2, so there are various ways you could go about doubling them. Given that there are significant attacks against SHA-1 already, are you sure you'd want to spend your resources on just more of the same?

What if an attack focused on only the first or the last rounds, or on both but independently? You'd still have the same calculations in the same places.

So in any case, instead of the current SHA-1, a 160-bit output length hash function that has attacks against it that are worrying but not quite practical, you would end up with a 160-bit output length hash function that has half the throughput, cannot take advantage of any off-the-shelf hardware acceleration, is not approved for use in any setting which approves such things, is incompatible with every existing standard, and still has attacks against it, though they are probably more impractical.

> (Whether it would be
> *more* likely to fail at collision resistance I'm not sure.)

The rounds form a basic block cipher, so there's no information loss in that part; doubling the rounds shouldn't increase collisions. But there may be other attacks that become available.

But if you decided to go about it by simply repeating every 64-byte block of message data as it was fed into the input side (perhaps you had to use a library function and couldn't modify it), that would actually degrade the entropy present in the output a little.
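A concrete sketch of that block-repeating construction, using an unmodified SHA-1 library function (hypothetical, and certainly not a recommendation):

```python
import hashlib

def sha1_repeat_blocks(message: bytes) -> bytes:
    """Feed each 64-byte block of the message twice into an
    off-the-shelf SHA-1, without touching the library internals."""
    h = hashlib.sha1()
    for i in range(0, len(message), 64):
        block = message[i:i + 64]
        h.update(block)  # the block as usual...
        h.update(block)  # ...and then the same block again
    return h.digest()
```

This is just SHA-1 applied to a message whose blocks are doubled, so it doubles the compression-function work without changing the hash itself.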

> Likewise I think if you are willing to throw more state at it
> (double-width pipe, big fat sponge) and throw a proportionally greater
> amount of CPU at it, you can make any reasonable hash function
> much less likely to fail at collision-resistance.

Not long ago I would have unequivocally agreed with you. But I got interested, started reading about it, and now believe the reality is more complicated.

For example, the paper describing the Skein function describes the process used for selecting an optimal sequence of rotation constants for each round. It was sort of a heuristic algorithm run separately for the different internal state sizes.

If you tried to widen a function without a detailed knowledge of the decisions that went into its design, you'd likely make some suboptimal choices. Look at the SHA-2-256 vs SHA-2-512 rotation constants: how did they decide those changes? Why did they increase the rounds from 64 to 80 and not to 96? There's also the possibility that the original designer just got a bit lucky, so even if you followed the same principles when extending it, you'd be rolling the dice again.

The SHA-3 competition had 51 first-round entrants. When a winner is selected, if you widened it yourself, how would your modified version fare in a hypothetical SHA-4 competition specifically for costlier designs?

> So I do think there
> is a (fuzzy) trade-off between computational efficiency and safety
> here.

Several of the candidates have tunable parameters, so the safety-cost trade-off can be decided late in the process.

I think the most relevant definition of efficiency is something like:

        safe, effective security
                -per-
        computational cost (cycles, die area, power, etc.).

If we get a function that is optimized according to that definition, then we can use it as a building block to satisfy any other need.

What I would like to see next is approved ways of combining hash functions such that they provide redundant safety and allow us to apply more CPU power, as you were saying, much like the standardized HMAC construction and block cipher modes.

A standardized algebraic notation could allow us to interchange data.
For example:
    MD5 + SHA-1                  = the combination of MD5 and SHA-1
    (SHA-2-512 + SHA-3-512)/256  = combo with reduced 256 bit output
    SHA-1*2                      = SHA-1 applied twice
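A minimal sketch of one possible semantics for that notation, in Python's hashlib. The choice of concatenation for "+" and plain truncation for "/" is my assumption, and sha3_512 stands in for a finished SHA-3, which of course didn't exist yet when this was written:

```python
import hashlib

def sha2_512_plus_sha3_512(data: bytes) -> bytes:
    """'SHA-2-512 + SHA-3-512': concatenate both digests (1024 bits).
    Concatenation is one plausible reading of '+'."""
    return hashlib.sha512(data).digest() + hashlib.sha3_512(data).digest()

def combo_reduced_256(data: bytes) -> bytes:
    """'(SHA-2-512 + SHA-3-512)/256': reduce the combo to 256 bits,
    here by simple truncation (again, an assumed semantics)."""
    return sha2_512_plus_sha3_512(data)[:32]

def sha1_times_2(data: bytes) -> bytes:
    """'SHA-1*2': SHA-1 applied twice."""
    return hashlib.sha1(hashlib.sha1(data).digest()).digest()
```

The concatenation combiner is at least as collision-resistant as the stronger of its two components; the truncated variant trades that margin away for a shorter output.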

The scheme might define a partial or total ordering based on security.

Perhaps the values assigned to hash primitives could be reconfigured over time to reflect the discovery of new attacks. A protocol such as TLS could negotiate the best combination of security and efficiency by taking into consideration the greatly varying capabilities of each endpoint (e.g., some endpoints have hardware acceleration for some algorithms).

> This implies that the fastest SHA-3 candidates are the least
> safe. ;-)

I'm sure we'll get one that's very fast and unquestionably safe.

- Marsh
_______________________________________________
cryptography mailing list
[email protected]
http://lists.randombit.net/mailman/listinfo/cryptography