At 01:55 PM 1/18/2007, John Denker wrote:

> We would be better off maintaining just the one technical definition of entropy, namely S = sum_i P_i log(1/P_i). If you want to talk about something else, call it something else ... or at least make it clear that you are using the term in a nontechnical or metaphorical sense. ... If you want to talk about work factor, call it work factor. If you want to talk about entropy, call it entropy.
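For concreteness, that one technical definition computes directly; a minimal sketch in Python (the function name and examples are mine, using log base 2 so the answer comes out in bits):

```python
import math

def shannon_entropy(probs):
    """S = sum_i P_i * log2(1/P_i), in bits.
    Zero-probability outcomes contribute nothing to the sum."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# A fair coin carries one bit; a biased coin carries less.
print(shannon_entropy([0.5, 0.5]))   # 1.0
print(shannon_entropy([0.9, 0.1]))   # ~0.469
```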

One of the roots of the problem is that for many applications, i is a well-defined event and P(i) is a fixed value, but for many other applications, i might not be a well-defined event, and/or P(i) is really a conditional probability, P(i|other-stuff-you-know), and it's hard to tell whether that's usefully different from the unconditional P(i).

One case where this comes up is key generation with entropy pools. You take a bunch of hopefully-kinda-independent, hopefully-identically-distributed variables generated by processes that are complicated enough to look random, make some estimates of their probability distributions and hence of their entropy, and add the estimates together, after prewhitening the samples to make any correlation harder to exploit. That gives you N bits of entropy, and you take M of them to use as a key, again possibly running them through hash functions to make calculations more difficult.

So how many bits of entropy are left in the pool? You can be conservative and call it N-M, on the theory that somebody clever enough to recover the M key bits could validate the rest of the pool with only N-M calls to an oracle. Or you can be wildly optimistic and call it N, on the theory that the conditional probabilities of the remaining pool are unchanged even given the M key bits, because there's no way to use those bits to validate anything. If you've done a good enough job of designing your pool management functions, reality is probably closer to the wildly optimistic case - unless somebody finds an efficient way to invert your hash functions, at which point the conditional probabilities really are different depending on whether you know the M key bits.
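The pool bookkeeping above can be sketched in code. This is a toy illustration, not any real pool design (the class, the names, and the use of SHA-256 for prewhitening and output are all my assumptions); it just shows where the conservative N-M policy and the optimistic N policy diverge:

```python
import hashlib

class EntropyPool:
    """Toy entropy-pool bookkeeping (hypothetical sketch, not a real design).
    Tracks an entropy *estimate* alongside the hashed pool state."""

    def __init__(self):
        self.state = b""
        self.estimate_bits = 0.0

    def add_sample(self, sample: bytes, est_bits: float):
        # Prewhiten by hashing the sample into the state, and credit
        # the caller's (hopefully honest) entropy estimate.
        self.state = hashlib.sha256(self.state + sample).digest()
        self.estimate_bits += est_bits

    def take_key(self, m_bits: int, conservative: bool = True) -> bytes:
        # Derive M key bits through a hash, then step the state forward.
        key = hashlib.sha256(b"key" + self.state).digest()[: m_bits // 8]
        self.state = hashlib.sha256(b"next" + self.state).digest()
        if conservative:
            # Conservative policy: N - M bits remain.
            self.estimate_bits = max(0.0, self.estimate_bits - m_bits)
        # Optimistic policy: leave the estimate at N, trusting that the
        # key bits can't be used to validate guesses about the pool.
        return key
```

Under the conservative policy, taking a 128-bit key from a pool credited with 160 bits leaves an estimate of 32; under the optimistic policy it would still read 160.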
Another entropy example was the Venona decryptions. People banging "randomly" on typewriters didn't actually produce independent or identically distributed letters, so the conditional probabilities didn't match the assumed ones, the entropy estimates were too high, and human-language plaintext being what it is, the pads really did need a full bit of key entropy per bit. That's much tougher to exploit than one-time pads reused more than once, but it's a start. But hey, nobody's going to guess that your password is your dog's name spelled backwards, or think of using a list of a few million obvious passwords as input to the key-generation function.
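The Venona point - that non-uniform letters make the true per-letter entropy lower than the uniform assumption credits - is easy to check numerically. The skewed distribution below is purely illustrative (a typist favoring a handful of keys), not Venona data:

```python
import math

def entropy_bits(probs):
    # S = sum_i P_i * log2(1/P_i), in bits per symbol.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Assumed-uniform model: 26 equally likely letters.
uniform = [1.0 / 26] * 26

# "Random" typing that actually favors five keys heavily
# (illustrative numbers only; probabilities sum to 1).
skewed = [0.12] * 5 + [0.02] * 20

print(entropy_bits(uniform))  # ~4.70 bits/letter, the optimistic estimate
print(entropy_bits(skewed))   # ~4.09 bits/letter, what the keys really carry
```

Every letter of key generated this way delivers roughly half a bit less than the uniform model assumes, and the shortfall compounds across the whole pad.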