Someone mentioned Physics in this discussion, and that was for me a motivation to point out something that has been overlooked by Shannon, Kolmogorov, and Chaitin, and in this thread.
Even though Shannon's data entropy formula looks like an absolute measure (no reference appears in it), the often confusing fact is that it does depend on a reference. The reference is the probability model that you assume fits the data ensemble. You can have the same data ensemble and many different (indeed, infinitely many) probability models that fit it, each giving you a valid but different entropy value. For example, if a source sends the number "1" 1,000 times in a row, what is the source's entropy? Aram's assertion that the "sequence of bytes from 1-256" has maximum entropy would be right if that sequence came as one of the possible outcomes of a neutron counter with a 256-byte register. Anyone's assertion that some data has entropy X can be countered by finding a different probability model that also fits the data, even one that yields a higher entropy (!). In short, a data entropy value involves an arbitrary constant; the first sketch at the end of this note makes the point concrete.

The situation, which seems confusing, improves when we realize that only *differences* in data entropy can actually be measured, because then the arbitrary constant can be canceled -- if we are careful. In practice, because data security studies usually (and often wrongly!) suppose a closed system, then, so to speak automatically, only differences between states of a single system are ever considered. Under such circumstances the probability model is well-defined and the arbitrary constant *always* cancels; the second sketch at the end of this note shows the cancellation numerically. However, data systems are not really closed, and probability models are not always ergodic, or even accurate. Therefore, due care must be exercised when using data entropy.

I don't want to go into too much detail here (the results will be available elsewhere), but it is useful to take a brief look at Physics. In Thermodynamics, entropy is a potential [1]. As is usual for a potential, only *differences* in entropy between states can be measured. And since entropy is a potential, it is associated with a *state*, not with a process: the entropy difference can be determined regardless of the actual process the system may have performed, even regardless of whether that process was reversible (the textbook statement is written out at the end of this note). These are quite general properties.

What I am suggesting is that this idea -- that entropy is measured against a reference -- also applies to data entropy, not just to the entropy of a fluid, and that it resolves the apparent contradictions (often somewhat acid) found in data entropy discussions. It also explains why data entropy seems confusing and contradictory to use. It may actually be a much more powerful tool for data security than its current use suggests.

Cheers,

Ed Gerck

[1] For example, J. Kestin, A Course in Thermodynamics, Blaisdell, 1966.
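P.S. Here is the first sketch: a minimal Python illustration of the model-dependence point (my own example; the function and model names are invented for this note). It takes the "1 sent 1,000 times" source above and computes its Shannon entropy under two different probability models, both of which fit the observed data:

import math

def entropy_bits(model):
    # Shannon entropy H = sum(-p * log2(p)) over the model's symbols.
    return sum(-p * math.log2(p) for p in model.values() if p > 0)

data = [1] * 1000  # the observed ensemble: "1", 1,000 times in a row

# Model A: assume the source can emit nothing but "1".
model_a = {1: 1.0}

# Model B: assume a uniform 8-bit source (e.g. the neutron counter
# with a 256-byte register) that just happened to emit "1" every time.
model_b = {b: 1.0 / 256 for b in range(256)}

print(entropy_bits(model_a))  # 0.0 bits/symbol
print(entropy_bits(model_b))  # 8.0 bits/symbol

Same data, different reference model, different entropy -- and both values are valid.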
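The second sketch shows the cancellation of the arbitrary constant when only differences are taken (again my own illustration, not part of the argument above). Differential entropy makes the constant visible: rescaling the unit of measurement, say meters to centimeters, adds log2(100) to the entropy of every state, but the difference between two states of the same system is unchanged:

import math

def gaussian_h_bits(sigma):
    # Differential entropy of a Gaussian N(mu, sigma^2), in bits.
    return 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)

# Two states of one system, with widths in meters:
h_narrow, h_wide = gaussian_h_bits(0.5), gaussian_h_bits(2.0)

# The same two states, with widths in centimeters (x -> 100 x):
h_narrow_cm, h_wide_cm = gaussian_h_bits(50.0), gaussian_h_bits(200.0)

print(h_wide - h_narrow)        # 2.0 bits (up to float rounding)
print(h_wide_cm - h_narrow_cm)  # 2.0 bits -- the unit constant cancels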
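Finally, the textbook statement referred to above (cf. Kestin [1]), written out in LaTeX notation:

\Delta S = S_B - S_A = \int_A^B \frac{\delta Q_{\mathrm{rev}}}{T}

The integral is taken along any reversible path between the end states A and B, so the difference depends only on the states themselves; S itself is defined only up to an additive constant, exactly as argued above for data entropy.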