On Monday, August 18, 2003, at 09:01 AM, Dave Howe wrote:
randomness is a funny thing. a truely random file can be *anything* that
is the right length - including a excerpt from william shakespere
(provided you have enough monkeys)
it is most likely to be random garbage, for a large sample - but for a
very small sample the odds of it being an ordered sample are surprisingly
good.
Quibble: only surprising if one misunderstands probability. (Not saying you do, just quibbling with any claim that readily calculated probabilities can be "surprising.")
the obvious example here is a double coin test - two bits. an orderedinstantly shatter our coalition, turning the whole Arab
sample would be both heads or both tails. a disordered sample would be one
head and one tail. in practice, you would expect to see half the results
from a larger trial (say 32 double-throws) with a ordered sample, and half
disordered.
as you reach three coins, the odds of a completely ordered result decrease
(from 2 in 2^2, to 2 in 2^3). for four coins, you still have the same two
compressable results. consider:
HHHH HHHT HHTH HHTT
HTHH HTHT HTTH HTTT
THHH THHT THTH THTT
TTHH TTHT TTTH TTTT
in a large trial, you would expect to see each of these once every 2^4
tests, on the average. obviously HHHH and TTTT are very compressable.
assuming a runlength encoding, I don't think any of the others are
compressable at all...."We should not march into Baghdad. To occupy Iraq would
world against us and make a broken tyrant into a latter-
day Arab hero. Assigning young soldiers to a fruitless
hunt for a securely entrenched dictator and condemning
them to fight in what would be an unwinable urban guerilla
war, it could only plunge that part of the world into ever
greater instability."
--George H. W. Bush, "A World Transformed", 1998
For any sequence of n fair tosses, the "length after compression" of the outcome is roughly, modulo some constants and factor, (1 - 2^(-n)), where "1" is the uncompressed length. In other words, as n gets large nearly all strings have a "length after compression" that is close to the original length, i.e., little compression. As n gets arbitrarily large, an arbitrarily small fraction of strings have a short, concise description (are "compressed").
--Tim May
