(Will whomever prepending this "Re: *** GMX Spamverdacht ***" header please STOP.)


On Monday, August 18, 2003, at 09:01 AM, Dave Howe wrote:
randomness is a funny thing. a truely random file can be *anything* that
is the right length - including a excerpt from william shakespere
(provided you have enough monkeys)
it is most likely to be random garbage, for a large sample - but for a
very small sample the odds of it being an ordered sample are surprisingly
good.

Quibble: only surprising if one misunderstands probability. (Not saying you do, just quibbling with any claim that readily calculated probabilities can be "surprising.")




the obvious example here is a double coin test - two bits. an ordered
sample would be both heads or both tails. a disordered sample would be one
head and one tail. in practice, you would expect to see half the results
from a larger trial (say 32 double-throws) with a ordered sample, and half
disordered.
as you reach three coins, the odds of a completely ordered result decrease
(from 2 in 2^2, to 2 in 2^3). for four coins, you still have the same two
compressable results. consider:
HHHH HHHT HHTH HHTT
HTHH HTHT HTTH HTTT
THHH THHT THTH THTT
TTHH TTHT TTTH TTTT


in a large trial, you would expect to see each of these once every 2^4
tests, on the average. obviously HHHH and TTTT are very compressable.
assuming a runlength encoding, I don't think any of the others are
compressable at all...."We should not march into Baghdad. To occupy Iraq would
instantly shatter our coalition, turning the whole Arab
world against us and make a broken tyrant into a latter-
day Arab hero. Assigning young soldiers to a fruitless
hunt for a securely entrenched dictator and condemning
them to fight in what would be an unwinable urban guerilla
war, it could only plunge that part of the world into ever
greater instability."
--George H. W. Bush, "A World Transformed", 1998
For any sequence of n fair tosses, the "length after compression" of the outcome is roughly, modulo some constants and factor, (1 - 2^(-n)), where "1" is the uncompressed length. In other words, as n gets large nearly all strings have a "length after compression" that is close to the original length, i.e., little compression. As n gets arbitrarily large, an arbitrarily small fraction of strings have a short, concise description (are "compressed").


--Tim May

Reply via email to