John Denker
Tue, 31 Aug 2004 14:55:48 -0700
Hadmut Danisch wrote:
It can be easily shown that there is no lossless compression method which can effectively compress every possible
input.
OK.
... I need a book about computer science or encoding theory,
which explicitely says that this is impossible, in a way that a person
unexperienced in computer science (judges at a court) can read and at least see that this is not just my personal phantasy and at least
somehow reasonable.
There are two conficting requirements there. -- authoritative and correct, and -- understandable to a judge with no technical expertise.
As you know, it is easy to satisfy either requirement separately.
My suggestions:
1) Get a few "expert witnesses" to certify that your position is certainly not a personal fantasy. (I assume German jurisprudence has something akin to the US notion of expert witnesses, right?)
2) Try to get the burden-of-proof reversed. The opposition are claiming that known algorithms do the job. Get them to be specific about which algorithms they are referring to. Then file with the court some disks, each containing 1.44 megabytes of data. If the opposition are telling the truth, they should be able to compress these using one of their chosen algorithms. Offer to concede the entire case if they can shorten the text by more than, say, 0.1 percent (i.e. 1.44 kilobytes shorter). Of course you fill your disks using a good hardware random number generator, e.g. http://www.av8n.com/turbid/ Be sure to arrange suitable precautions so that the opposition doesn't get a chance to peek at your disks and build a special-purpose "rumplestiltskin" encoder/decoder, i.e. one that contains quotations of your data. One precaution would be to require that they use an open-source implementation, or at least an impartial implementation, of a published algorithm.
3) Diagram the pigeon-hole argument for the judge. See diagrams below.
4) Don't forget the _recursion_ argument. Take their favorite algorithm (call it XX). If their claims are correct, XX should be able to compress _anything_. That is, the output of XX should _always_ be at least one bit shorter than the input. Then the compound operation XX(XX(...)) should produce something two bits shorter than the original input. If you start with a N-bit message and apply the XX function N-1 times, you should be able to compress each and every message down to a single bit.
============================================================ Here's how to diagram the pigeon-hole argument:
Write down all the three-bit numbers, and demonstrate a lossless non-compressive encoding:
plaintext codetext
000 ---------- 000
001 ---------- 001
010 ---------- 010
011 ---\ /=== 011 \/ /\ 100 ===/ \--- 100
101 ---------- 101
110 ---------- 110
111 ---------- 111
===========================================
Then give an example of a lossy compressive code. Make the point that the codetext words are shorter than the plaintext words. However, there are necessarily fewer of them, so the coding scheme is necessarily lossy and non-invertible.
plaintext codetext 000 ---------- 00 "zero"
001 ---------- 01 "one"
010 ---------- 10 "two"
011 -------=== 11 "many" / | 100 ----/| | | 101 ----/| | | 110 ----/| | | 111 ----/
=========================================
Then give an example of a code that might or might not be compressive, depending on statistical assumptions. The following code is compressive if zero, one, and two are considerably more probable than the other three-bit numbers, but does is anti-compressive if all eight three-bit numbers are equally likely, or nearly equally likely.
000 ---------- 00 "zero"
001 ---------- 01 "one"
010 ---------- 10 "two"
011 ---------- 11000
100 ---------- 11001
101 ---------- 11010
110 ---------- 11011
111 ---------- 11100
Average length, for equiprobable plaintext words: (2+2+2+5+5+5+5+5) / 8 = 3.875 which is an expansion, not a compression.
Judges like fairness. Certainly an equiprobable distribution of input words is fair. It reflects the bit-strings that would be generated by tossing fair coins (three at a time), or tossing a fair eight-sided die. This fairest of distributions is incompressible in the usual sense.
================================================
Finally, offer a challenge. Write down all eight three-bit plaintext words, and all four two-bit codetext words. Ask the judge to conclude for himself that it is obviously impossible to draw lines connecting corresponding words in a one-to-one fashion.
000 00
001 01
010 10
011 11
100
101
110
111
=================================
Note that in the last example, the codetext words were only one bit shorter than the plaintext words. If you want to pound on the point, present the samething with four-bit plaintext and two-bit codetext, i.e. where the codetext is _two_ bits shorter than the plaintext, and demonstrate that the problem is even more extreme.
It may be better to leave off the diagram for this one, so as to not burden the judge ... i.e. just make the assertion and dare the opposition to refute it. Remember that the typical judge chose to study law because he didn't *want* to learn about logarithms and entropy and all that.
=======================================================
Important note:
The following may sound like something from the Scholasticism of the dark ages (angels dancing on pins and all that) but it's true.
I personally have been shouted down at conferences for making the points discussed above. The shouters say that Lempel-Ziv is "universal". Indeed they point out the the title of the seminal paper by Lempel & Ziv is "A Universal Algorithm for Sequential Data Compression"
And the funny thing is, in some narrow Scholastic sense they are right ... LZ is "universal". The fun comes from the fact that they have _defined_ the word "universal" in a funny way. According to their definitions, a "universal" compressor is one that is no worse than any other by more than some constant. (The constant depends on which compressors are being compared, but is independent of the choice of inputs.)
Notice that I am careful to write the term "universal" in scare-quotes whenever this twisted technical meaning is intended.
A "universal" compressor is not always compressive.
To summarize: An algorithm that is "universal" in the narrow technical Scholastic sense is not universal, or all-purpose, or general-purpose in the ordinary vernacular sense.
Therefore I recommend you choose your words very carefully, so the bad guys don't get a chance to trip you up on a narrow technicality.
--------------------------------------------------------------------- The Cryptography Mailing List Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]