Hi, I need a literature reference for a simple result of encoding/compression theory:
It can easily be shown that there is no lossless compression method which effectively compresses every possible input. The proof is simple: First, consider all possible messages of length n bits, n > 0. There are 2^n different ones. But there are only 2^n - 1 shorter messages (all bit strings of length 0 to n-1, including the empty message), so there is no injective encoding that maps every n-bit message to a shorter one. And once all codewords of length < n are occupied, messages shorter than n bits can no longer be compressed: a message of length m < n must be encoded into a codeword of at least n bits, thus longer than m and not effectively compressed. (Strictly, you would even have to account for the entropy of the end-of-message sign or the bit counter.)

Or in other words: For every input word which is compressed into a shorter codeword, there must be another, shorter input word which cannot be effectively compressed but gets longer instead - if the algorithm/function is to accept any input and be lossless, i.e. decompress(compress(a)) = a for every input a. Thus, for every lossless compression method which accepts any input message and is not completely useless (i.e. there is at least one message whose codeword is shorter than the message), there is at least one input whose codeword is longer than the message.

As far as I know, this is early-semester computer science material: in theory there is no universal lossless method that compresses everything. Lossless compression is the idea of encoding the messages with higher probability into shorter codewords, and those with lower probability into longer codewords, thus reducing the average length of the messages, not the length of every single message.

As I mentioned earlier, I have some trouble with a computer science department. They do not want to believe that there is no such algorithm, and they claim that there are algorithms which can compress everything without loss and without any input resulting in a longer codeword.
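The counting step of the argument can be sketched in a few lines of Python (a minimal illustration of my own, not taken from any textbook; the variable names are mine):

```python
# Pigeonhole sketch of the counting argument, for n = 8 bits.
# "shorter" counts all bit strings of length 0..n-1 (including the
# empty string): 2^0 + ... + 2^(n-1) = 2^n - 1, which is one fewer
# than the 2^n messages of length exactly n. So no injective map
# from n-bit messages into strictly shorter strings can exist.
n = 8
messages = 2 ** n                          # messages of exactly n bits
shorter = sum(2 ** k for k in range(n))    # strings of length 0..n-1

assert shorter == 2 ** n - 1
assert shorter < messages  # pigeonhole: some n-bit message cannot shrink
```

The same inequality holds for every n > 0, which is why no choice of code can avoid it.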
They claimed that Lempel-Ziv and MTF (Move To Front) would do the job. I have given counterexamples for LZ, showed that gzip on a file filled with random numbers results in a bigger file, and showed that MTF is not a compression at all, since it does not change the length of a message. They don't understand.

Therefore, I need a book about computer science or coding theory which explicitly says that this is impossible, in a way that a person inexperienced in computer science (judges at a court) can read and at least see that this is not just my personal fantasy and is at least somewhat reasonable. I have checked several books about information and coding theory, but didn't find any suitable for readers without a computer science background. The statement is always hidden in formulas about entropy, average code lengths, code-efficiency inequalities, etc. If you had such an encoding scheme, you could easily show that the average code length is below the entropy, or that the efficiency is > 100%. But a non-computer-science person does not understand that.

Does anybody know a book about coding theory which explicitly states the impossibility of such a compression method in plain language?

regards
Hadmut

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]
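For what it's worth, the gzip counterexample is easy to reproduce with Python's zlib module (a Deflate implementation, the same algorithm family gzip uses). This is a demonstration, not a proof: incompressible pseudorandom bytes come out larger because of format overhead, exactly the behaviour the counting argument predicts must occur for some inputs. The seed and sizes here are arbitrary choices of mine:

```python
import random
import zlib

rng = random.Random(42)  # fixed seed so the run is reproducible
data = bytes(rng.getrandbits(8) for _ in range(100_000))

compressed = zlib.compress(data, 9)  # maximum compression effort

print(len(data), len(compressed))
assert len(compressed) > len(data)          # random data does not shrink
assert zlib.decompress(compressed) == data  # still a lossless round trip
```

The round-trip assertion matters: the method is genuinely lossless, it just cannot make this particular input shorter.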
