On 3/11/2012 4:51 PM, Martin Baldan wrote:
I won't pretend I really know what I'm talking about, I'm just
guessing here, but don't you think the requirement for "independent
and identically-distributed random variable data" in Shannon's source
coding theorem may not be applicable to pictures, sounds or frame
sequences normally handled by compression algorithms?

that is a description of random data, which granted, doesn't apply to most (compressible) data.
that wasn't really the point though.

once one gets to a point where ones' data looks like this, then further compression is no longer possible (hence why there is a limit).

typically, compression will transform low-entropy data (with many repeating patterns and redundancies) into a smaller amount of high-entropy compressed data (with almost no repeating patterns or redundancy).


I mean, many
compression techniques rely on domain knowledge about the things to be
compressed. For instance, a complex picture or video sequence may
consist of a well-known background with a few characters from a
well-known inventory in well-known positions. If you know those facts,
you can increase the compression dramatically. A practical example may
be Xtranormal stories, where you get a cute 3-D animated dialogue from
a small script.

yes, but this can only compress what redundancies exist.
once the redundancies are gone, one is at a limit.

specialized knowledge allows one to do a little better, but does not change the basic nature of the limit.

for example, I was able to devise a compression scheme which reduced S-Expressions to only 5% their original size. now what if I want 3%, or 1%? this is not an easy problem. it is much easier to get from 10% to 5% than to get from 5% to 3%.


the big question then is how much redundancy exists within a typical OS, or other large piece of software?

I expect one can likely reduce it by a fair amount (such as by aggressive refactoring and DSLs), but there will likely be a bit of a limit, and once one approaches this limit, there is little more that can be done (as it quickly becomes a fight against diminishing returns).

otherwise, one can start throwing away features, but then there is still a limit, namely how much can one discard and still keep the "essence" of the software intact.


although many current programs are, arguably, huge, the vast majority of the code is likely still there for a reason, and is unlikely the result of programmers just endlessly writing the same stuff over and over again, or resulting from other simple patterns. rather, it is more likely piles of special case logic and optimizations and similar.


(BTW: now have in-console text editor, but ended up using full words for most command names, seems basically workable...).

Best,

-Martin

On Sun, Mar 11, 2012 at 7:53 PM, BGB<cr88...@gmail.com>  wrote:
On 3/11/2012 5:28 AM, Jakub Piotr Cłapa wrote:
On 28.02.12 06:42, BGB wrote:
but, anyways, here is a link to another article:
http://en.wikipedia.org/wiki/Shannon%27s_source_coding_theorem

Shannon's theory applies to lossless transmission. I doubt anybody here
wants to reproduce everything down to the timings and bugs of the original
software. Information theory is not thermodynamics.

Shannon's theory also applies some to lossy transmission, as it also sets a
lower bound on the size of the data as expressed with a certain degree of
loss.

this is why, for example, with JPEGs or MP3s, getting a smaller size tends
to result in reduced quality. the higher quality can't be expressed in a
smaller size.
_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc

_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc

Reply via email to