Re: [fonc] Error trying to compile COLA

BGB Sun, 11 Mar 2012 18:37:51 -0700

On 3/11/2012 4:51 PM, Martin Baldan wrote:

I won't pretend I really know what I'm talking about, I'm just
guessing here, but don't you think the requirement for "independent
and identically-distributed random variable data" in Shannon's source
coding theorem may not be applicable to pictures, sounds or frame
sequences normally handled by compression algorithms?

that is a description of random data, which granted, doesn't apply tomost (compressible) data.

that wasn't really the point though.

once one gets to a point where ones' data looks like this, then furthercompression is no longer possible (hence why there is a limit).

typically, compression will transform low-entropy data (with manyrepeating patterns and redundancies) into a smaller amount ofhigh-entropy compressed data (with almost no repeating patterns orredundancy).

I mean, many
compression techniques rely on domain knowledge about the things to be
compressed. For instance, a complex picture or video sequence may
consist of a well-known background with a few characters from a
well-known inventory in well-known positions. If you know those facts,
you can increase the compression dramatically. A practical example may
be Xtranormal stories, where you get a cute 3-D animated dialogue from
a small script.


yes, but this can only compress what redundancies exist.
once the redundancies are gone, one is at a limit.

specialized knowledge allows one to do a little better, but does notchange the basic nature of the limit.

for example, I was able to devise a compression scheme which reducedS-Expressions to only 5% their original size. now what if I want 3%, or1%? this is not an easy problem. it is much easier to get from 10% to 5%than to get from 5% to 3%.

the big question then is how much redundancy exists within a typical OS,or other large piece of software?

I expect one can likely reduce it by a fair amount (such as byaggressive refactoring and DSLs), but there will likely be a bit of alimit, and once one approaches this limit, there is little more that canbe done (as it quickly becomes a fight against diminishing returns).

otherwise, one can start throwing away features, but then there is stilla limit, namely how much can one discard and still keep the "essence" ofthe software intact.

although many current programs are, arguably, huge, the vast majority ofthe code is likely still there for a reason, and is unlikely the resultof programmers just endlessly writing the same stuff over and overagain, or resulting from other simple patterns. rather, it is morelikely piles of special case logic and optimizations and similar.

(BTW: now have in-console text editor, but ended up using full words formost command names, seems basically workable...).

Best,

-Martin

On Sun, Mar 11, 2012 at 7:53 PM, BGB<[email protected]>  wrote:

On 3/11/2012 5:28 AM, Jakub Piotr Cłapa wrote:

On 28.02.12 06:42, BGB wrote:

but, anyways, here is a link to another article:
http://en.wikipedia.org/wiki/Shannon%27s_source_coding_theorem


Shannon's theory applies to lossless transmission. I doubt anybody here
wants to reproduce everything down to the timings and bugs of the original
software. Information theory is not thermodynamics.

Shannon's theory also applies some to lossy transmission, as it also sets a
lower bound on the size of the data as expressed with a certain degree of
loss.

this is why, for example, with JPEGs or MP3s, getting a smaller size tends
to result in reduced quality. the higher quality can't be expressed in a
smaller size.

_______________________________________________
fonc mailing list
[email protected]
http://vpri.org/mailman/listinfo/fonc


_______________________________________________
fonc mailing list
[email protected]
http://vpri.org/mailman/listinfo/fonc

Re: [fonc] Error trying to compile COLA

Reply via email to