Re: [agi] All Compression is Lossy, More or Less

Matt Mahoney Mon, 15 Nov 2021 08:12:55 -0800

On Mon, Nov 15, 2021, 7:54 AM <[email protected]> wrote:

> How about this compression algorithm: we have source code as a string.
> We also have a grammar that the source code conforms to. If we calculate a
> hash of the source code, we get a very short string. Reversing the hash
> gives us hundreds of combinations representing potential original
> strings. Each of those strings may be parsed against the source code
> grammar. The first combination (and probably the only one) that parses
> against the source code grammar is the lossless decompression of the hash
> string representing our compressed source code.
>
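The quoted proposal can be sketched at toy scale to see what its decompressor actually returns. This is a minimal Python sketch under loudly labeled assumptions, not anyone's actual implementation: SHA-256 is truncated to a hypothetical `HASH_BITS = 8` so the exhaustive search finishes instantly, the decompressor is assumed to know the original length, and the "grammar" is a hypothetical toy one (sums of decimal integers such as `12+7`) standing in for a real C grammar.

```python
import hashlib
import itertools
import re

ALPHABET = "0123456789+"                    # hypothetical source alphabet
GRAMMAR = re.compile(r"[0-9]+(\+[0-9]+)*")  # hypothetical toy grammar: sums of integers
HASH_BITS = 8                               # assumption: tiny hash so brute force is feasible


def compress(source: str) -> int:
    """'Compress' by keeping only the top HASH_BITS bits of the SHA-256 digest."""
    digest = hashlib.sha256(source.encode("ascii")).digest()
    return int.from_bytes(digest, "big") >> (256 - HASH_BITS)


def decompress(code: int, length: int) -> list[str]:
    """Invert the hash by exhaustive search, then filter by the grammar."""
    survivors = []
    for chars in itertools.product(ALPHABET, repeat=length):
        candidate = "".join(chars)
        if compress(candidate) != code:
            continue  # wrong hash: cannot be the original
        if GRAMMAR.fullmatch(candidate):
            survivors.append(candidate)  # hash matches AND the string parses
    return survivors


if __name__ == "__main__":
    original = "12+7"
    candidates = decompress(compress(original), len(original))
    print(f"{len(candidates)} grammatical candidates share the hash of {original!r}")
    print(candidates[:5])
```

Every length-4 string made only of digits, or of digits with one `+` in the second or third position, is grammatical here: 12,000 strings spread over 256 hash values, or about 47 per value on average. The search therefore returns a list of candidates, and nothing in the 8-bit hash tells the decompressor which one is the original.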


Besides taking a long time to decompress, the scheme cannot work: the set of
legal programs that differ only in a single line such as:

printf("..."); // At least 43 chars

is guaranteed to contain SHA-256 collisions, because there are more than 2^256
legal statements of this form but only 2^256 possible hash values (the
pigeonhole principle).
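The counting argument can be checked with exact integers. In this Python sketch the 64-character literal alphabet (letters, digits, space, comma) is a hypothetical choice; with 64 symbols, 43 characters already give 64^43 = 2^258 distinct statements, more than the 2^256 SHA-256 outputs. Since real SHA-256 collisions cannot be found in practice, the sketch then applies the same principle to SHA-256 truncated to 16 bits, where a collision among `printf` statements must appear within 2^16 + 1 statements.

```python
import hashlib
import itertools
import string

# Hypothetical alphabet of 64 characters that can appear unescaped in a C string literal.
LITERAL_CHARS = string.ascii_letters + string.digits + " ,"
assert len(LITERAL_CHARS) == 64
LITERAL_LEN = 43

# Number of distinct statements printf("<43 chars>"); over that alphabet.
num_statements = len(LITERAL_CHARS) ** LITERAL_LEN  # 64**43 == 2**258
print(num_statements > 2 ** 256)  # True: more statements than SHA-256 digests


def truncated_sha256(text: str, bits: int) -> int:
    """Top `bits` bits of the SHA-256 digest of text."""
    digest = hashlib.sha256(text.encode("ascii")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_collision(bits: int) -> tuple[str, str]:
    """Hash printf statements until two share a truncated digest.
    By the pigeonhole principle this needs at most 2**bits + 1 statements."""
    seen: dict[int, str] = {}
    for chars in itertools.product(LITERAL_CHARS, repeat=LITERAL_LEN):
        stmt = 'printf("' + "".join(chars) + '");'
        h = truncated_sha256(stmt, bits)
        if h in seen:
            return seen[h], stmt
        seen[h] = stmt
    raise AssertionError("unreachable: far more statements than hash values")


if __name__ == "__main__":
    a, b = find_collision(16)
    print(a)
    print(b)
```

With a 16-bit hash a collision typically shows up after a few hundred statements (the birthday bound), long before the pigeonhole limit; for full SHA-256 the same counting proves collisions exist even though none can feasibly be found.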

In my tests, the PAQ series of compressors compresses large C/C++ projects to
about 16 bits per line of code.
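A bits-per-line figure is just compressed size in bits divided by the line count. PAQ is not in Python's standard library, so this sketch swaps in `lzma` (xz) as a stand-in; it will typically report noticeably more bits per line than PAQ, so its numbers are not comparable to the figure above. The set of file extensions and the command-line handling are assumptions.

```python
import lzma
import sys
from pathlib import Path

SOURCE_PATTERNS = ("*.c", "*.h", "*.cc", "*.cpp", "*.hpp")  # assumption: what counts as C/C++


def bits_per_line(data: bytes) -> float:
    """Compressed size in bits divided by the number of newline characters."""
    lines = data.count(b"\n")
    if lines == 0:
        raise ValueError("input has no newline-terminated lines")
    compressed = lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)
    return 8 * len(compressed) / lines


def read_project(root: Path) -> bytes:
    """Concatenate every matching source file under root, in sorted path order."""
    files = sorted(
        p for pattern in SOURCE_PATTERNS for p in root.rglob(pattern) if p.is_file()
    )
    return b"".join(p.read_bytes() for p in files)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: bits_per_line.py PROJECT_DIR")
    else:
        data = read_project(Path(sys.argv[1]))
        if data.count(b"\n"):
            print(f"{bits_per_line(data):.1f} bits per line (LZMA, not PAQ)")
        else:
            print("no C/C++ source lines found")
```

Compressing the whole project as one stream, rather than file by file, lets the compressor exploit redundancy across files, which is how a project-level bits-per-line figure is usually meant.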

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T5ff6237e11d945fb-M5205a13adbf339c656921bcf
Delivery options: https://agi.topicbox.com/groups/agi/subscription
