On Mon, Nov 15, 2021, 7:54 AM <[email protected]> wrote:
> How about this compression algorithm: we have source code as a string.
> We also have a grammar that the source code conforms to. If we calculate a
> hash of the source code, we get a very short string. The reverse function of
> hashing gives us hundreds of combinations representing potential original
> strings. Each of those strings may be parsed against the source code
> grammar. The first combination (and probably the only one) that parses
> against the source code grammar is the lossless decompression of the hash
> string representing our compressed source code.
>
Besides taking a long time to decompress, the scheme fails because the set of
legal programs that differ only in a single line such as:
printf("..."); // At least 43 chars
is guaranteed to contain SHA-256 collisions: there are over 2^256 legal
statements of this form, but only 2^256 possible SHA-256 hashes.
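A scaled-down sketch of the quoted scheme makes the pigeonhole argument concrete. All parameters here are hypothetical, chosen so brute force finishes in seconds: SHA-256 truncated to 16 bits stands in for the full digest, a regex matching one printf statement stands in for the C grammar, and string literals are exactly 3 characters from a 64-symbol alphabet. The proportions mirror the real case: 3 characters at 6 bits each give 2^18 programs for 2^16 digests, much as 43 such characters give 64^43 = 2^258 literals for 2^256 SHA-256 digests.

```python
import hashlib
import itertools
import re
import string
from collections import defaultdict

# Hypothetical toy parameters: the real argument uses full 256-bit SHA-256
# digests and 43+ character literals; both are scaled down here.
ALPHABET = string.ascii_letters + string.digits + "_ "  # 64 symbols = 6 bits each
LITERAL_LEN = 3                                          # 3 * 6 = 18 bits of choice
DIGEST_BITS = 16                                         # truncated hash, 16 bits

# Stand-in for "the source code grammar": a single printf statement whose
# literal uses only ALPHABET characters.
STATEMENT = re.compile(r'printf\("[A-Za-z0-9_ ]*"\);')


def tiny_hash(source: str) -> int:
    """SHA-256 truncated to its first DIGEST_BITS bits (a deliberate weakening)."""
    digest = hashlib.sha256(source.encode("ascii")).digest()
    return int.from_bytes(digest, "big") >> (256 - DIGEST_BITS)


def preimages_by_digest() -> dict[int, list[str]]:
    """Enumerate every candidate program and group the ones that parse by digest."""
    buckets: dict[int, list[str]] = defaultdict(list)
    for chars in itertools.product(ALPHABET, repeat=LITERAL_LEN):
        program = 'printf("' + "".join(chars) + '");'
        if STATEMENT.fullmatch(program):  # every candidate parses, by construction
            buckets[tiny_hash(program)].append(program)
    return buckets


buckets = preimages_by_digest()
# Each bucket is in enumeration order, so the quoted "first combination which
# parses" rule would return worst[0] for every program in this bucket.
worst = max(buckets.values(), key=len)
print(f"{len(ALPHABET) ** LITERAL_LEN} programs, {2 ** DIGEST_BITS} possible digests")
print(f"one digest decodes to {len(worst)} equally valid programs, e.g. {worst[:2]}")
```

Because the "first parse wins" rule recovers only one program per digest, and there are at most 2^16 digests for 2^18 programs, at most a quarter of the programs can be decompressed correctly, however fast the search is.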
In my tests, PAQ-series compressors compress large C/C++ projects to about
16 bits per line of code.
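For scale, the 16 bits/line figure converts to sizes with a little arithmetic. This is a minimal sketch; the 1,000,000-line project is a hypothetical example, not a measurement. It also shows that one 256-bit digest is about the size of PAQ's output for 16 lines, so, treating 16 bits/line as a rough estimate of how much information each line carries, a single digest cannot be expected to pin down a program much longer than that.

```python
# Rough size arithmetic using the ~16 bits/line PAQ figure quoted above.
# The 1,000,000-line project is a hypothetical example, not a measured one.
BITS_PER_LINE = 16
SHA256_BITS = 256


def compressed_bytes(lines: int, bits_per_line: int = BITS_PER_LINE) -> int:
    """Estimated compressed size in bytes, rounded down to whole bytes."""
    return lines * bits_per_line // 8


print(compressed_bytes(1_000_000))    # 2000000 bytes, about 2 MB
print(SHA256_BITS // BITS_PER_LINE)   # 16: lines of PAQ output that fit in one digest
```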
------------------------------------------
Artificial General Intelligence List: AGI
Permalink:
https://agi.topicbox.com/groups/agi/T5ff6237e11d945fb-M5205a13adbf339c656921bcf