On Tue, Dec 2, 2025 at 12:25 AM <[email protected]> wrote:
>
> https://encode.su/threads/3595-Star-Engine-AI-data-compressor?p=86553#post86553

I'm glad somebody here is still doing concrete work toward AGI.

The encode forum gives good advice. Learn C/C++. Almost all data
compression code is written in it, and it is 100 times faster than
Python. Document your code. What does your program do? How do you run
it? Describe the compressed format and the algorithm for decoding it.
Then describe the encoding algorithm.

It looks like your program uses some kind of PPM or context model with
arithmetic coding. I'm not sure. It produced a 56,487 byte file filled
with random digits, which I assume is the arithmetic coder output in
base 10 instead of base 256. I tested it in Ubuntu on my Lenovo Core
i7-1165G7, 2.80 GHz, 16 GB, and it looks like it compressed
pre-processed-enwik5 in 65 seconds. It reports a compressed size of
23,465 (about the actual output size divided by log10(256), the number
of decimal digits per byte). Here are some results I got with zip -9,
7zip, zpaq -m5, and paq8px_v67 -8.

54,781 pre-processed enwik5.txt
29,895 x.zip
26,631 x.7z
23,479 x-m5.zpaq
20,582 x.paq8px

paq8px_v67 compressed in about 5 seconds. The others were less than 1 second.
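
To put the 56,487 digit output on the same footing as the sizes above,
here is a minimal sketch of the arithmetic. It assumes the output file
contains nothing but decimal digits (no header or newlines), which I
have not verified, and the function names are only for illustration.
The log is base 10, since one byte holds log10(256) = 2.408 digits.

```python
import math

# Sizes in bytes, as reported in this message.
INPUT_SIZE = 54_781      # pre-processed enwik5
DECIMAL_DIGITS = 56_487  # size of the base-10 output file (assumed all digits)


def digits_to_bytes(n_digits: int) -> float:
    """Estimate the size in bytes if n_digits decimal digits were packed in base 256."""
    # One byte carries log10(256) decimal digits of information.
    return n_digits / math.log10(256)


def bits_per_char(compressed_bytes: float, original_bytes: int = INPUT_SIZE) -> float:
    """Compressed bits per input byte."""
    return 8.0 * compressed_bytes / original_bytes


results = {
    "zip -9": 29_895,
    "7zip": 26_631,
    "zpaq -m5": 23_479,
    "paq8px_v67 -8": 20_582,
    "base-10 output repacked (estimate)": digits_to_bytes(DECIMAL_DIGITS),
}

if __name__ == "__main__":
    for name, size in results.items():
        print(f"{name:36s} {size:8.0f} bytes  {bits_per_char(size):.3f} bpc")
```

The estimate comes out within a few bytes of the 23,465 the program
reports, so the reported size is consistent with a base-256 arithmetic
coder whose output was written as decimal digits.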

I didn't compare with enwik5 because preprocessing by cmix -s hides
information in the external dictionary, which has to be present to
decompress. For the Hutter prize, cmix appends a compressed copy of
the dictionary to the compressed file. When I compress the 100,000
byte enwik5 directly with paq8px_v67, I get 24,838 bytes.

-- 
-- Matt Mahoney, [email protected]

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tf0bedfcd44454678-M076e00c1d6b0835aec5fb0ab