Re: [agi] Re: My data compressor is rising from the deep

Matt Mahoney Sat, 11 Apr 2020 19:02:45 -0700

On Sat, Apr 11, 2020, 6:19 PM <[email protected]> wrote:


> PyPy or Cython or CPython or Jython? Have you tried PyPy? It says it's
> still in the experimental stage.
>

Really you should learn C and C++ because that's what most data compression
developers use. You need to be able to read their code.

There is a lot of optimization that you can't do even in compiled Python or
Java. For example, the indirect context models used extensively in PAQ and
ZPAQ are designed to minimize cache misses, because random memory access
takes hundreds of clock cycles. They use large hash tables to map a bitwise
context to a 1-byte bit history, and a smaller table to map that to a
prediction. The small table fits in cache, but the large table is designed
so that 3 consecutive hash bucket lookups on each of 4 consecutive bit
predictions all fit in the same 64-byte cache line. The table is allocated
and then aligned to a 64-byte address boundary using pointer arithmetic and
casts, which is not possible in some languages.
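A minimal sketch of the alignment trick in C++, assuming a 64-byte cache
line: over-allocate by 63 bytes, then round the address up by doing integer
arithmetic on a uintptr_t. The names kCacheLine, align_to_cache_line and
alloc_aligned are hypothetical, not taken from libzpaq. The raw pointer has
to be kept for free(), because the aligned one may point into the middle of
the block.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumed cache line size (64 bytes on current x86 CPUs).
constexpr std::uintptr_t kCacheLine = 64;

// Round a pointer up to the next multiple of kCacheLine by treating the
// address as an integer and clearing its low 6 bits after adding 63.
unsigned char* align_to_cache_line(unsigned char* p) {
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    addr = (addr + kCacheLine - 1) & ~(kCacheLine - 1);
    return reinterpret_cast<unsigned char*>(addr);
}

// Allocate n zeroed bytes that start on a cache line boundary.
// *raw receives the pointer returned by calloc, which is what must be freed.
unsigned char* alloc_aligned(std::size_t n, unsigned char** raw) {
    *raw = static_cast<unsigned char*>(std::calloc(n + kCacheLine - 1, 1));
    if (*raw == nullptr) return nullptr;
    return align_to_cache_line(*raw);
}
```

Newer C and C++ standards also provide aligned_alloc, but the manual version
shows what the pointer arithmetic and casts are doing underneath.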

Another example is the neural network mixers. In PAQ I wrote SSE2 assembler
code to do 4 multiply-accumulate operations or 4 weight updates on 32-bit
fixed-point integers packed into 128-bit registers in a single instruction.
Again, your language has to support calling external functions written in
x86 assembler. ZPAQ uses pure C++ because compilers like g++ have gotten
better at unrolling loops and using SSE2/AVX to parallelize them. But you'll
want to check by looking at the assembler output, for example with the g++
-S option. The ZPAQ spec and libzpaq were designed to make this optimization
straightforward.
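A rough sketch of the two mixer inner loops (dot product and weight update)
written as plain C++ loops, the kind a compiler may vectorize on its own.
This is not the PAQ or ZPAQ source: the fixed-point formats (16 fractional
bits in the weights, stretched inputs in roughly -2047..2047), the learning
rate shift of 10, and the names dot_product and train are all assumptions
for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-point mixer loops (not the PAQ/ZPAQ source).
// Inputs x[i]: stretched probabilities, assumed to lie in about [-2047, 2047].
// Weights w[i]: 32-bit fixed point with 16 fractional bits (65536 == 1.0).

// Multiply-accumulate: sum of x[i] * w[i], accumulated in 64 bits so it
// cannot overflow, then scaled back down by 2^16.
std::int32_t dot_product(const std::int32_t* x, const std::int32_t* w,
                         std::size_t n) {
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += static_cast<std::int64_t>(x[i]) * w[i];
    return static_cast<std::int32_t>(sum >> 16);
}

// Weight update: w[i] += (x[i] * err) >> 10, where err is the scaled
// prediction error times a learning rate, assumed small enough that
// x[i] * err fits in 32 bits. The shift of 10 is an arbitrary choice.
// Right-shifting a negative value rounds toward minus infinity on g++
// (and is guaranteed to since C++20).
void train(const std::int32_t* x, std::int32_t* w, std::size_t n,
           std::int32_t err) {
    for (std::size_t i = 0; i < n; ++i)
        w[i] += (x[i] * err) >> 10;
}
```

To see whether g++ actually turned these loops into SSE2/AVX code, compile
with `g++ -O3 -S mixer.cpp` and look in mixer.s for packed instructions
operating on xmm or ymm registers.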


------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tcfc4df5e57c62b43-Mbde8b25f9b49d3452feb5175