On Sat, Apr 11, 2020, 6:19 PM <[email protected]> wrote:
> pypy or Cython or CPython or Jython? Have you tried pypy? It says it's
> still in experimentation stage.
>

Really you should learn C and C++ because that's what most data compression developers use. You need to be able to read their code.

There is a lot of optimization that you can't do even in compiled Python or Java. For example, the indirect context models used extensively in PAQ and ZPAQ are designed to minimize cache misses because random memory access takes hundreds of clock cycles. They use large hash tables to map a bitwise context to a 1-byte bit history, and a smaller table to map that to a prediction. The small table fits in cache, but the large table is designed so that 3 consecutive hash bucket lookups on each of 4 consecutive bit predictions all fit in the same 64-byte cache line. The table is allocated and then aligned to a 64-byte address boundary using pointer arithmetic and casts, which are not possible in some languages.

Another example is the neural network mixers. In PAQ I wrote SSE2 assembler code to do 4 multiply-accumulate operations or 4 weight updates on 32-bit fixed-point integers packed into 128-bit registers in a single instruction. Again, your language has to support calling external functions written in x86 assembler. ZPAQ uses pure C++ because compilers like g++ have gotten better at unrolling loops and using SSE2/AVX to parallelize them. But you'll want to check by looking at the assembler output, like with the g++ -S option. The ZPAQ spec and libzpaq were designed to make this optimization straightforward.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Tcfc4df5e57c62b43-Mbde8b25f9b49d3452feb5175
Delivery options: https://agi.topicbox.com/groups/agi/subscription
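
For illustration, here is a minimal C++ sketch of the two techniques mentioned above: aligning a table to a 64-byte boundary with pointer arithmetic and casts, and writing a mixer-style dot product as a plain loop that a compiler may vectorize. This is not the PAQ or ZPAQ code. The names align64, AlignedTable and dot_product are hypothetical, and the 16-bit fixed-point shift is an arbitrary choice made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: round a pointer up to the next 64-byte boundary
// by casting it to an integer, masking, and casting back.
inline unsigned char* align64(unsigned char* p) {
    std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
    a = (a + 63) & ~static_cast<std::uintptr_t>(63);
    return reinterpret_cast<unsigned char*>(a);
}

// Hypothetical table of 1-byte entries whose first element starts on a
// cache line boundary (assuming 64-byte cache lines).
struct AlignedTable {
    unsigned char* raw;   // pointer returned by calloc, kept for free()
    unsigned char* data;  // 64-byte aligned view into the same buffer
    std::size_t size;     // number of usable bytes starting at data

    explicit AlignedTable(std::size_t n)
        : raw(static_cast<unsigned char*>(std::calloc(n + 63, 1))),
          data(nullptr),
          size(n) {
        assert(raw != nullptr && "allocation failed");
        data = align64(raw);  // over-allocated by 63 bytes, so this fits
    }
    ~AlignedTable() { std::free(raw); }

    AlignedTable(const AlignedTable&) = delete;
    AlignedTable& operator=(const AlignedTable&) = delete;
};

// Mixer-style dot product on 32-bit fixed-point integers, written as a
// plain loop. The >> 16 scaling is an assumption for this example, and
// inputs are assumed small enough that the products fit in 32 bits.
inline std::int32_t dot_product(const std::int32_t* x,
                                const std::int32_t* w, int n) {
    std::int32_t sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += (x[i] * w[i]) >> 16;
    }
    return sum;
}
```

Whether the loop in dot_product actually gets vectorized depends on the compiler and flags. Compiling with something like g++ -O3 -S and reading the generated assembler is the way to check, as suggested above.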
