On Fri, Jan 31, 2020, 3:49 AM <[email protected]> wrote:

> that sounds like the cool way to do it :),   i do it the easy way and just
> use a binary key store...  and i get my compression by sharing sections of
> the keys.
>

Any benchmark results? I would be interested to know whether it improves compression.

Compression is a highly experimental process. Most of the stuff I tried
either didn't work or resulted in tiny improvements. In earlier versions
(paq6) I modeled contexts by counting 0s and 1s and combined them by
weighted addition of the counts. Even this was good enough to win the
Calgary compression challenge, beating PPM, then the best known algorithm.

Logistic mixing turned out to work even better and is simpler. The update
rule is gradient descent in weight space, and it is simpler than back
propagation because it minimizes coding cost instead of root mean square
error. Instead of w += Lx(b-p)(p)(1-p) (where p is the output probability,
b is the predicted bit, x is the input, and L is the learning rate), when
you take the partial derivative of the coding cost (the entropy), the
(p)(1-p) terms cancel, leaving just w += Lx(b-p).
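The mixing step can be sketched in a few lines of Python. This is an
illustrative sketch, not PAQ's actual implementation; the class and
parameter names are made up for the example:

```python
import math

def stretch(p):
    # inverse logistic: probability -> logit domain
    return math.log(p / (1.0 - p))

def squash(x):
    # logistic: logit domain -> probability
    return 1.0 / (1.0 + math.exp(-x))

class LogisticMixer:
    """Mix model probabilities in the logistic (stretched) domain.

    The update is w += L * x * (b - p): gradient descent on coding cost,
    so the (p)(1-p) factor from squared-error training has cancelled.
    """
    def __init__(self, n, lr=0.01):
        self.w = [0.0] * n   # one weight per input model
        self.lr = lr
        self.x = [0.0] * n   # stretched inputs from the last predict()

    def predict(self, probs):
        self.x = [stretch(p) for p in probs]
        return squash(sum(w * xi for w, xi in zip(self.w, self.x)))

    def update(self, p, bit):
        # bit is the actual bit (0 or 1), p the mixed prediction
        err = bit - p
        self.w = [w + self.lr * xi * err for w, xi in zip(self.w, self.x)]
```

If one input model consistently predicts the observed bits better than the
others, its weight grows and the mixed prediction converges toward it.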

But there is a lot more to text compression than that. You have whole word
contexts and sparse contexts that skip bits, characters, or words. You have
match models that search for long matching contexts and predict whatever
followed with a weight proportional to the match length.
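A byte-level match model can be sketched as follows. This is a simplified
illustration (PAQ's real match model predicts bits and hashes contexts
into a fixed-size table; the names and the minimum match length here are
arbitrary):

```python
class MatchModel:
    """Remember where each 4-byte context last occurred; on a match,
    predict the byte that followed, with confidence growing with the
    current match length."""
    MINLEN = 4

    def __init__(self):
        self.history = bytearray()
        self.table = {}        # 4-byte context -> position just after it
        self.match_pos = -1    # position in history we are copying from
        self.match_len = 0

    def predict(self):
        # return (predicted_byte, confidence in 0..1)
        if 0 <= self.match_pos < len(self.history):
            conf = self.match_len / (self.match_len + 1.0)
            return self.history[self.match_pos], conf
        return None, 0.0

    def update(self, byte):
        # extend the match if the prediction was right, else drop it
        pred, _ = self.predict()
        if pred == byte:
            self.match_len += 1
            self.match_pos += 1
        else:
            self.match_len = 0
            self.match_pos = -1
        self.history.append(byte)
        if len(self.history) >= self.MINLEN:
            key = bytes(self.history[-self.MINLEN:])
            if self.match_pos < 0 and key in self.table:
                self.match_pos = self.table[key]
                self.match_len = self.MINLEN
            self.table[key] = len(self.history)
```

On repetitive input the model locks onto the earlier occurrence and
predicts long runs correctly with increasing confidence.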

You can model the same contexts in different ways. For example, if you
observe a sequence like 0000000001 in some context, what is the next bit?
Fast and slow adapting context models will give different predictions,
which you can mix. Paq makes extensive use of indirect context models,
where the bit sequence observed in a context is mapped to a table of 0 and
1 counts, so the answer comes from the actual data rather than from a
fixed adaptation rule.
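An indirect context model can be sketched like this (an illustrative
simplification: real PAQ models use hashed tables and richer bit-history
states, not Python dicts and a plain 8-bit history):

```python
class IndirectModel:
    """Keep a short bit history per context, then map that history to a
    shared table of 0/1 counts. The prediction after a pattern like
    0000000001 is whatever actually followed that pattern in the data."""

    def __init__(self):
        self.histories = {}   # context -> last 8 bits seen in it
        self.counts = {}      # bit history -> [count of 0s, count of 1s]
        self.ctx = 0
        self.hist = 0

    def predict(self, ctx):
        self.ctx = ctx
        self.hist = self.histories.get(ctx, 0)
        n0, n1 = self.counts.get(self.hist, [0, 0])
        return (n1 + 0.5) / (n0 + n1 + 1.0)   # smoothed estimate

    def update(self, bit):
        c = self.counts.setdefault(self.hist, [0, 0])
        c[bit] += 1
        self.histories[self.ctx] = ((self.hist << 1) | bit) & 0xFF
```

On a strictly alternating bit stream, for instance, the history ...0101
is always followed by 0, so the model quickly becomes confident even
though simple counts in the context would sit near 50%.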

You can also play with the mixers. You can use a small (8-16 bit) context
to select the mixer weights and build a tree of mixers with different
contexts and learning rates with the context models at the leaves and final
prediction at the root. Paq also tunes the prediction using SSE (secondary
symbol estimation), a table that maps a small context and a quantized
prediction to a new prediction, and mixes that in too.
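An SSE stage can be sketched as follows. This is a simplification: PAQ
quantizes the input prediction in the stretched (logit) domain and
interpolates between the two nearest table entries, while this sketch
quantizes linearly and updates a single entry:

```python
class SSE:
    """Map (small context, quantized prediction) -> refined prediction;
    nudge the selected table entry toward each actual bit."""
    BUCKETS = 33

    def __init__(self, n_ctx, rate=0.02):
        # each row starts as the identity map, so an untrained stage
        # passes predictions through roughly unchanged
        self.t = [[i / (self.BUCKETS - 1) for i in range(self.BUCKETS)]
                  for _ in range(n_ctx)]
        self.rate = rate
        self.idx = (0, 0)

    def refine(self, ctx, p):
        i = round(p * (self.BUCKETS - 1))
        self.idx = (ctx, i)
        return self.t[ctx][i]

    def update(self, bit):
        c, i = self.idx
        self.t[c][i] += self.rate * (bit - self.t[c][i])
```

If some (context, prediction) cell is systematically miscalibrated, say
the bit is almost always 1 when the input model says 0.5, the table entry
drifts toward the observed rate while other cells stay untouched.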

The best compressors preprocess the input by using special symbols to
indicate upper case letters and a dictionary that maps common words to
symbols. The dictionary is organized to group semantically and
syntactically related words so that bitwise sparse models can recognize the
groups.
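The idea of the preprocessing step can be sketched like this, with a toy
five-word dictionary and a made-up escape byte (real preprocessors use
tuned dictionaries of tens of thousands of words, grouped so related
words get numerically close codes):

```python
CAP = "\x01"                             # escape: next letter is upper case
DICT = ["the", "of", "and", "to", "in"]  # toy dictionary; related words
CODE = {w: chr(0x80 + i) for i, w in enumerate(DICT)}
WORD = {v: k for k, v in CODE.items()}

def encode(text):
    out = []
    for tok in text.split(" "):
        low = tok.lower()
        if tok and tok[0].isupper() and tok[1:] == low[1:]:
            # capitalized word: emit escape + lower-case form
            out.append(CAP + CODE.get(low, low))
        else:
            out.append(CODE.get(tok, tok))
    return " ".join(out)

def decode(text):
    out = []
    for tok in text.split(" "):
        cap = tok.startswith(CAP)
        if cap:
            tok = tok[1:]
        w = WORD.get(tok, tok)
        if cap:
            w = w[0].upper() + w[1:]
        out.append(w)
    return " ".join(out)
```

The transform is lossless (decode reverses encode) and shrinks the input,
and more importantly it turns "The" and "the" into the same symbol so the
context models see one word instead of two.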

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T409fc28ec41e6e3a-Me2c30d70e9480036dabaf6a6
Delivery options: https://agi.topicbox.com/groups/agi/subscription
