Oops, the PAQ stretch transform is x = ln(p/(1-p)) = ln(p) - ln(1-p). These predictions are combined by weighted summation x = SUM_i w_i x_i and squashed by the inverse transform p = 1/(1 + e^-x). The weights are adjusted to reduce the prediction error: w_i += L x_i (b-p), where b is the actual bit being coded (0 or 1), 0 < p < 1 is the prediction, and L ~ .001 is the learning rate for the neural network mixer. The weights follow gradient descent in coding cost space, which is simpler than minimizing RMSE as in standard back propagation: the factors p(1-p) cancel and drop out of the update.
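The stretch/mix/squash cycle above can be sketched in a few lines of Python. This is a minimal illustration, not PAQ's actual implementation (real PAQ does this in C++ with fixed-point arithmetic and tables); the class and variable names are my own.

```python
import math

def stretch(p):
    # x = ln(p / (1 - p)) = ln(p) - ln(1 - p)
    return math.log(p / (1.0 - p))

def squash(x):
    # inverse transform: p = 1 / (1 + e^-x)
    return 1.0 / (1.0 + math.exp(-x))

class Mixer:
    """Combine model predictions by weighted summation in the stretched domain."""
    def __init__(self, n_models, lr=0.001):
        self.w = [0.0] * n_models  # mixing weights
        self.lr = lr               # learning rate L
        self.x = [0.0] * n_models  # last stretched inputs
        self.p = 0.5               # last mixed prediction

    def predict(self, probs):
        # x = SUM_i w_i x_i over the stretched inputs, then squash
        self.x = [stretch(p) for p in probs]
        self.p = squash(sum(w * xi for w, xi in zip(self.w, self.x)))
        return self.p

    def update(self, bit):
        # gradient descent on coding cost: w_i += L * x_i * (b - p)
        err = bit - self.p
        for i, xi in enumerate(self.x):
            self.w[i] += self.lr * xi * err
```

Note how a confident model (p near 0 or 1) stretches to a large |x|, so when it is also accurate its weight grows fastest, which is exactly the confidence-weighting described in the quoted question below.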
PAQ uses indirect context models to solve the problem of modeling bit sequences like 0000000001 that appear in some context (a hash of the last n bytes). The solution is to map the context to a bit history, then map the history to an adaptive prediction. PAQ mixes direct and indirect predictions for various n, and may even use a small context to select the vector of mixing weights.

Another mixing technique is the ISSE (indirect secondary symbol estimator) chain. Each element in the chain maps an order-n bit history (for increasing n) to a 2-input mixer, where one input is the previous ISSE prediction and the other is the constant 1. The final output can be used directly or mixed further. PAQ's good compression comes from the ability to easily mix specialized contexts, such as whole words for text or 2-dimensional contexts for images and tables.

On Wed, Sep 30, 2020, 4:58 PM Matt Mahoney <[email protected]> wrote:

> On Sun, Sep 27, 2020, 10:35 PM <[email protected]> wrote:
>
>> Matt is there a such algorithm that benefits if you see it has accurately predicted for the last 30 predictions, so it should again, hence you make its next predictions more confident ?
>
> Yes. Context models in data compressors express confidence in their predictions by giving probabilities close to 0 or 1 after many correct predictions. These get higher weights by averaging with other models after a stretching transform, ln(p)/ln(1-p). Also, mixers (neural networks) learn which models are most accurate and weight them more heavily.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T9a31de2189d7ab2a-M274adbdf8f7c634995260bb1
