Why bother with a CIC training and test set? Compression already evaluates
every bit as a test, given the previous bits as training. Even if the
compression algorithm doesn't explicitly predict bits, the chain rule makes
it equivalent to one that does: the probability of a string is the product
of the conditional probabilities of its symbols given the symbols before them.
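
As a minimal sketch of that equivalence (a toy adaptive order-0 byte model
in Python, not any particular compressor), the ideal code length of a string
is the sum over its symbols of -log2 of the model's conditional probability:

import math
from collections import Counter

def code_length_bits(data: bytes) -> float:
    """Ideal code length in bits under a toy Laplace-smoothed adaptive byte model."""
    counts = Counter()
    seen = 0
    total_bits = 0.0
    for b in data:
        # "Test" the current byte using only the previous bytes as "training".
        p = (counts[b] + 1) / (seen + 256)
        total_bits += -math.log2(p)
        # Then add it to the training data (update the model).
        counts[b] += 1
        seen += 1
    return total_bits

print(code_length_bits(b"abracadabra"))  # = -log2 of the product of the conditionals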

You can see this effect at work in the Large Text Compression Benchmark
(http://mattmahoney.net/dc/text.html): the ranking of compressors on enwik8
(the first 100 MB of enwik9) closely tracks their ranking on enwik9. Most of
the variation between the two rankings is due to memory constraints. With a
small-memory model, compression is worse overall and closer to the result
you would get by compressing the parts independently.

Occam's Razor doesn't necessarily hold under constrained resources. Any
probability distribution over an infinite set of strings must favor shorter
ones (the probabilities sum to 1, so only finitely many strings can receive
more than any fixed share), but that isn't necessarily true over the finite
set of programs that can run on a computer with finite memory.

On Tue, Jul 7, 2020, 12:13 AM James Bowery <jabow...@gmail.com> wrote:

> On Fri, Jul 3, 2020 at 6:53 PM Ben Goertzel <b...@goertzel.org> wrote:
>
>> ...Under what conditions is it the case that, for prediction based on a
>> dataset using realistically limited resources, the smallest of the
>> available programs that precisely predicts the training data actually gives
>> the best predictions on the test data?
>
>
> If I may refine this a bit to head off misunderstanding at the outset of
> this project:
>
> The CIC* (Compression Information Criterion) hypothesis is that among
> existing models of a process, each producing an executable archive of the
> same training data under the same computational constraints, the one that
> produces the smallest executable archive will in general be the most
> accurate on the test data.
>
>
> Run a number of experiments, and for each:
> 1 Select a nontrivial
> 1.1 computational resource level as a constraint
> 1.2 real-world dataset -- no less than 1 GB gzipped
> 2 Divide the data into training and test sets
> 3 For each competing model:
> 3.1 Provide the training set
> 3.2 Record the length of the executable archive the model produces
> 3.3 Append the test set to the training set
> 3.4 Record the length of the executable archive the model produces
> 4 Produce 2 rank orders of the models:
> 4.1 by training-set executable archive size
> 4.2 by training-plus-test-set executable archive size
> 5 Record the differences between the training and test rank orders
>
> The lower the average difference, the more general the criterion (see the
> sketch below).
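>
> A rough sketch (in Python) of the scoring in steps 4 and 5; the model
> names and archive sizes are hypothetical placeholders:
>
> # Hypothetical archive sizes in bytes for three placeholder models.
> train_sizes = {"modelA": 211_000_000, "modelB": 198_000_000, "modelC": 205_000_000}
> full_sizes = {"modelA": 2_110_000_000, "modelB": 1_990_000_000, "modelC": 2_040_000_000}
>
> def rank_order(sizes):
>     # Rank models by archive size: 1 = smallest executable archive.
>     ordered = sorted(sizes, key=sizes.get)
>     return {m: i + 1 for i, m in enumerate(ordered)}
>
> train_ranks = rank_order(train_sizes)  # step 4.1
> full_ranks = rank_order(full_sizes)    # step 4.2
>
> # Step 5: average absolute difference in rank between the two orderings.
> avg_rank_diff = sum(abs(train_ranks[m] - full_ranks[m]) for m in train_ranks) / len(train_ranks)
> print(train_ranks, full_ranks, avg_rank_diff)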
>
> It should be possible to run similar tests of other model selection
> criteria, and thereby rank-order the model selection criteria themselves.
>
> *We're going to need a catchy acronym to keep up with:
>
> AIC (Akaike Information Criterion)
> BIC (Bayesian Information Criterion)...
> ...aka
> SIC (Schwarz Information Criterion)...
> ...aka
> MDL or MDLP (both travestic abuses of "Minimum Description Length
> [Principle]" that should be forever cast into the bottomless pit)
> HQIC (Hannan-Quinn Information Criterion)...
> KIC (Kullback Information Criterion)
> etc. etc.
