Marcus Hutter implicitly addresses perplexity in this Hutter Prize FAQ
entry:

http://www.hutter1.net/prize/hfaq.htm#xvalid
Why aren't cross-validation or train/test-set used for evaluation?

A common
way of evaluating machine learning algorithms is to split the data into a
training set and a test set, learn e.g. the parameters of a Neural Network
(NN) on the train set and evaluate its performance on the test set. While
this method, and similarly its extension to cross-validation, can work in
practice, it is not a fool-proof method for evaluation: In the training
phase, the algorithm could *somehow* manage to "effectively" store the
information contained in the test set and use it to predict the test set
without the desired generalization capability. This can happen in a number
of ways:

   1. The test set could be very *similar* or in the extreme case identical
   to the train set, so even without access to the test set, the algorithm has
   effectively access to the information in the test set via the train set.
   For instance, if you downloaded all images from the internet and randomly
   split them into train and test set, most images would be in both sets,
   since most images appear multiple times online. Similarly if you download
   all text. Admittedly, Wikipedia should be less prone to repetition, since
   it is curated.
   2. The algorithm could *accidentally* contain test set information,
   though statistically this is very unlikely, and would only be a problem if
   HKCP received an enormous number of submissions, or contestants optimize
   their algorithms based on test-set performance.
   3. The contestant could *cheat* and simply hide the test set in the
   algorithm itself. This could be circumvented by keeping the test set
   secret, but one could never be sure whether it has leaked, a grain of doubt
   will always remain, and even if not, ...
   4. if the test set is taken from a *public* source like Wikipedia, a
   gargantuan NN could be trained on all of Wikipedia or the whole Internet.
   Limiting the size of the decompression algorithm can prevent this. Indeed
   this is the spirit of the used compression metric.
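Point (1) above can be made concrete with a toy experiment (hypothetical data, a minimal sketch): when most documents appear multiple times in a corpus, a random split leaks nearly the whole test set into the train set, so even a pure memorizer looks like it generalizes.

```python
import random

# Toy illustration: a "corpus" where most documents appear many times,
# as with images or text scraped from the web.
random.seed(0)
corpus = [f"doc{i % 100}" for i in range(1000)]  # 100 unique docs, 10 copies each
random.shuffle(corpus)
train, test = corpus[:800], corpus[800:]

# A pure memorizer "predicts" a test document perfectly whenever an
# identical copy already sits in the train set -- no generalization needed.
seen = set(train)
leak_rate = sum(doc in seen for doc in test) / len(test)
print(f"fraction of test items leaked via duplicates: {leak_rate:.2f}")
```

Here essentially every test document has a duplicate in the train set, which is exactly the failure mode the FAQ describes.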

On the other hand, including the size of the decompressor rules out many
SOTA batch NNs, which are often huge, but maybe they only *appear* better
than HKCP records, due to some of (1)-(4). The solution is to train online
<http://www.hutter1.net/prize/hfaq.htm#largnn> or to go to larger corpora
that are a more comprehensive sample of human knowledge.
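The "same thing" claim in the thread below is just ideal arithmetic coding: a predictor that assigns probability p to a symbol can losslessly encode it in about -log2(p) bits, and perplexity is 2 raised to the mean of those per-symbol bit costs. A minimal sketch with hypothetical toy probabilities:

```python
import math

def code_length_bits(probs):
    # Ideal arithmetic-coding cost: -log2(p) bits for each predicted symbol.
    return sum(-math.log2(p) for p in probs)

def perplexity(probs):
    # Perplexity is 2 ** (average bits per symbol), so it is a monotone
    # function of the compressed size per symbol.
    return 2 ** (code_length_bits(probs) / len(probs))

# A model that assigns each of 4 symbols probability 1/4 needs
# 2 bits/symbol, i.e. perplexity 4; a sharper model compresses better.
uniform = [0.25] * 4
sharp = [0.9] * 4
print(code_length_bits(uniform))  # 8.0 bits
print(perplexity(uniform))        # 4.0
print(perplexity(sharp) < perplexity(uniform))  # True
```

So minimizing compressed size per symbol and minimizing perplexity rank models identically; the compression metric just adds the decompressor's size to the bill.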

On Sun, Mar 22, 2020 at 6:46 AM <[email protected]> wrote:

> Also see the link below about perplexity evaluation for AI! As I said,
> lossless compression evaluation in the Hutter Prize is *the best*, and you
> can see it really is the same thing, prediction accuracy, except it allows
> errors.
>
> https://planspace.org/2013/09/23/perplexity-what-it-is-and-what-yours-is/
>
> https://www.youtube.com/watch?v=BAN3NB_SNHY
>
> Hmm. I assume they take words or sentences, check whether the prediction is
> close or exact, then carry on. With lossless compression, the arithmetic
> coder stores the encoded decimal of the probability, and the resulting file
> size reflects the prediction error over the whole file, whether or not your
> predictor did poorly in places, just like perplexity. However, they don't
> consider the neural network's size; it could simply copy the data. That's
> why they use a test set after/during training. The goal is the same,
> though: build a good neural network predictor. The test set and compression
> are also similar in that both measure how well the model understands the
> data without copying it directly.
>
> So which is better? I'm not sure now. Perplexity, or Lossless Compression?

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T2a0cd9d392f9ff94-M3e2aee20c3e5fca760ef4fcc
Delivery options: https://agi.topicbox.com/groups/agi/subscription
