Marcus Hutter implicitly addresses perplexity in this Hutter Prize FAQ entry:
http://www.hutter1.net/prize/hfaq.htm#xvalid

Why aren't cross-validation or train/test sets used for evaluation?

A common way of evaluating machine learning algorithms is to split the data into a training set and a test set, learn e.g. the parameters of a Neural Network (NN) on the training set, and evaluate its performance on the test set. While this method, and similarly its extension to cross-validation, can work in practice, it is not a fool-proof method for evaluation: in the training phase, the algorithm could *somehow* manage to "effectively" store the information contained in the test set and use it to predict the test set without the desired generalization capability. This can happen in a number of ways:

1. The test set could be very *similar*, or in the extreme case identical, to the training set, so even without access to the test set, the algorithm effectively has access to the information in the test set via the training set. For instance, if you downloaded all images from the internet and randomly split them into a training and a test set, most images would be in both sets, since most images appear multiple times online. Similarly if you download all text. Admittedly, Wikipedia should be less prone to repetition, since it is curated.

2. The algorithm could *accidentally* contain test-set information. Statistically this is very unlikely, and it would only be a problem if HKCP received an enormous number of submissions, or if contestants optimized their algorithms based on test-set performance.

3. The contestant could *cheat* and simply hide the test set in the algorithm itself. This could be circumvented by keeping the test set secret, but one could never be sure whether it has leaked; a grain of doubt will always remain, and even if not, ...

4. ... if the test set is taken from a *public* source like Wikipedia, a gargantuan NN could be trained on all of Wikipedia or the whole Internet. Limiting the size of the decompression algorithm can prevent this.
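To make the size limit in point 4 concrete: the prize scores the *total* description length, i.e. the size of the decompressor program plus the size of the compressed file, where an ideal arithmetic coder spends -log2(p) bits on each symbol the model predicted with probability p. A minimal sketch with toy numbers (the per-symbol probabilities and the 1 kB decompressor size are hypothetical, for illustration only):

```python
import math

def code_length_bits(probs):
    """Ideal arithmetic-coding length: -log2 of the probability the
    model assigned to each symbol that actually occurred, summed."""
    return sum(-math.log2(p) for p in probs)

# Toy probabilities a predictor assigned to the true symbols.
probs = [0.5, 0.25, 0.9, 0.1]
compressed_bits = code_length_bits(probs)

# The Hutter Prize metric also counts the decompressor itself,
# which is what blocks the "hide the test set in the program" cheat.
decompressor_bits = 8 * 1000  # e.g. a hypothetical 1 kB program
total_bits = decompressor_bits + compressed_bits
```

A model that merely memorized the data could make `compressed_bits` tiny, but only by inflating `decompressor_bits` by at least as much, so the total does not improve.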
Indeed this is the spirit of the compression metric used. On the other hand, including the size of the decompressor rules out many SOTA batch NNs, which are often huge; but maybe they only *appear* better than HKCP records, due to some of (1)-(4). The solution is to train online <http://www.hutter1.net/prize/hfaq.htm#largnn> or to go to larger corpora that are a more comprehensive sample of human knowledge.

On Sun, Mar 22, 2020 at 6:46 AM <[email protected]> wrote:

> Also see the link below about Perplexity Evaluation for AI! As I said,
> Lossless Compression evaluation in the Hutter Prize is *the best*, and see,
> it really is the same thing: prediction accuracy. Except it allows errors.
>
> https://planspace.org/2013/09/23/perplexity-what-it-is-and-what-yours-is/
>
> https://www.youtube.com/watch?v=BAN3NB_SNHY
>
> Hmm. I assume they take words or sentences and check whether the prediction
> is close/exact, then carry on. With lossless compression, it stores the
> arithmetic-encoded decimal of the probability, and the resulting file size
> shows the probability error for the whole file, no matter whether your
> predictor did poorly on some parts, just like Perplexity. However, they
> don't consider the neural network size; it could just copy the data. That's
> why they use a test set after/during training. The goal is the same,
> though: make a good neural network predictor. The test set and compression
> are also very similar: both measure how well the model understands the data
> while not copying the data directly.
>
> So which is better? I'm not sure now. Perplexity, or Lossless Compression?
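On the "it really is the same thing" point in the quoted message: perplexity and ideal lossless compression are interconvertible. The bits-per-symbol an ideal coder spends is the model's cross-entropy on the data, and perplexity is just 2 raised to that number. A minimal sketch with toy probabilities (illustrative numbers, not real model output):

```python
import math

def bits_per_symbol(probs):
    """Average ideal code length in bits: what an arithmetic coder
    driven by these predictions would spend per symbol."""
    return sum(-math.log2(p) for p in probs) / len(probs)

def perplexity(probs):
    """Perplexity is 2 ** (bits per symbol): the same quantity as
    compressed size per symbol, expressed on a different scale."""
    return 2 ** bits_per_symbol(probs)

# Toy probabilities a model assigned to the symbols that occurred.
probs = [0.5, 0.25, 0.25, 0.5]
bps = bits_per_symbol(probs)  # 1.5 bits/symbol
ppl = perplexity(probs)       # 2 ** 1.5, about 2.828
```

So neither metric is "better" as a measure of prediction quality; the practical difference is that the compression benchmark also charges for the size of the predictor itself, while perplexity evaluation relies on a held-out test set to catch memorization.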
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T2a0cd9d392f9ff94-M3e2aee20c3e5fca760ef4fcc
Delivery options: https://agi.topicbox.com/groups/agi/subscription
