I don't think you can prove perplexity is better than lossless compression for 
evaluating AI prediction. The Hutter Prize and Matt Mahoney's Large Text 
Compression Benchmark have fewer scores to compare against, but a compression 
result tells you where you stand more firmly. I asked yesterday how perplexity 
works, and its issue isn't just test/validation set data leaking into training; 
it has another issue that is an actual problem, unlike the leak, which may not 
be a big problem or may not usually occur. In perplexity you take the model's 
average prediction accuracy over each item in the test set, which is fine, but 
you have to normalize it by the total count of all tokens in the test set so 
the score is invariant to dataset size. This doesn't seem great, but it may be 
acceptable, like the possible data-leak issue. Also, the fact that I predict 
1-3 letters while they predict tokens means my average is taken over more 
samples, and a letter is much easier to predict than a word (much harder for 
them), and my normalization would be totally different too. So I CAN'T compete 
unless my Byte Pair Encoder is the same as theirs. I use Byron Knoll's cmix 
pre-processor, which is close; maybe it is BPE. I still have to check.
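To make the normalization step concrete, here is a minimal sketch of how perplexity is usually computed from per-token probabilities (the function name and toy probabilities are my own illustration, not from any particular benchmark):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log probability.
    Dividing by the total token count is the normalization that
    makes the score invariant to dataset size."""
    n = len(token_probs)
    avg_nll = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_nll)

# A model that assigns probability 0.25 to every token has
# perplexity 4, no matter how many tokens the test set holds:
print(perplexity([0.25] * 10))    # 4.0
print(perplexity([0.25] * 1000))  # still 4.0
```

The catch discussed above is that "token count" depends entirely on the tokenizer, so two models with different vocabularies divide by different n for the same text.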

SO: perplexity has more scores to compare against than my 
not-well-known-but-otherwise-best benchmark (HP & LTCB), yet it also has fewer 
scores to compare against, because if you work with BPE tokens then 
letter-level models can't compare scores with you (you may have half or none 
of the scores left to compare to! ouch!). It is not bulletproof.
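A quick sketch of why that mismatch happens, using made-up uniform probabilities (the numbers and helper are hypothetical, just for illustration): converting to bits per character, which is what a lossless compressor effectively reports as compressed bits divided by input size, puts letter models and BPE models on the same scale, while their raw perplexities diverge.

```python
import math

def bits_per_char(avg_prob_per_unit, units_per_char):
    """Cross-entropy in bits per character: bits per prediction unit
    times prediction units per character. A compressor's ratio
    (compressed bits / input characters) measures the same thing,
    which is why compression scores stay comparable across
    tokenizations."""
    bits_per_unit = -math.log2(avg_prob_per_unit)
    return bits_per_unit * units_per_char

# Hypothetical: a letter model assigning p=0.5 per letter (1 unit
# per char) and a BPE model assigning p=0.0625 per token, with
# tokens averaging 4 chars (0.25 tokens per char)...
letter_bpc = bits_per_char(0.5, 1.0)      # 1 bit/char
bpe_bpc    = bits_per_char(0.0625, 0.25)  # 4 bits/token * 0.25 = 1 bit/char
print(letter_bpc, bpe_bpc)  # same per-character cost

# ...yet their perplexities (1/p for a uniform model) differ wildly:
print(1 / 0.5, 1 / 0.0625)  # 2.0 vs 16.0
```

Same model quality per character, perplexities of 2 versus 16: that is the sense in which perplexity scores across different tokenizers simply can't be compared.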
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T31c4c6495649906f-M33b160490acd421b5623e8e1
Delivery options: https://agi.topicbox.com/groups/agi/subscription