Maybe people here know the answer. After searching Google, emails, Reddit posts, YouTube replies, and chat questions, four days later no one has answered....
I tried running other people's code too, but you would have to dig through many other projects, and they are not small codebases. https://paperswithcode.com/sota/language-modelling-on-wikitext-2

The link above does not specify whether we predict whole words (always separated by a space) or parts of words. Which is the right way? If parts, which BPE method do I use? If I don't predict the right things, and the right number of things, the results are incomparable. Predicting letters gives me a perplexity of about 2, because letters are easier to predict. By the way, predicting letters actually makes prediction worse, and you can't see that unless you use the Hutter Prize evaluation. Do I predict spaces? Commas? Periods? <UNK>? <eos>?

This makes the Hutter Prize and the Large Text Compression Benchmark look five times better than they already did, shining gold next to perplexity benchmarks. Without strict rules, an FAQ, and people who reply, perplexity is a breeding ground for papers whose goal is to pass a grade by claiming "my algorithm got 5 points lower than some other SOTA algorithm", without explaining how they got that score.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Tc9a99c50a9ec758e-M585a45b1a2857e9c166fe44b
Delivery options: https://agi.topicbox.com/groups/agi/subscription
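The tokenization problem above can be made concrete: the exact same model, spending the exact same total number of bits to encode the test set, reports completely different perplexities depending on what counts as a token, because perplexity is 2^(bits per token) and the token count is what changes. A minimal sketch with hypothetical numbers (the character count, word count, and bit total below are assumptions for illustration, not from any benchmark):

```python
# Toy illustration: one fixed encoding cost, two perplexities.
# All three numbers are made up; only the arithmetic matters.
text_chars = 1_000_000      # characters in a hypothetical test set
text_words = 180_000        # whitespace-separated words in the same text
total_bits = 1_200_000      # total bits the model needs to encode it
                            # (i.e. 1.2 bits/char, compression-style metric)

bpc = total_bits / text_chars                # bits per character
char_ppl = 2 ** bpc                          # character-level perplexity
word_ppl = 2 ** (total_bits / text_words)    # word-level perplexity

print(f"bits/char:       {bpc:.2f}")         # 1.20
print(f"char perplexity: {char_ppl:.2f}")    # ~2.30
print(f"word perplexity: {word_ppl:.2f}")    # ~101.59
```

This is why a bits-per-character (or compressed-size) metric like the Hutter Prize and the Large Text Compression Benchmark use is comparable across tokenizations, while a raw perplexity number is not: divide the same total bits by a different token count and the exponent, and hence the score, changes.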
