Maybe people here know the answer. 

After four days of searching Google, emails, Reddit posts, YouTube replies, and chat questions, no one has answered....

I tried running other people's code too... You'd have to dig through many other codebases, and they are not small.

https://paperswithcode.com/sota/language-modelling-on-wikitext-2

The link above does not specify whether we predict whole words (always separated by a space) or parts of words. Which is the right way? If parts, which BPE method do I use? Results become incomparable if I don't predict the right units, and the right number of them. Predicting letters gives me a perplexity of about 2, because individual letters are easier to predict. Yet predicting letters actually makes the prediction worse overall, and you can't see that unless you use the Hutter Prize evaluation.
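To make the incomparability concrete, here is a minimal sketch (my own illustration with assumed numbers, not from the leaderboard): the same model, assigning the same total code length to the same corpus, gets a wildly different perplexity depending only on whether you normalize by words or by characters.

```python
# Perplexity = 2^(bits per predicted unit), so the choice of unit changes
# the number even when the model's total cross-entropy is identical.

def perplexity(total_bits, num_units):
    """Perplexity from total code length in bits, normalized per unit."""
    return 2 ** (total_bits / num_units)

# Hypothetical corpus statistics (assumed for illustration only):
total_bits = 1_000_000   # total code length the model assigns to the corpus
num_words  = 150_000     # word tokens in the corpus
num_chars  = 825_000     # characters, ~5.5 chars per word incl. spaces

ppl_word = perplexity(total_bits, num_words)   # ~102: looks "hard"
ppl_char = perplexity(total_bits, num_chars)   # ~2.3: looks "easy"

print(f"word-level perplexity: {ppl_word:.1f}")
print(f"char-level perplexity: {ppl_char:.2f}")
```

The two numbers describe the exact same model; mathematically, `ppl_char == ppl_word ** (num_words / num_chars)`. A compression-style score (total bits, as in the Hutter Prize) sidesteps this ambiguity entirely.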

Do I predict spaces? Commas? Periods? <UNK>? <eos>?
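For what it's worth, here is a sketch of the convention most word-level WikiText-2 code I've seen follows (e.g. the public AWD-LSTM-style training scripts); this is common practice I'm describing, not an official rule from the leaderboard.

```python
# Common word-level WikiText-2 convention (a sketch, not an official spec):
# the released files are already tokenized, so punctuation arrives as its own
# whitespace-separated tokens and rare words are already "<unk>".
def tokenize_line(line):
    tokens = line.split()       # spaces themselves are NOT predicted
    tokens.append("<eos>")      # one end-of-line token per line, and it
    return tokens               # DOES count toward perplexity

# Example line in the pre-tokenized style the dataset ships with:
line = "The cat sat , reportedly , on the <unk> ."
print(tokenize_line(line))
```

Under this convention, commas, periods, `<unk>`, and `<eos>` are all predicted tokens and all contribute to the perplexity denominator; spaces are only delimiters. Whether every paper actually does this is exactly the problem.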

This makes the Hutter Prize and the Large Text Compression Benchmark look five times better than they already did, shining gold next to perplexity benchmarks. Without strict rules, an FAQ, and people who reply back, perplexity is a breeding ground for papers that just want to pass a grade by claiming their algorithm scored 5 points lower than some other SOTA algorithm, without explaining how they got that score.
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tc9a99c50a9ec758e-M585a45b1a2857e9c166fe44b