The paper mentions improving compression of enwik8 from 0.99 to 0.93 bits per character, but gives no details or citation. enwik8 is from my large text benchmark and is the test file for the Hutter Prize. The current record is actually 1.22 bits per character, and I haven't received an entry from them. I am on the prize committee.
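(For readers unfamiliar with the metric: bits per character is just the compressed size in bits divided by the number of characters in the original file. A minimal Python sketch using zlib as a stand-in compressor — the actual record-holding entries use far stronger compressors, and the sample text here is hypothetical:)

```python
import zlib

# Hypothetical stand-in sample; the real benchmark file is enwik8,
# the first 100 MB of an English Wikipedia XML dump.
data = b"sample wikipedia text with some repetition " * 5000

compressed = zlib.compress(data, level=9)

# bits per character = (compressed bytes * 8) / original characters
bpc = 8 * len(compressed) / len(data)
print(f"{bpc:.3f} bpc")
```

The same division applies to any compressor: a 0.93 bpc result on enwik8 means the 10^8-byte file was coded in about 11.6 MB.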
text8 is a clean version of enwik8 containing only lowercase letters and spaces. enwik8 is 100 MB of Wikipedia text with some XML formatting.

On Thu, Feb 14, 2019, 5:28 PM Robert Levy <[email protected]> wrote:
> https://blog.openai.com/better-language-models/
>
> Impressive work. They use the technique introduced in the "Attention Is
> All You Need" paper called "transformers". See also:
> http://jalammar.github.io/illustrated-transformer/
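As a side note on the text8 cleanup described above, the end state (only lowercase letters and single spaces) can be sketched in a few lines of Python. This is only an illustration of the described result: the actual text8 preprocessing also strips the wiki/XML markup and spells out digits as words, which this sketch omits.

```python
import re

def text8_style_clean(text: str) -> str:
    """Rough sketch of the text8-style end state described above:
    lowercase everything, then collapse any run of non-letters
    into a single space. (Not the actual text8 script.)"""
    text = text.lower()
    text = re.sub(r"[^a-z]+", " ", text)  # non-letters become one space
    return text.strip()

print(text8_style_clean("Hello, World! <page>123</page>"))
# -> "hello world page page"
```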
