PPM predicts a byte at a time. Humans predict a word at a time. You can always convert bit predictions to predictions of longer symbols by multiplying the bit probabilities together. I worked with bit predictions because it simplified the mixing and arithmetic coding.
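
As a rough sketch of that multiplication (the `bit_model` callable and the toy model below are made-up stand-ins for illustration, not code from any real compressor), the probability of a byte is just the product of the probabilities of its 8 bits, each predicted given the bits before it:

```python
def symbol_probability(symbol, bit_model, nbits=8):
    """Probability of an nbits-wide symbol as a product of bit probabilities.

    bit_model(prefix) is a hypothetical callable returning P(next bit = 1)
    given the tuple of bits of this symbol seen so far (MSB first).
    """
    p = 1.0
    prefix = ()
    for i in range(nbits - 1, -1, -1):
        bit = (symbol >> i) & 1
        p1 = bit_model(prefix)            # P(bit = 1 | earlier bits)
        p *= p1 if bit else 1.0 - p1      # chain rule: multiply per-bit probabilities
        prefix += (bit,)
    return p


def toy_bit_model(prefix):
    # Toy assumption: a 1 is a little more likely right after a 1.
    return 0.6 if prefix and prefix[-1] == 1 else 0.4


# Full distribution over all 256 byte values, built only from bit predictions.
byte_probs = [symbol_probability(s, toy_bit_model) for s in range(256)]
```

The 256 products sum to 1, so the bit predictor defines a proper byte distribution. The same idea extends to longer symbols by multiplying over more bits.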
To code a word prediction, you have to calculate the probability distribution over the entire vocabulary. Then you do a binary search to find the new range. It ends up being the same complexity as bit coding.

On Mon, May 24, 2021, 2:26 PM <[email protected]> wrote:

> BTW I asked Fabrice and he said his incredible compressor doesn't predict
> the next bit! Meaning I can at least get there without doing so LOL!! He
> said "All NNCP versions predict the next token which is usually a word or
> subword.".
>
> And his articles also state no mention of bit prediction, I checked using
> my browser's Find tool. Furthermore GPT doesn't predict the next bit from
> what I know.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Tf856e4082d9ea09a-M395a3a9adf1e67340a154f43
Delivery options: https://agi.topicbox.com/groups/agi/subscription
