This is the zero frequency problem: what probability should be assigned to a novel symbol in the current context, e.g. "cat" followed by a word other than "ran" or "slept"?
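One classic hedge from the PPM literature is "method C": reserve probability mass for an unseen symbol in proportion to the number of distinct symbols already seen in the context. A minimal sketch (the counts are the "cat ran" / "cat slept" example used later in this thread):

```python
from collections import Counter

def method_c(counts: Counter):
    """PPM method C: escape probability = u / (n + u), where n is the
    total count and u is the number of distinct symbols seen in this
    context. Seen symbols get c / (n + u) each."""
    n = sum(counts.values())
    u = len(counts)
    p_escape = u / (n + u)
    probs = {sym: c / (n + u) for sym, c in counts.items()}
    return probs, p_escape

# Context "cat": saw "ran" 3 times and "slept" once -> 2 distinct symbols.
probs, p_novel = method_c(Counter({"ran": 3, "slept": 1}))
print(probs)    # "ran" gets 3/6, "slept" gets 1/6
print(p_novel)  # 2/6 of the mass is reserved for a never-seen word
```

The seen-symbol probabilities plus the escape probability sum to 1; on an escape, PPM falls back to a shorter context to pick the novel symbol.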
This was an active area of research at the byte level in PPM-based compressors before they were overtaken by PAQ around 2006, at which point the problem largely went away. There are sophisticated algorithms for estimating it, which you can read about at http://mattmahoney.net/dc/dce.html#Section_422

For text, you also want a model that learns grammar and semantics. This is already being done by the top PAQ-based bit-level predictors. The models learn semantic categories (dog, bark) and grammatical categories (dog, cat) by clustering in long-range and short-range context space, respectively. This can be done while building preprocessing dictionaries so that related words share code bits, although some dictionaries are also partially hand-coded. The model then predicts codes from category contexts by masking the other bits.

On Tue, May 25, 2021, 2:33 PM <[email protected]> wrote:

> Yes Stefan, if you saw "cat ran" 3 times and "cat slept" 1 time, your
> predictions for "cat" are (ran = 75% likely, slept = 25% likely). So you
> predict "ran" 75% of the time in Generate Mode; in Compress Mode you
> always predict (ran = 75% likely, slept = 25% likely). There's no other
> way to know what to say after "cat". You take the words that follow,
> with their counts of 3 and 1 times seen, and give each an up-to-date
> percentage score: from the counts for "cat ran" (3) and "cat slept" (1)
> you get a probability distribution of 75% and 25%.
>
> If you want to use perplexity, like everyone in the field does, or
> lossless compression, you need the percentages. In perplexity you add up
> errors: if after "cat" you predict ran at 75% and slept at 25%, and the
> true answer in the file is "cat ran", you have a 25% error to add to
> your error score. If you used the raw counts of 3 and 1 per prediction
> instead, then counts of 6 and 2, which are the same ratio but not
> normalized, would give different error scores, which is wrong; it's
> still 75% and 25%.
> So you need the normalization at least once for evaluation (and, I
> think, for Generate Mode too, because you could mix the predictions for
> the contexts "c", "ca", and "cat" without using percentages, perhaps
> sloppily, giving you a single set of counts with no percentages, then
> normalize just once at the end to get percentages). My line of code
> turns a set of percentages into weights for Generate Mode:
> prediction = random.choices(predict[0], weights=predict, k=1). It may
> also accept raw counts; I'm unsure whether it normalizes them on its
> own. So the normalization happens once for evaluation, possibly once
> for generation, and mine does it for every set of predictions at every
> context order, to know how much weight to give each set of weights.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Tf856e4082d9ea09a-Mcb0303b5f7a493b949413d81
Delivery options: https://agi.topicbox.com/groups/agi/subscription
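The arithmetic in the quoted message can be sketched end to end: normalize counts to a distribution, sample in Generate Mode, and score the true next word. On the quoted question about counts: `random.choices` only needs relative weights, so it does accept raw counts and normalizes internally; the explicit percentages are only required for scoring. One standard score is the negative log probability (perplexity is just its exponentiated average), though the sketch below also shows the linear "25% error" the thread describes. The bigram counts are the thread's example:

```python
import math
import random

counts = {"ran": 3, "slept": 1}          # times each word followed "cat"
total = sum(counts.values())
probs = {w: c / total for w, c in counts.items()}   # {"ran": 0.75, "slept": 0.25}

# Counts of 6 and 2 are the same ratio, so they normalize identically:
counts2 = {"ran": 6, "slept": 2}
probs2 = {w: c / sum(counts2.values()) for w, c in counts2.items()}
assert probs2 == probs

# Generate Mode: sample the next word. Raw counts work as weights,
# since random.choices treats weights as relative.
random.seed(0)
next_word = random.choices(list(counts), weights=list(counts.values()), k=1)[0]

# Evaluation: suppose the file actually says "cat ran".
linear_error = 1 - probs["ran"]          # the thread's "25% error"
log_loss = -math.log2(probs["ran"])      # bits; perplexity = 2**(mean log loss)
print(next_word, linear_error, log_loss)
```

Either score is invariant to scaling the counts, which is the quoted point: 3-and-1 and 6-and-2 must be judged identically, and only the normalized distribution guarantees that.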
