Yes Stefan, if you saw 'cat ran' 3 times and 'cat slept' 1 time, your predictions for 'cat' are cat > (ran = 75% likely, slept = 25% likely). So in Generate Mode you predict 'ran' 75% of the time, while in Compress Mode you always output the full distribution (ran = 75%, slept = 25%). There's no other way to know what to say after 'cat'. You take the followers you've seen, with their counts of 3 and 1, and give them an up-to-date % score: to get the scores you take the counts for 'cat ran' (3) and 'cat slept' (1) and turn them into a probability distribution, 75% and 25%.
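A minimal sketch of that counts-to-percentages step, using the toy numbers from the example (the variable names here are mine, not from any particular codebase):

```python
from collections import Counter

# Toy follower counts for the context 'cat': 'cat ran' seen 3 times,
# 'cat slept' seen 1 time.
follower_counts = Counter({"ran": 3, "slept": 1})

# Normalize the counts into a probability distribution: 3/4 = 75%, 1/4 = 25%.
total = sum(follower_counts.values())
probs = {word: count / total for word, count in follower_counts.items()}

print(probs)  # {'ran': 0.75, 'slept': 0.25}
```

Compress Mode would hand the whole `probs` dict to the coder every time; Generate Mode would sample one word from it.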
If you want to use Perplexity, like everyone in the field does, or Lossless Compression, you need the percentages. In Perplexity you add up errors: e.g. cat > ran/slept is predicted 75% and 25%, and say the true answer in the file is cat > ran, so you add a 25% error to your error score. If you used the native easy way, the raw counts 3 and 1 per prediction, then 6 and 2 is the same distribution but not normalized, and 6 and 2 would give different error scores, which is wrong: it's still 75% and 25%. So you need the normalization at least once, for evaluation (and for Generate Mode too, I think, because you could mix, perhaps in a sloppy way, the predictions for 'c', 'ca', and 'cat' without using percentages, giving you a single set of counts, no percentages, but you'd still do it once at the end to get the %s).

My line of code turns a set of %s into weights for Generate Mode: prediction = random.choices(predict[0], weights=(predict), k=1). It may be able to use counts directly; I'm unsure if it converts them to %s on its own. So: once for evaluation, possibly once to generate, and mine does it for every set of predictions for each context order, to know how much weight to give each set of weights.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Tf856e4082d9ea09a-M0d694caddf46ef15ce2c288a
Delivery options: https://agi.topicbox.com/groups/agi/subscription
