GPT-2 also surely uses frequency for next-word probabilities, and probably 
something equivalent to SSE (Secondary Symbol Estimation), which is used in 
the best lossless compressors and can shave a compressed output from roughly 
22MB down to 20MB; the current record is about 14.8MB for a 100MB input.
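For what SSE roughly means in those compressors, here is a minimal sketch for a binary model: quantize the primary model's probability into buckets, track how often the bit actually turned out to be 1 in each bucket, and blend the learned table value back into the prediction. The class name, bucket count, and 50/50 blend are illustrative assumptions, not taken from any particular compressor.

```python
# Hypothetical sketch of SSE (Secondary Symbol Estimation) for one bit.
NUM_BUCKETS = 33  # quantization levels for the primary probability

class SSE:
    def __init__(self):
        # one (hits, total) counter pair per probability bucket,
        # started with small Laplace-style priors
        self.hits = [1] * NUM_BUCKETS
        self.totals = [2] * NUM_BUCKETS

    def refine(self, p_primary):
        """Map the primary model's probability through the learned table."""
        b = min(int(p_primary * (NUM_BUCKETS - 1) + 0.5), NUM_BUCKETS - 1)
        p_table = self.hits[b] / self.totals[b]
        # blend the table's estimate with the primary prediction
        return 0.5 * p_primary + 0.5 * p_table, b

    def update(self, bucket, bit):
        """After seeing the actual bit, update that bucket's counters."""
        self.totals[bucket] += 1
        if bit == 1:
            self.hits[bucket] += 1

sse = SSE()
p, b = sse.refine(0.9)  # primary model says "1" with 90% confidence
sse.update(b, 1)        # the bit really was 1; the table learns
```

Over time each bucket learns whether the primary model is over- or under-confident at that probability level, which is where the extra megabytes come from.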

Also, the best-matched strings are mixed: "the dog" activates "the dog", "he 
dog", "t dog", "this cat", "cat this", "but cats", and so on. They are all 
similar, but some matches are longer and some are more strongly activated, 
especially the longer ones. So the predictions are voted on based on 
frequency and combined energy transfer, especially from longer or more 
activated strings (and one day reward too). And if the frequency and the 
total count of diverse predictions are too low, it backs off one string 
length to get a higher frequency, so the shorter matches may get the needed 
40% where the first, longer set mixed in only got 10%, and you mix these 
sets of prediction probabilities together.
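The backoff-and-mix step described above can be sketched as follows: start from the longest match context, and whenever its continuation counts are too sparse, drop to a shorter context, blending all the resulting distributions with longer (more specific) matches weighted more heavily. The function name, the 0.5 weight decay, and the `min_count` threshold are illustrative assumptions, not a claim about what GPT-2 actually does.

```python
from collections import Counter

def backed_off_mix(text, context, max_len=5, min_count=3):
    """Mix next-character distributions from progressively shorter contexts."""
    mixed = Counter()
    weight = 1.0
    for n in range(max_len, 0, -1):
        ctx = context[-n:]
        # collect next-symbol counts for this context length
        counts = Counter(
            text[i + n] for i in range(len(text) - n)
            if text[i:i + n] == ctx
        )
        total = sum(counts.values())
        if total == 0:
            continue  # no match at this length; back off without penalty
        for sym, c in counts.items():
            mixed[sym] += weight * c / total
        if total >= min_count:
            break  # enough evidence; stop backing off
        weight *= 0.5  # shorter matches get a smaller vote
    z = sum(mixed.values())
    return {s: v / z for s, v in mixed.items()} if z else {}

probs = backed_off_mix("the dog ran. the dog sat. the cat sat.", "the dog ")
```

Here the long-context matches (" dog ") only ever continue with "r" or "s", so the mixer keeps backing off until the single-character context supplies enough counts, then normalizes the combined votes.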
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T0607d3f3f3678b2f-Mebcf09a30e09d161fc96c751