If you want to explain how GPT-2 works to a beginner, quickly and easily, why not say it like this?
Table of contents (compressors/MIXERS):
Syntactics
BackOff
Semantics
More data
Byte Pair Encoding
etc.

Syntactics:

Intro: Letters, words, and phrases re-occur in text. AI finds such patterns in data and **mixes** them. We don't store the same letter or phrase twice; we just update connection weights to represent frequencies.

Explanation: Suppose our algorithm has only seen "Dogs eat. Cats eat. Cats sleep. My Dogs Bark." in the past and is prompted with the input "My Dogs". If we pay Attention to just 'Dogs' and require an exact memory match, the possible predicted futures and their probabilities (frequencies) are 'eat' 50% and 'Bark' 50%. If we consider 'My Dogs', we have fewer memories and predict 'Bark' 100%. The matched neuron's parent nodes each receive a share of the energy from the child match.

BackOff:

A longer match takes more context into account but has very little experience behind it, while a short match has the most experience but little context. A summed **mix** predicts better: we look in memory at what follows 'Dogs' and at what follows 'My Dogs', then blend the 2 sets of predictions to get, e.g., 'eat' 40% and 'Bark' 60% (the exact figures depend on how much weight each match is given).

etc.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T4b1d3285aba79521-Mdaeeeb64a0ce9e33ec58a44d
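
The Syntactics and BackOff steps above can be tried out in a few lines of Python. This is only a minimal sketch of the counting analogy, not GPT-2's actual implementation: the function names `train`, `predict`, and `mix` are made up for illustration, and the mixing weights 0.8 (for the short match 'Dogs') and 0.2 (for the long match 'My Dogs') are an assumption, hand-picked so that the blend comes out at the 40% / 60% used in the example.

```python
from collections import Counter, defaultdict


def train(text, max_context=2):
    """Count which word follows each context of 1..max_context words."""
    counts = defaultdict(Counter)
    for sentence in text.split("."):
        words = sentence.split()
        for i in range(1, len(words)):
            for n in range(1, max_context + 1):
                if i - n >= 0:
                    context = tuple(words[i - n:i])
                    counts[context][words[i]] += 1
    return counts


def predict(counts, context):
    """Exact memory match: turn follower counts into frequencies."""
    seen = counts.get(tuple(context), Counter())
    total = sum(seen.values())
    return {word: c / total for word, c in seen.items()} if total else {}


def mix(counts, prompt, weights):
    """Blend the predictions of short and long matches.

    weights maps context length -> mixing weight (hypothetical values).
    Context lengths with no memory match are skipped.
    """
    blended = defaultdict(float)
    used = 0.0
    for n, w in weights.items():
        if n > len(prompt):
            continue
        dist = predict(counts, prompt[-n:])
        if not dist:
            continue
        used += w
        for word, p in dist.items():
            blended[word] += w * p
    return {word: p / used for word, p in blended.items()} if used else {}


corpus = "Dogs eat. Cats eat. Cats sleep. My Dogs Bark."
counts = train(corpus)
prompt = ["My", "Dogs"]

short_match = predict(counts, prompt[-1:])  # attend to 'Dogs' only
long_match = predict(counts, prompt)        # attend to 'My Dogs'
blend = mix(counts, prompt, {1: 0.8, 2: 0.2})  # assumed weights

print(short_match)  # {'eat': 0.5, 'Bark': 0.5}
print(long_match)   # {'Bark': 1.0}
print({w: round(p, 2) for w, p in blend.items()})  # {'eat': 0.4, 'Bark': 0.6}
```

With equal weights (0.5 each) the same blend would give 'eat' 25% and 'Bark' 75%, so the choice of weights is what decides how far the long, low-experience match is trusted over the short one.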
