At a minimum, an aspiring AGI design has to rest on a theory of understanding, reasoning, and judging. There is always going to be a certain bag of tricks in any AI. But if it looks like it is mostly a bag of tricks, even though they might be REALLY REALLY GOOD tricks, it won't get you to general AI. Some people will believe it is AGI anyway.
On 7/31/20, immortal.discover...@gmail.com <immortal.discover...@gmail.com> wrote:
> Because it seems GPT-2/3 must be using several mechanisms like the ones that
> follow, else it has no chance of predicting well:
>
> P.S. Attention Heads aren't listed below, but they are an important one. A head
> can, for example, predict a last name accurately by looking only at certain
> words regardless of all the others, e.g. "[Jen Cath] is a girl who [has a mom]
> named [Tam] **Cath**". Tasks = something to do with where it looks, in which
> order, manipulation, etc.
>
> ---Compressors/MIXERS---
>
> Syntactics:
> Intro: Letters, words, and phrases re-occur in text. The AI finds such patterns
> in the data and **mixes** them. We don't store the same letter or phrase twice;
> we just update connection weights to represent frequencies.
> Explanation: Suppose our algorithm has only seen "Dogs eat. Cats eat. Cats
> sleep. My Dogs Bark." in the past and is prompted with the input "My Dogs".
> If we pay Attention to just 'Dogs' and require an exact memory match, the
> possible predicted futures and their probabilities (frequencies) are 'eat' 50%
> and 'Bark' 50%. If we consider 'My Dogs', we have fewer matching memories and
> predict 'Bark' 100%. The matched neuron's parent nodes receive split energy
> from the child match.
>
> BackOff:
> A longer match considers more information but has very little experience behind
> it, while a short match has the most experience but little context. A summed
> **mix** predicts better: we look in memory at what follows 'Dogs' and 'My Dogs'
> and blend the two sets of predictions to get, e.g., 'eat' 40% and 'Bark' 60%.
>
> Semantics:
> If 'cat' and 'dog' share 50% of the same contexts, then maybe the contexts they
> don't share are shared as well. Say you have seen "cat ate", "cat ran", "cat
> ran", "cat jumped", "cat jumped", "cat licked"... and "dog ate", "dog ran",
> "dog ran". The predictions they don't share could probably be shared too, so
> 'dog jumped' may be a good prediction. This helps prediction a lot: it lets you
> match a given prompt against many different memories with similar wording. Like
> the rest above, you mix these, so you need not store every sentence you have
> ever seen, resulting in a fast, low-storage brain. Semantics looks at both
> sides of a word or phrase, and closer items impact its meaning more.
>
> Byte Pair Encoding:
> Take a look on Wikipedia; it is really simple and can compress a hierarchy too.
> Basically you just find the most common low-level pair, e.g. 'st', then you
> find the next higher-level pair made of those, e.g. 'st' + 'ar'. It segments
> text well, showing its building blocks.
>
> More Data:
> Literally just feeding the hierarchy/heterarchy more data improves its
> prediction accuracy for which word/building block usually comes next in a
> sequence. More data alone improves intelligence; it's actually called
> "gathering intelligence". It does, however, slow down at some point and
> requires other mechanisms, like the ones above.
>
> I have ~16 of these that all merge data to improve prediction... You merge to
> e-merge insights.
>
> Any AGI will have these...
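For the Attention Heads point above: a minimal, generic scaled dot-product attention sketch in Python, with hand-set toy vectors rather than GPT's trained heads, just to show how the softmaxed weights let a query look mostly at certain positions and largely ignore the rest.

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention: weight each position by how well its key
    matches the query, then return that weighting plus the mixed value."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    total = sum(math.exp(s) for s in scores)
    weights = [math.exp(s) / total for s in scores]
    mixed = [sum(w * v[i] for w, v in zip(weights, values))
             for i in range(len(values[0]))]
    return weights, mixed

# Hand-set 2-d toy vectors standing in for three spans of the quoted sentence.
keys   = [[1.0, 0.0],   # '[Jen Cath]'
          [0.0, 1.0],   # '[has a mom]'
          [0.9, 0.1]]   # '[Tam]'
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, _ = attention([1.0, 0.0], keys, values)
print(weights)  # ~[0.41, 0.20, 0.38]: the name-like spans get most of the weight
```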
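For the "Syntactics" example: a minimal sketch of exact-match frequency prediction on the quoted toy corpus. The function names are illustrative, not anything GPT actually exposes.

```python
from collections import Counter, defaultdict

# Toy corpus from the quoted "Syntactics" example (periods kept as tokens
# so sentence boundaries survive).
corpus = "Dogs eat . Cats eat . Cats sleep . My Dogs Bark .".split()

def next_word_counts(context_len):
    """Count which word follows each context of `context_len` words."""
    counts = defaultdict(Counter)
    for i in range(len(corpus) - context_len):
        context = tuple(corpus[i:i + context_len])
        counts[context][corpus[i + context_len]] += 1
    return counts

def predict(prompt, context_len):
    """Frequencies of the next word, given the last `context_len` prompt words."""
    counts = next_word_counts(context_len)[tuple(prompt[-context_len:])]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

print(predict(["My", "Dogs"], 1))  # {'eat': 0.5, 'Bark': 0.5} -- attend to 'Dogs' only
print(predict(["My", "Dogs"], 2))  # {'Bark': 1.0}             -- attend to 'My Dogs'
```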
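For "BackOff": continuing the sketch above, a weighted blend of the long-context and short-context predictions. The 0.2/0.8 weights are an assumption, chosen only so the toy numbers come out near the quoted 40%/60%.

```python
def backoff_mix(prompt, w_long=0.2, w_short=0.8):
    """Blend the long-context and short-context predictions from `predict` above."""
    long_pred = predict(prompt, 2)    # little experience, more context
    short_pred = predict(prompt, 1)   # lots of experience, less context
    words = set(long_pred) | set(short_pred)
    mixed = {w: w_long * long_pred.get(w, 0.0) + w_short * short_pred.get(w, 0.0)
             for w in words}
    total = sum(mixed.values())
    return {w: p / total for w, p in mixed.items()}

print(backoff_mix(["My", "Dogs"]))
# roughly {'Bark': 0.6, 'eat': 0.4} -- the long match pulls toward 'Bark',
# the short match keeps 'eat' in play.
```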
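For "Semantics": a sketch of borrowing predictions between words that share enough of their contexts. The toy counts and the 50% overlap threshold mirror the quoted cat/dog example; the overlap measure itself is an assumption, just one simple way to score shared contexts.

```python
from collections import Counter

# Toy counts from the quoted cat/dog example: which words followed each animal.
followers = {
    "cat": Counter({"ate": 1, "ran": 2, "jumped": 2, "licked": 1}),
    "dog": Counter({"ate": 1, "ran": 2}),
}

def shared_fraction(a, b):
    """Fraction of a's follower counts that b has also been seen with."""
    shared = sum(min(count, followers[b][w]) for w, count in followers[a].items())
    return shared / sum(followers[a].values())

def semantic_predict(word, threshold=0.5):
    """Borrow the unshared predictions of any word that shares enough contexts."""
    preds = Counter(followers[word])
    for other, other_counts in followers.items():
        if other != word and shared_fraction(other, word) >= threshold:
            preds.update(other_counts)        # 'dog jumped' becomes plausible
    total = sum(preds.values())
    return {w: c / total for w, c in preds.items()}

print(shared_fraction("cat", "dog"))   # 0.5 -- cat shares half its contexts with dog
print(semantic_predict("dog"))         # 'jumped' and 'licked' now get probability
```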
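For "Byte Pair Encoding": a sketch of the basic merge loop, repeatedly fusing the most frequent adjacent pair of symbols. The toy string is illustrative; real BPE tokenizers add details (word boundaries, a fixed merge budget learned from a large corpus) that are omitted here.

```python
from collections import Counter

def bpe_merges(text, n_merges):
    """Repeatedly fuse the most frequent adjacent pair of symbols (basic BPE)."""
    tokens = list(text)                 # start from single characters
    merges = []
    for _ in range(n_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append(a + b)
        fused, i = [], 0
        while i < len(tokens):          # rewrite the stream with the pair fused
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (a, b):
                fused.append(a + b)
                i += 2
            else:
                fused.append(tokens[i])
                i += 1
        tokens = fused
    return merges, tokens

merges, tokens = bpe_merges("star start stars star", n_merges=3)
print(merges)   # small pairs like 'st' merge first, then build into larger units
print(tokens)
```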