There has to be a theory of understanding, reasoning, and judging, at a
minimum, underlying an aspiring AGI design. There is always going to be
a certain bag of tricks in any AI. But if it looks like it is mostly a bag of
tricks, even though they might be REALLY REALLY GOOD tricks, it won't
get you to general AI. Some people will believe it is AGI anyway.

On 7/31/20, immortal.discover...@gmail.com
<immortal.discover...@gmail.com> wrote:
> Because it seems GPT-2/3 must be using several mechanisms like the ones that
> follow, or else it would have no chance of predicting well:
> 
> P.S. Attention Heads aren't listed below, but they are an important one. They
> can, for example, predict a last name accurately by looking only at certain
> words, regardless of all the others, e.g. "[Jen Cath] is a girl who [has a mom]
> named [Tam] **Cath**". ...Tasks = something to do with where it looks, in which
> order, manipulation, etc...
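> 
> A rough Python sketch of that idea (the token list and the key/query vectors
> are just made up by hand to illustrate it; real attention heads learn them
> from data):
> 
> import numpy as np
> 
> def softmax(x):
>     e = np.exp(x - x.max())
>     return e / e.sum()
> 
> # Toy context; the head should find the surname seen earlier in the sentence.
> tokens = ["Jen", "Cath", "is", "a", "girl", "who", "has", "a", "mom",
>           "named", "Tam"]
> 
> # Hand-set key vectors: dimension 0 roughly means "this token is a surname",
> # dimension 1 "this token is a first name". Purely illustrative numbers.
> keys = np.zeros((len(tokens), 2))
> keys[0] = [0.2, 1.0]    # Jen  (first name)
> keys[1] = [1.0, 0.1]    # Cath (surname)
> keys[10] = [0.2, 1.0]   # Tam  (first name)
> 
> # Query from the current position, asking "where is the surname?"
> query = np.array([6.0, 0.0])
> 
> weights = softmax(keys @ query)
> for tok, w in zip(tokens, weights):
>     print(f"{tok:6s} {w:.2f}")
> # Nearly all the weight lands on "Cath", regardless of the other words,
> # so the head can copy "Cath" forward as the predicted last name.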
> 
> ---Compressors/MIXERS---
> 
> Syntactics:
> Intro: Letters, words, and phrases re-occur in text. An AI finds such patterns
> in data and **mixes** them. We don't store the same letter or phrase twice;
> we just update connection weights to represent frequencies.
> Explanation: If our algorithm has only seen "Dogs eat. Cats eat. Cats sleep.
> My Dogs Bark." in the past, is prompted with the input "My Dogs", and we
> pay Attention to just 'Dogs' and require an exact memory match, the possible
> predicted futures and their probabilities (frequencies) are 'eat' 50% and
> 'Bark' 50%. If we consider 'My Dogs' instead, we have fewer matching memories
> and predict 'Bark' 100%. The matched neuron's parent nodes receive split energy
> from the child match.
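> 
> A minimal Python sketch of that exact-match counting (the tiny corpus is the
> one above; the rest is just the plainest way to count it):
> 
> from collections import Counter
> 
> corpus = "Dogs eat . Cats eat . Cats sleep . My Dogs Bark .".split()
> 
> def predictions(context):
>     """Frequencies of the word that follows an exact match of `context`."""
>     n = len(context)
>     counts = Counter(
>         corpus[i + n]
>         for i in range(len(corpus) - n)
>         if corpus[i:i + n] == context
>     )
>     total = sum(counts.values())
>     return {w: c / total for w, c in counts.items()}
> 
> print(predictions(["Dogs"]))        # {'eat': 0.5, 'Bark': 0.5}
> print(predictions(["My", "Dogs"]))  # {'Bark': 1.0}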
> 
> BackOff:
> A longer match considers more information but has very little experience,
> while a short match has the most experience but little context. A summed **mix**
> predicts better: we look in memory at what follows 'Dogs' and 'My Dogs' and
> blend the 2 sets of predictions to get, e.g., 'eat' 40% and 'Bark' 60%.
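> 
> Sketching that blend in Python (the 0.8/0.2 weights are just an assumption
> picked to reproduce the 40%/60% figures; a real back-off scheme would tune
> or learn them):
> 
> short_ctx = {"eat": 0.5, "Bark": 0.5}   # what follows 'Dogs'
> long_ctx = {"Bark": 1.0}                # what follows 'My Dogs'
> 
> w_short, w_long = 0.8, 0.2              # assumed mixing weights
> words = set(short_ctx) | set(long_ctx)
> mix = {w: w_short * short_ctx.get(w, 0.0) + w_long * long_ctx.get(w, 0.0)
>        for w in words}
> print(mix)   # 'eat' -> 0.4, 'Bark' -> 0.6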
> 
> Semantics:
> If 'cat' and 'dog' share 50% of the same contexts, then maybe the ones
> they don't share are shared as well. So you see cat ate, cat ran, cat ran,
> cat jumped, cat jumped, cat licked... and dog ate, dog ran, dog ran.
> Therefore, the predictions they don't share could probably be shared as well, so
> maybe 'dog jumped' is a good prediction. This helps prediction a lot: it lets
> you match a given prompt to many different memories that are similarly worded.
> Like the rest above, you mix these, so you need not store every sentence seen
> in your experience. The result is a fast, low-storage brain. Semantics
> looks at both sides of a word or phrase, and closer items impact its
> meaning more.
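> 
> A rough Python sketch of that sharing (the pair list is the toy data above;
> cosine similarity and similarity-weighted borrowing are just one simple way
> to do it, not the only one):
> 
> import math
> from collections import Counter, defaultdict
> 
> # Observed (word, next-word) pairs from the toy example above.
> pairs = [("cat", "ate"), ("cat", "ran"), ("cat", "ran"), ("cat", "jumped"),
>          ("cat", "jumped"), ("cat", "licked"),
>          ("dog", "ate"), ("dog", "ran"), ("dog", "ran")]
> 
> follows = defaultdict(Counter)
> for w, nxt in pairs:
>     follows[w][nxt] += 1
> 
> def cosine(a, b):
>     dot = sum(a[w] * b[w] for w in set(a) | set(b))
>     return dot / (math.sqrt(sum(c * c for c in a.values())) *
>                   math.sqrt(sum(c * c for c in b.values())))
> 
> sim = cosine(follows["cat"], follows["dog"])
> print(round(sim, 2))   # ~0.71: cat and dog share much of their contexts
> 
> # Borrow cat's predictions for dog, weighted by that similarity, so
> # 'dog jumped' gets some probability even though it was never seen.
> borrowed = Counter(follows["dog"])
> for nxt, c in follows["cat"].items():
>     borrowed[nxt] += sim * c
> total = sum(borrowed.values())
> print({w: round(c / total, 2) for w, c in borrowed.items()})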
> 
> Byte Pair Encoding:
> Take a look on Wikipedia; it is really simple and can build a compressed
> hierarchy too. Basically, you just find the most common low-level pair, e.g.
> 's'+'t', and so on, then you find the next higher-level pair made of those,
> e.g. 'st'+'ar'... It segments text well, showing its building blocks.
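> 
> A toy Python sketch of that merging (greedy, one merge at a time; a real BPE
> tokenizer would also keep the learned merge list so it can re-apply it to new
> text):
> 
> from collections import Counter
> 
> def bpe(tokens, merges):
>     """Repeatedly merge the most frequent adjacent pair of symbols."""
>     for _ in range(merges):
>         pairs = Counter(zip(tokens, tokens[1:]))
>         if not pairs:
>             break
>         a, b = max(pairs, key=pairs.get)   # most common adjacent pair
>         merged, i = [], 0
>         while i < len(tokens):
>             if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (a, b):
>                 merged.append(a + b)
>                 i += 2
>             else:
>                 merged.append(tokens[i])
>                 i += 1
>         tokens = merged
>     return tokens
> 
> print(bpe(list("star start stars"), merges=3))
> # 's'+'t' merges first, then 'st'+'a', then 'sta'+'r':
> # ['star', ' ', 'star', 't', ' ', 'star', 's']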
> 
> More Data:
> Literally just feeding the hierarchy/heterarchy more data improves its
> prediction accuracy for which word/building block usually comes next in
> a sequence. More data alone improves intelligence; it's actually called
> "gathering intelligence". It does, however, slow down at some point and
> requires other mechanisms, like the ones above.
> 
> I have ~16 of these that all merge data to improve prediction.... You merge
> to e-merge insights
> 
> Any AGI will have these....
