I've been coding my own text predictors and researching them closely, and the key is the frequency of what usually comes next after your context: after [th], for example, you observe that [e] has a high count. But you want the past window to be very long, and to grab your next-symbol frequencies from those recognized features, e.g. [my dog ate the][k], which allows good letter prediction. (GPT-2 predicts BPE tokens.) You actually mix multiple windows, because longer context matches are rarer, but you can at least get some frequencies of what has been seen to follow them. To help find longer matches you'd want to "translate" words (cat=dog), accept appearances in different positions, and focus on rare words to summarize the context, so that when you look at the past 30 words you can find multiple matches in memory even though there is, of course, no experience exactly matching it - the alternative words, positions, and filler content are in there, but they are similar or don't matter. So in the end, frequencies run it, and even the recognition cat=dog is based on discovered shared contexts - based on frequencies. Probabilities run it, and if a match is not exact, its predictions all get less weight.
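The window-mixing idea can be sketched as a tiny character-level model. The corpus, window lengths, and the weighting of longer matches below are my own invented illustration, not any particular system:

```python
from collections import defaultdict, Counter

class MixedOrderPredictor:
    """Character predictor that mixes next-character frequencies from
    several context-window lengths, weighting longer (rarer) matches more."""

    def __init__(self, orders=(1, 2, 3, 4)):
        self.orders = orders
        # counts[n][context_of_length_n] -> Counter of next characters
        self.counts = {n: defaultdict(Counter) for n in orders}

    def train(self, text):
        for n in self.orders:
            for i in range(len(text) - n):
                self.counts[n][text[i:i + n]][text[i + n]] += 1

    def predict(self, context):
        # Blend normalized frequencies from each window length; longer
        # contexts get higher weight (an invented choice) because an
        # exact long match is stronger evidence.
        scores = Counter()
        for n in self.orders:
            nexts = self.counts[n].get(context[-n:])
            if not nexts:
                continue  # no experience with this longer context
            total = sum(nexts.values())
            for ch, c in nexts.items():
                scores[ch] += n * c / total
        return scores.most_common(1)[0][0] if scores else None

p = MixedOrderPredictor()
p.train("my dog ate the kibble. my dog ate the kibble.")
print(p.predict("my dog ate the "))  # the long windows favor 'k'
```

When the long window [the ] has been seen, its counts dominate; when it hasn't, the short windows still supply some frequencies, which is the mixing the paragraph describes.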

What I show in the video appears to help prediction by making the predictions more "similar" to the *rare* story words (especially the more recent ones); it can look at ALL of the past context. The main prediction in these algorithms, however, comes from looking at the past e.g. 3 or 20 words to get multiple "similar matches" and see what usually follows the matched contexts. You can look farther back if you 1) attend only to rare words and ignore e.g. 'the', 2) can use similar words ('cat'/'dog'), and 3) allow similar positions.
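Those three tricks can be sketched as a fuzzy context matcher. The synonym table, rarity cutoff, and scoring below are invented for illustration:

```python
from collections import Counter

# 1) attend only to rare words, 2) treat synonym pairs as equal,
# 3) tolerate position shifts (word order is ignored entirely here).
SYNONYMS = {"cat": "dog", "dog": "dog"}  # map each word to a canonical form

def canon(w):
    return SYNONYMS.get(w, w)

def rare_words(tokens, freqs, max_freq=2):
    # keep only words that are rare in the corpus; fillers like 'the' drop out
    return [canon(w) for w in tokens if freqs[canon(w)] <= max_freq]

def match_score(query, memory, freqs):
    # score by shared rare words, weighted by rarity; positions are ignored
    shared = set(rare_words(query, freqs)) & set(rare_words(memory, freqs))
    return sum(1.0 / freqs[w] for w in shared)

corpus = [
    "the cat sat on the mat".split(),
    "a dog slept on the mat".split(),
    "the stock market fell today".split(),
]
freqs = Counter(canon(w) for sent in corpus for w in sent)

query = "my cat lay on a mat".split()
best = max(corpus, key=lambda mem: match_score(query, mem, freqs))
print(" ".join(best))
```

No memory sentence exactly matches the query, but the rare-word summary plus cat=dog still finds a usable match, which is the point of looking 30 words back.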

When you know "Paris is the capital of France" and see a new prompt "The capital of France is ", you predict Paris with decent accuracy, because the context mostly matches it (and a few other things in your brain), and the two words that are switched around still appear, in similar positions.

A good question is: do story words actually cast their own vote on the prediction candidates, or do we only use context matches to see what comes next? Well, if I keep adding the word 'cat' to the start of my prompt, it makes 'cat' more probable - inch by inch the probability rises - which makes it unlikely that context matches alone are finding that this is what commonly follows. Below is a new video testing whether the prediction is influenced solely by context matches, or whether all story words also mindlessly vote on the next word (if the input is all 'cat', it will likely continue saying 'cat' or something similar).
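The "every story word votes" hypothesis itself is easy to sketch. The recency weighting below is an invented assumption, just to show why repeating 'cat' would nudge the distribution toward 'cat' inch by inch:

```python
from collections import Counter

def vote_distribution(context_tokens):
    """Each context word votes for itself (a real system would also let
    it vote for similar words); more recent words vote harder."""
    votes = Counter()
    for pos, word in enumerate(context_tokens):
        recency = (pos + 1) / len(context_tokens)  # invented recency weight
        votes[word] += recency
    total = sum(votes.values())
    return {w: v / total for w, v in votes.items()}

p1 = vote_distribution("cat sat".split())
p2 = vote_distribution("cat cat cat sat".split())
print(p1["cat"], p2["cat"])  # probability of 'cat' rises with repetition
```

Under this mechanism the probability of 'cat' rises smoothly with every repetition, with no corpus match required - which is exactly the behavior the experiment is probing for.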

https://www.youtube.com/watch?v=kF8U2FD9JXc&feature=youtu.be

I could try these inputs in Allen: 'bat', 'bat bat', 'bat bat bat', or 'wind', 'wind wind', 'wind wind wind'... and no matter which word is used, it will predict that same word, with more probability the more times it occurs. The dataset it trained on contains only briefly similar phrases, and I don't think those phrases go on to repeat the word that occurs in them. Yes, my input matches them more and more because of similar words, and hence the prediction will be similar, but I don't feel that out of 40GB there are enough "matches" to achieve that.

Keep in mind it predicts the *same* word. You'd think 'bat bat bat bat bat bat' would match things like 'my bat saw a bird on a bat but bats fly in bat caves', etc., and would often predict only similar words like 'cave' or 'bird'... how many matches could you get that incrementally improve the prediction of 'bat'!? Impossible.
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T1f56a0e2e53cf50a-Mf608913b5a48d9b2ecec47bb