[agi] Here's exactly why predicting words is better than letters

immortal . discoveries Sun, 29 Aug 2021 08:08:07 -0700

Say the AI has seen:
walked fast
walked funny
walked slow

Say you turn off learning, present it with "walked" 3 times, and score its 
predictions. The text it is scored on is "walked fast, walked slow, walked 
funny". If it predicted letters, after "walked" it would split f/s 50/50, since 
both were seen there and are treated as equally likely, and rightfully so. But 
hold up, we are still on "fast": we spend 0.5 on f/s and then another 0.5 on 
predicting a/u, so 1.0 spent so far! The rest of the word it aces. Now the next 
two words: 0.5 (slow), and 1.0 again (funny). Total spent predicting letters: 
2.5. Now, if we predicted words, we would have 3 choices (3 "letters" to choose 
from, which are easily stored and accessed in a tree), so for each word in the 
string above we put 0.66 on the wrong choices, three times over: total wrong 
3 x 0.66 = 1.98! Versus 2.5.
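
To check the arithmetic above, here is a minimal Python sketch. It assumes, as 
the post does, that at each step every distinct next symbol seen so far is 
equally likely (branches are not weighted by how often they were seen), and it 
charges 1 - 1/k for a step with k choices. The helper names (build_trie, 
letter_cost, word_cost) are made up for illustration. Unrounded, the word total 
is 3 x 2/3 = 2.0; the 1.98 above comes from rounding 2/3 down to 0.66.

```python
def build_trie(words):
    """Nested-dict trie: each key is one letter, each value is the subtree."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
    return root


def letter_cost(trie, word):
    """Sum of (1 - 1/k) over the letters of `word`, where k is the number of
    distinct letters seen at that position (assumed equally likely)."""
    cost, node = 0.0, trie
    for ch in word:
        cost += 1.0 - 1.0 / len(node)  # 0 when only one letter is possible
        node = node[ch]
    return cost


def word_cost(vocab):
    """Same measure with whole words as the symbols: k = number of words."""
    return 1.0 - 1.0 / len(set(vocab))


seen = ["fast", "funny", "slow"]    # words seen after "walked"
scored = ["fast", "slow", "funny"]  # order used in the scoring text

trie = build_trie(seen)
letters_total = sum(letter_cost(trie, w) for w in scored)
words_total = sum(word_cost(seen) for _ in scored)

print("letters:", letters_total)
print("words:  ", round(words_total, 2))
```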


Viewing this in a generative free_mode that completes text, you can see that 
"walked > ?" would now rightfully predict fast/funny/slow evenly, 33% of the 
time each. Letter prediction would not: 50% of the time it would predict f or s 
at the start, then IF it went to f, it would again predict a or u 50% of the 
time each. So fast and funny split f's 50% between them, i.e. fast is chosen 
25% of the time, same for funny, and slow 50%.
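
The generation case can be worked out exactly with a small Python sketch. It 
assumes the same thing the paragraph above does: at each node of the letter 
tree, every distinct next letter is equally likely. It also assumes no word is 
a prefix of another, so a leaf means the word is finished. The function names 
are hypothetical.

```python
from fractions import Fraction


def build_trie(words):
    """Nested-dict trie: each key is one letter, each value is the subtree."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
    return root


def letter_distribution(node, prefix="", prob=Fraction(1)):
    """Chance of generating each complete word when the probability at each
    node is split evenly among its distinct next letters."""
    if not node:  # leaf: the word is complete
        return {prefix: prob}
    dist = {}
    share = prob / len(node)
    for ch, child in node.items():
        dist.update(letter_distribution(child, prefix + ch, share))
    return dist


words = ["fast", "funny", "slow"]
letter_dist = letter_distribution(build_trie(words))
word_dist = {w: Fraction(1, len(words)) for w in words}

for w in words:
    print(w, "letters:", letter_dist[w], "words:", word_dist[w])
```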

So you can predict letters, as long as it spits out more than 1 letter when it 
should....but this is the same thing as predicting multiple letters at a time.


But then how does it make up new words? Say it knows rewash and washing. How 
does it get rewashing? It could do phrases, but then it limits its creative 
ability.

and we quickly walked
and we did peek
and we went to

Say you see "and we" and predict one of the 3 choices, because usually that's 
all we saw, 3 or 6 times, evenly. We can't assume that yet, though. We may need 
to predict "and we > quickly slept". So we can predict at that length only once 
we are experienced enough at it. We should use Byte Pair Encoding to learn 
which parts are learnt units, e.g. "humans" and "dare not" are solid parts that 
mostly don't change. We could occasionally predict one BPE part's letters 
instead, to allow rare exploration once in a while.
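
As a rough illustration of the Byte Pair Encoding idea, here is a minimal 
Python sketch of the standard merge loop: repeatedly join the most frequent 
adjacent pair of pieces. Two things here are assumptions, not from the post: 
merges are allowed across spaces (so that phrases like "dare not" could become 
one part), and merging stops once no pair occurs at least twice. The name 
learn_bpe is made up, and the rare letter-level exploration step is not shown.

```python
from collections import Counter


def learn_bpe(lines, num_merges):
    """Learn up to `num_merges` merges; return (merges, segmented lines)."""
    seqs = [list(line) for line in lines]  # start from single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))  # count adjacent pairs
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:  # assumed stopping rule: nothing repeats any more
            break
        merges.append((a, b))
        new_seqs = []
        for seq in seqs:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                    out.append(a + b)  # replace the pair with one piece
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_seqs.append(out)
        seqs = new_seqs
    return merges, seqs


lines = ["and we quickly walked", "and we did peek", "and we went to"]
merges, seqs = learn_bpe(lines, 10)
for seq in seqs:
    print(seq)
```

On these three lines the shared "and we" ends up inside a single learnt piece 
at the start of every line, while the rarer continuations stay as letters.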
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T90b7756a48658254-M0144ac8b430c8946d210e72c
