Predicting words, but with letters actually: Predicting whole words (as I explained before) is apparently not only better, it also allows related-word priming. I may have found a way to get that behaviour out of letter prediction. Remember my first post.

So: the dataset is "fa fu sz", and the made-up test set is one pass over those three words. Predicting letter by letter, the first letter is f or s; if we guess 50/50 we are 50% wrong on each of the three first letters. Two of the test words start with f, and after f the next letter is a or u at 50/50, so we pay 50% two more times; after s, z follows 100% of the time, which costs nothing. Total pay: 50% + 50% + 50% + 50% + 50% = 2.5. Predicting whole words instead, each word has probability 1/3, so each prediction is 66% wrong: 66% + 66% + 66% = 2.0.

But here's how to do it with letters, I think. Change the first-letter distribution to f = 99.9%, s = 0.1%. Now we pay 0.1% twice (for the two f words), 99.9% once (when s had to be predicted), then 50% twice for a/u. Total pay: 0.1% + 0.1% + 99.9% + 50% + 50% = 2.001. O>O! But why does that even work, what gives? The a/u part makes sense, 50%/50%, but the f/s... why 99.9% and 0.1%? The near-complete elimination of the other letter is what makes it able to match the word-level number, I guess...
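To sanity-check the arithmetic, here is a short Python sketch (my own, not from the post) that scores both schemes with the post's linear "pay" of 1 − p per prediction. One thing it also shows: plain relative-frequency letter probabilities (f = 2/3, s = 1/3) already beat the 50/50 guess above, and the 99.9%/0.1% skew pushes the letter total down to 2.001.

```python
from collections import Counter, defaultdict

def letter_model(corpus):
    """Relative-frequency next-letter distribution given the prefix of a word."""
    counts = defaultdict(Counter)
    for word in corpus:
        for i, ch in enumerate(word):
            counts[word[:i]][ch] += 1
    return {prefix: {ch: n / sum(ctr.values()) for ch, n in ctr.items()}
            for prefix, ctr in counts.items()}

def letter_pay(corpus, model):
    """Total pay: (1 - p) for every letter of every test word."""
    return sum(1 - model[word[:i]][ch]
               for word in corpus for i, ch in enumerate(word))

def word_pay(corpus):
    """Total pay when each whole word is predicted at its corpus frequency."""
    freq = Counter(corpus)
    return sum(1 - freq[word] / len(corpus) for word in corpus)

corpus = ["fa", "fu", "sz"]
mle = letter_model(corpus)

print(letter_pay(corpus, mle))  # 1/3 + 1/3 + 2/3 + 1/2 + 1/2, about 2.33
print(word_pay(corpus))         # 2/3 + 2/3 + 2/3 = 2.0

# The post's trick: skew the first-letter distribution to f=99.9%, s=0.1%.
skewed = dict(mle)
skewed[""] = {"f": 0.999, "s": 0.001}
print(letter_pay(corpus, skewed))  # 0.001 + 0.001 + 0.999 + 0.5 + 0.5, about 2.001
```

Under this linear pay, skewing the common letter toward certainty trades a small extra cost on the rare word for a big saving on the common ones, which is how the letter total creeps toward the word total.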
Let's try aa, ab, ba, ba. Word prediction: 75% + 75% + 50% + 50% = 2.5. Letter prediction: 50% × 4 + 50% × 2 = 3.0. So it doesn't work on even distributions, but maybe those are rare, at least for the early layers of a network. I'll think on it later if needed.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T90b7756a48658254-Mf9740a048c3bcbf341904f03
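The even-distribution case can be checked the same way (again my own sketch, using the post's linear pay of 1 − p per prediction). It also shows why no first-letter skew helps here: the four first letters split 2/2 between a and b, so whatever pay a skew saves on the a-words it adds back on the b-words, and the letter total is stuck at 3.0.

```python
from collections import Counter, defaultdict

def letter_model(corpus):
    """Relative-frequency next-letter distribution given the prefix of a word."""
    counts = defaultdict(Counter)
    for word in corpus:
        for i, ch in enumerate(word):
            counts[word[:i]][ch] += 1
    return {prefix: {ch: n / sum(ctr.values()) for ch, n in ctr.items()}
            for prefix, ctr in counts.items()}

def letter_pay(corpus, model):
    """Total pay: (1 - p) for every letter of every test word."""
    return sum(1 - model[word[:i]][ch]
               for word in corpus for i, ch in enumerate(word))

def word_pay(corpus):
    """Total pay when each whole word is predicted at its corpus frequency."""
    freq = Counter(corpus)
    return sum(1 - freq[word] / len(corpus) for word in corpus)

corpus = ["aa", "ab", "ba", "ba"]  # note ba appears twice, hence 50% twice

print(word_pay(corpus))                          # 0.75 + 0.75 + 0.5 + 0.5 = 2.5
print(letter_pay(corpus, letter_model(corpus)))  # 0.5*4 + 0.5*2 = 3.0

# Skewing the 50/50 first letter cannot help: the a-words save exactly
# what the b-words lose, so the total stays at 3.0.
skewed = dict(letter_model(corpus))
skewed[""] = {"a": 0.999, "b": 0.001}
print(letter_pay(corpus, skewed))  # still about 3.0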
