One of the only differences if I try vision: instead of storing/matching a group of symbols, e.g. thanksgivin>g, and getting the next predicted letter that way, I must do so in 2 dimensions, since images are 2D. Not hard so far. Also, instead of thanksgivin>g, it would look like this (if we had 10 pixel shades, i.e. 0-9): 361736472>5. Now, 5 is similar to 6 in brightness, and if all the pixels are 2 shades brighter then there really is no error; it is still an image of a turkey, just brighter. So if we see a turkey again, it will be 'similar' to that string of numbers, or the same just all 2 shades brighter. So that is not so ambitious to code in either. You'd also want to handle rotation/scale/location like this, using the described relative-mismatch pattern. The relativeness function actually works in text too; you may see h e l l o. So: I'd need to add 2D, and a similar delay for pixels. I actually already have the delay working, it's just in time/proximity for image/sequence, not in pixel self-identification brightness level. So lots here is familiar already; not much change to the code is needed really...
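A minimal sketch of the shade-shift idea above, under my own assumptions (pixel rows written as strings of shades 0-9, and a distance function I named myself): subtract out the constant brightness offset between two rows, then measure whatever mismatch is left, so a uniformly brighter copy of the same turkey scores as a perfect match.

```python
def shade_distance(a, b):
    """Distance between two equal-length pixel-shade strings (digits 0-9),
    ignoring any constant brightness offset: a copy that is uniformly
    2 shades brighter scores 0."""
    assert len(a) == len(b)
    diffs = [int(y) - int(x) for x, y in zip(a, b)]
    # Remove the average (constant) brightness shift, then
    # measure the relative mismatch that remains.
    mean = sum(diffs) / len(diffs)
    return sum(abs(d - mean) for d in diffs) / len(diffs)

turkey = "361736472"
brighter = "583958694"  # the same row, every shade +2
print(shade_distance(turkey, brighter))  # → 0.0 (still 'the same turkey')
```

The same trick generalizes: relative position differences instead of brightness differences give you the translation case, which is the relativeness that also works for text like h e l l o.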
So turning my text predictor into an image predictor isn't that hard once it gets to GPT level, I think. I had also been thinking of throwing away all image data except lines, since if there are lines on the sides then the inside is the same shade; lines show the change of shade. But, one problem: it seems confusing, and gives similar performance to just using and predicting all the pixels. I mean, you need to output them all anyway; the other way you'd simply fill in the holes with a shade, and some may be open holes that graduate in shade, which seems tough to fill in! Or maybe I'm overlooking something.
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T5b614d3e3bb8e0da-Me6ea1e6105d9451419976755
