Hehe. Here is my whole plan below, laid out clearly for all readers. Any feedback is welcome.
BTW the latest code, scores, text completions, and an explanation of all the code can be found here > https://encode.su/threads/3595-Star-Engine-AI-data-compressor

New goal: Translation ability will let my AI recognize similar contexts, so it can borrow extra words seen after related features (ex. seeing 'dog', it can use 'meowed' because it was seen after 'cat'), and it will enable an advanced priming ability, ex. cat dog pig horse ___.

Plan: In a dataset of ex. text, similar features stick together (an article on how to care for dogs keeps dog-related words nearby) because humans write using the priming ability. To learn related words/phrases/letters, you look around the feature of interest on both sides; closer neighbors are more impactful, because related words stick near each other! We ignore very rare and very common features: the common ones pop up everywhere, and keeping the rare ones costs a resource explosion.

Another way to find related features is an indirect context path. Say 'dog' and 'cat' are never seen near each other, but you see "cat ate" and, 5 GB later, "dog ate"; they predict the same thing, and therefore have a similar meaning. You can also do hole and delay matching to find related features, ex. you have 'cat ate tuna on bed' and see 'dog ate kibble on bed', and the lengthy match proves a closer relation. Both methods ignore rare/common features, give more impact to closer neighbors on either side of the feature, and need enough observations. For method 2, cat/dog must share enough predictions out of the total contexts each is seen followed by, and they need more shared predictions when there are more competing possible relations; ex. if 'cat' has only ever been near 4 other words and you have seen 'cat' 500 times, then you are sure which word is most closely related.

Conclusion: I'm going to try method 2 first, because it seems to find related features more contextually: it uses predictions, exactly what we want, to get precise related features.
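Method 2 above can be sketched in a few lines of Python. This is a minimal illustration, not the Star Engine code: `successor_counts` and `prediction_similarity` are hypothetical names, and the overlap score (summed shared prediction probability mass) and the `min_obs` experience threshold are my assumptions about how "share enough predictions out of the total contexts" could be scored.

```python
from collections import Counter

def successor_counts(tokens):
    """Map each word to a Counter of the words seen immediately after it."""
    succ = {}
    for w, nxt in zip(tokens, tokens[1:]):
        succ.setdefault(w, Counter())[nxt] += 1
    return succ

def prediction_similarity(succ, a, b, min_obs=3):
    """Score how much prediction mass two words share (method 2).

    Hypothetical scoring: sum the overlapping successor probabilities.
    Returns 0.0 if either word has too few observations to be confident.
    """
    ca, cb = succ.get(a, Counter()), succ.get(b, Counter())
    ta, tb = sum(ca.values()), sum(cb.values())
    if ta < min_obs or tb < min_obs:
        return 0.0  # not enough experience with one of the words
    shared = 0.0
    for w in ca.keys() & cb.keys():  # predictions both words have made
        shared += min(ca[w] / ta, cb[w] / tb)
    return shared

text = "cat ate tuna and dog ate kibble and cat ate fish and dog ate meat".split()
s = successor_counts(text)
print(prediction_similarity(s, "cat", "dog", min_obs=2))  # → 1.0
```

Here 'cat' and 'dog' are never adjacent, yet both are always followed by 'ate', so the indirect context path gives them a perfect similarity score, exactly the 'cat ate ... dog ate' case described above.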
Just being near each other is powerful too; it is the priming ability itself! However, on its own it is only a nice enhancement, and it is dead in the water without a LOT of data and compute, so we will use method 2, context predictions, as our main Markov-chain guide.

Implementation plan: I will store contexts in a trie, and take a given pair of words to check for similarity. I'll look at how many 'word' contexts each has on the right-hand side; they have enough experience if, ex., 5 distinct word types follow 'cat' and 500 words in total have been seen following 'cat', combined with the same analysis for 'dog'. Then I'll take those follower contexts and see how many of them the pair shares out of the total each has. This tells me how similar they are, and how confident I can be in that answer. I'll ignore rare and common words by not considering them at all in the steps above, by checking counts at the root of the trie: ex. 'the' has a count of 500 and we have seen 1,000 words so far, so the system concludes "wow, 'the' is really common in my experience".
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T192296c5c5a27230-Mfe20849a30268080aef43b0e
Delivery options: https://agi.topicbox.com/groups/agi/subscription
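P.S. The trie part of the implementation plan could be sketched as follows. This is only an illustrative skeleton under my own assumptions: `ContextTrie`, the order-2 sliding window in `build`, and the `min_frac`/`max_frac` cutoffs in `followers` are all hypothetical; the root-level counts are what supports the "'the' is really common" check.

```python
class TrieNode:
    __slots__ = ("count", "children")
    def __init__(self):
        self.count = 0
        self.children = {}

class ContextTrie:
    """Word-level trie of contexts; root children carry per-word counts."""
    def __init__(self):
        self.root = TrieNode()
        self.total = 0  # total words observed so far

    def add_context(self, words):
        self.total += 1
        node = self.root
        for w in words:
            node = node.children.setdefault(w, TrieNode())
            node.count += 1

    def build(self, tokens, order=2):
        # One window per token position, so root counts == word counts.
        for i in range(len(tokens)):
            self.add_context(tokens[i:i + order])

    def frequency(self, word):
        """Relative frequency at the root, e.g. 'the' -> 500/1000 = 0.5."""
        node = self.root.children.get(word)
        return (node.count / self.total) if node and self.total else 0.0

    def followers(self, word, min_frac=0.0, max_frac=0.2):
        """Right-hand successor counts of `word`, skipping rare/common ones.

        The frac cutoffs are tunable assumptions, not fixed thresholds.
        """
        node = self.root.children.get(word)
        if not node:
            return {}
        return {w: c.count for w, c in node.children.items()
                if min_frac <= self.frequency(w) <= max_frac}

t = ContextTrie()
t.build("the cat ate the dog ate".split())
print(t.frequency("the"))              # 2 of 6 words
print(t.followers("the", 0.0, 1.0))    # successor counts of 'the'
```

The similarity check from the plan would then compare `t.followers(a)` against `t.followers(b)`, requiring enough total follower observations on each side before trusting the overlap.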
