I just realized something that [may] be true, and it's shockingly amazing how useful the Lossless Compression contest is: I [might] be able to learn my relations fast by doing the pass only once, instead of ~100 times throughout the enwik8/enwik9 dataset. The reason repeating it 100 times (or even just 10 times) may pass as good enough is that in Lossless Compression you decompress the data out and train on it as you go; only then do you start growing a relational net fast. Even 1MB of decompressed data is already helpful, and you would not want to store any of the relational net before decompression, not even 1MB, since that would hurt your score. If I'm correct, this means I could not show off as well in the Lossless Compression contest, because doing my word2vec-like pass 10 times is 10 times slower. If I did no eval, or just a simple test like training on 80% of the dataset and then tallying the prediction error on the other 20%, with no other weird things like the dataset-size normalization done in perplexity evaluation (for some reason that makes no sense to me), then maybe I could see my score just as well. The catch is it may be for token- or letter-level prediction only, and not comparable to other predictors that predict different-sized chunks of text at a time.
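The simple 80/20 test described above could be sketched like this — a minimal, hypothetical version that stands in for the real predictor with a plain follower-count (bigram-style) guesser, and tallies raw prediction error with no perplexity-style normalization:

```python
from collections import defaultdict

def evaluate_split(tokens, train_frac=0.8):
    """Train a simple next-token predictor on the first 80% of tokens,
    then tally raw next-token prediction error on the last 20%.
    No dataset-size normalization -- just error rate."""
    cut = int(len(tokens) * train_frac)
    train, test = tokens[:cut], tokens[cut:]

    # counts[w] maps each follower of w to how often it followed w
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(train, train[1:]):
        counts[prev][nxt] += 1

    errors, total = 0, 0
    for prev, nxt in zip(test, test[1:]):
        followers = counts.get(prev)
        # guess the most frequent follower seen in training, if any
        guess = max(followers, key=followers.get) if followers else None
        errors += (guess != nxt)
        total += 1
    return errors / total if total else 0.0
```

Swapping in the real relational-net predictor instead of the bigram guesser keeps the same scoring loop.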
If I don't have to rebuild my relational network, I think it may take only 20 mins on 1 CPU, 1 core, for 1GB of data in the PyPy version of Python (~2.6 times slower than C). The most intensive part is going to be searching the trie tree, e.g. for the 20 words that follow 'dog', to see if 'cat' has the same followers (40 searches total), and that comes out to the time I calculated. I do find it interesting that training my trie tree net takes just 40 mins for 1GB. Doing it 10 times slower isn't bad, I guess, for now on the LC contest: 200 mins = ~3.3 hours. Though constantly having to predict takes a lot of time for 1GB, lol. I think a fast sanity test like text completion is good; I must remember my AI has more power once I take off the contest paradigm.

You may ask: how can it do that? Well, a trie tree can store 1GB efficiently, and then a relational net can be learned from that tree, with no need to keep updating cat<>dog relations (unless you use learnt relations as evidence, of course). Re-building the relational net could be done, at roughly the same cost as updating it, but with a fully learnt tree there's no need.

You may also wonder why I can't batch the trie tree itself the same way, before using it to build my relational net. Hmm, well, that would involve taking e.g. 10 sentences and sticking them into one tree at the same time — how can I do that, though? It's like a batch training method, but I don't think you can actually do that in the intended sense; try it for training a Markov chain. The one way to do it is making a small temporary tree: say you saw 'walk', later 'walk' again, later 'walking', and 'walked', 'walked'... so now you go into the temp tree and store the counts for each feature. How does this help? You still increment every observation, but the small temp tree has less to search through, so upon dumping it into the real/big tree, you take the longer search time but only pay it once per node, e.g. once for 'walked'.
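The ideas above could be sketched roughly as follows — a hypothetical count-storing trie where follower lookups give the cat<>dog comparison, and `merge` is the temp-tree trick: a small batch tree is filled cheaply, then dumped into the big tree with one search per node regardless of how many times a word was seen. All names here are my own illustration, not the author's actual code:

```python
class CountTrie:
    """Trie over token sequences; each node keeps a count, so the
    followers of e.g. 'dog' and their frequencies can be read off."""
    def __init__(self):
        self.children = {}   # token -> CountTrie
        self.count = 0

    def insert(self, seq):
        node = self
        for tok in seq:
            node = node.children.setdefault(tok, CountTrie())
            node.count += 1

    def followers(self, prefix):
        """Return {token: count} of what follows `prefix` in the tree."""
        node = self
        for tok in prefix:
            node = node.children.get(tok)
            if node is None:
                return {}
        return {t: c.count for t, c in node.children.items()}

    def merge(self, other):
        """Dump a small temp trie into this big trie: one search per
        distinct node (e.g. 'walked'), however often it was observed."""
        for tok, small in other.children.items():
            big = self.children.setdefault(tok, CountTrie())
            big.count += small.count
            big.merge(small)

def shared_followers(trie, a, b):
    """Count followers that a and b share -- word2vec-like evidence
    that e.g. 'cat' and 'dog' behave alike."""
    return len(set(trie.followers([a])) & set(trie.followers([b])))
```

So a batch of sentences goes into a fresh small `CountTrie`, which is then `merge`d into the big one; comparing 20 followers of 'dog' against 'cat' is 40 `followers`-style lookups, as estimated above.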
------------------------------------------ Artificial General Intelligence List: AGI Permalink: https://agi.topicbox.com/groups/agi/T01fa5e447808d368-M39462e1bd697b620a13cde76 Delivery options: https://agi.topicbox.com/groups/agi/subscription
