I just realized something that may be true, and it's shockingly amazing how 
useful the Lossless Compression contest is: I might be able to learn my 
relations fast if I only do it once, instead of 100 times throughout the 
enwik8/9 dataset. The reason doing it 100 times, or even just 10 times, may 
pass as good enough is that for Lossless Compression you decompress data out 
and train on it, and only then do you start growing a relational net fast; 1MB 
of it is already helpful. You would not want to store any of the relational 
net before decompression, not even 1MB, because that would hurt your score. 
This would mean I could not show off as well in the Lossless Compression 
contest, if I'm correct, because doing my word2vec-like thing 10 times is 10 
times slower. If I did no eval, or just a simple test like training on 80% of 
the dataset and then tallying up the prediction error on the other 20%, with 
no other odd adjustments like the dataset-size normalization done in 
Perplexity evaluation (for a reason that makes no sense to me), then I could 
maybe see my score just as well. It may only work for token or letter 
prediction, though, and not be comparable to other predictors that predict 
different-sized chunks of text at a time.
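The simple 80/20 test described above could be sketched roughly like this, 
using a toy bigram-style predictor as a stand-in for my real model (the 
function names and the tiny corpus here are hypothetical, just to show the 
shape of the eval):

```python
# Minimal sketch of the 80/20 holdout idea: train on the first 80% of the
# tokens, tally raw next-token prediction error on the remaining 20%.
from collections import Counter, defaultdict

def holdout_error(tokens, split=0.8):
    """Return the fraction of wrong next-token guesses on the held-out tail."""
    cut = int(len(tokens) * split)
    train, test = tokens[:cut], tokens[cut:]

    # Count which token most often follows each token (a toy "model").
    follows = defaultdict(Counter)
    for a, b in zip(train, train[1:]):
        follows[a][b] += 1

    errors, total = 0, 0
    for a, b in zip(test, test[1:]):
        guess = follows[a].most_common(1)[0][0] if follows[a] else None
        errors += (guess != b)
        total += 1
    return errors / total if total else 0.0

text = "the dog ran and the cat ran and the dog sat".split()
rate = holdout_error(text)
print(rate)  # raw error rate on the held-out 20%, no size normalization
```

Note there is no dataset-size normalization anywhere: the score is just a raw 
error tally, which is the point.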

If I don't rebuild my relational network, I think it may take only 20 minutes 
on 1 core of 1 CPU for 1GB of data in the PyPy version of Python (~2.6 times 
slower than C), because the most intensive part is going to be searching the 
trie for, e.g., the 20 words that follow 'dog' to see if 'cat' has them follow 
it too (40 searches total), and that comes out to the time I calculated. I do 
find it interesting that training my trie net takes just 40 minutes for 1GB.
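The dog/cat lookup above could be sketched like this, using a plain 
dict-of-Counters as a stand-in for my trie (the structure and names here are 
hypothetical, just to show the two rounds of ~20 searches):

```python
# Toy sketch of the "40 searches" lookup: fetch the top followers of one
# word, then check each of them against the followers of the other word.
from collections import Counter, defaultdict

def build_followers(tokens):
    """Record, for each word, the counts of words seen right after it."""
    followers = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        followers[a][b] += 1
    return followers

def shared_followers(followers, w1, w2, top=20):
    """Top-N followers of w1 (first N searches), each checked against
    w2's followers (second N searches): ~2*N lookups total."""
    top_w1 = [w for w, _ in followers[w1].most_common(top)]
    return [w for w in top_w1 if w in followers[w2]]

corpus = "the dog ran home and the cat ran away and the dog ate".split()
f = build_followers(corpus)
print(shared_followers(f, "dog", "cat"))  # followers that dog and cat share
```

The overlap in shared followers is the kind of evidence that makes cat<>dog a 
strong relation.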

But doing it 10 times slower isn't bad for now, I guess, on the LC contest: 
200 mins = 3.333 hours. Though constantly having to predict takes a lot of 
time for 1GB, lol. I think a fast sanity test like text completion is good; I 
must remember my AI has more power once I take off the contest paradigm.

You may ask: how can it do that? Well, a trie can store 1GB efficiently, and 
then a relational net can be learnt from that tree, with no need to update 
cat<>dog relations (unless you use learnt relations as evidence, of course). 
Re-building the relational net could be done, and is roughly as costly as 
updating it, but with a fully learnt tree there's no need to. You may wonder 
why I can't do this with the trie tree too, before using it to build my 
relational net. Hmm, well, that would involve taking, e.g., 10 sentences and 
sticking them into 1 tree at the same time. How can I do that, though? It's 
like a batch training method, but, hmm, I don't think you can actually do that 
in the intended sense; try it for training a Markov chain. The one way to do 
it, though, is making a small tree: e.g. you saw 'walk', later 'walk' again, 
later 'walking', and 'waked', 'walked'... so now you go into the tree and 
store TWO counts etc. for each feature. How does this help? You still 
increment every observation, but as you search the small temp tree there is 
less to search through, so upon dumping it into the real/big tree you take the 
longer search time but only pay it once per node, e.g. 'walked'.
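The small-temp-tree trick could be sketched like this, with nested dicts 
standing in for both tries (a minimal sketch under my own assumptions, not my 
actual storage format): accumulate counts cheaply in a tiny batch trie, then 
merge each counted node into the big trie exactly once.

```python
# Accumulate a batch of words in a tiny temp trie, then merge into the big
# trie, paying the long big-trie search only once per distinct word.
def insert(trie, word, count=1):
    """Walk/extend a nested-dict trie one letter at a time, adding count."""
    node = trie
    for ch in word:
        node = node.setdefault(ch, {})
    node["#"] = node.get("#", 0) + count  # '#' holds the node's count

def merge(big, small, prefix=""):
    """Dump every counted node of the small temp trie into the big trie."""
    for key, child in small.items():
        if key == "#":
            insert(big, prefix, count=child)  # one big-trie search per word
        else:
            merge(big, child, prefix + key)

def count(trie, word):
    node = trie
    for ch in word:
        node = node[ch]
    return node["#"]

big, temp = {}, {}
for w in ["walk", "walk", "walking", "waked", "walked"]:
    insert(temp, w)   # cheap: the temp trie stays tiny
merge(big, temp)      # 'walk' seen twice, but hits the big trie once
print(count(big, "walk"))  # 2
```

So the two observations of 'walk' cost two cheap temp-tree searches plus one 
big-tree search, instead of two big-tree searches.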
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T01fa5e447808d368-M39462e1bd697b620a13cde76