Separating your dataset into a training set and a testing set doesn't look so bad at first. It resembles online learning: you predict the next letter, train on it, and repeat. A train/test split is the same loop of train-then-predict, except it iterates only once. The goal of both is the same: a small arithmetic (error-correcting) code, produced quickly, plus a small model/network.

So why is the train/test split so much worse? First of all, suppose you train on the train set, get your model, throw away the train set, and compute the test evaluation score: now you have to keep that model! With online learning you get a smaller compression, because you don't need to "keep" the train set, either directly or via the trained model. You keep ONLY the arithmetic code that was produced, the first few letters, and the AI code (the learning program). From those you re-run the learner, recreating the model the same way it was built during compression, now for decompression, and this quickly converges back to a model like the one you had. You get the model without storing it, because the code stores it implicitly. Also, the evaluation is evaluating a changing model as it moves from a high learning rate to a very low one.
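The mechanism above can be sketched in a few lines. This is a minimal toy, not an actual neural compressor: it uses an order-0 adaptive model with Laplace-smoothed counts, and `online_code_length` is a hypothetical helper name. The key point it illustrates is that the coder and the decoder both start from the same blank model and update it after each symbol, so the model never has to be shipped, only the code length is paid.

```python
import math
from collections import Counter

def online_code_length(text):
    """Ideal arithmetic-code cost of compressing `text` online.

    Coder and decoder both start from uniform (Laplace) counts and
    update after every symbol, so the model is rebuilt identically
    during decompression and never needs to be stored.
    """
    alphabet = sorted(set(text))
    counts = Counter({c: 1 for c in alphabet})  # Laplace smoothing
    total_bits = 0.0
    for ch in text:
        p = counts[ch] / sum(counts.values())
        total_bits += -math.log2(p)  # cost of predicting, then training on, ch
        counts[ch] += 1              # "predict next letter, train on it, repeat"
    return total_bits

data = "abracadabra" * 50
adaptive = online_code_length(data)
# Baseline with no learning at all: log2(alphabet size) bits per symbol.
static = len(data) * math.log2(len(set(data)))
print(f"adaptive: {adaptive:.0f} bits, static: {static:.0f} bits")
```

The adaptive total comes out below the static baseline even though the model starts out knowing nothing: the early symbols cost extra bits while the counts are still flat, which is exactly the "changing model" being evaluated as it learns.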

You can get a smaller file by doing the test the Hutter Prize way. The other way, you still get a small code, a compression, and the model is itself a compression of the data, just not of the exact file, yet you needed the model to make that code! Now, to decompress the code, to show it is self-contained and representative, you need the model. So to get that model you must store either the large train set or the model itself; you can't recover it for free when you evaluate their way.

So my conclusion stands, and now reads like this: the Hutter Prize method of evaluation is better because it produces a smaller file, while still checking the size of the AI code, the size of the arithmetic (error-correcting) code, how fast it runs during compression, and also how large the model grows as it runs (the weights filling out and plumping up the AI-code skeleton).

Actually, I think I get it now. Say we did their method and got a small arithmetic code, but the model was 100MB (the whole enwik8 dataset): that is a dumb AI, it cheated. Now say we had a 30MB model instead; it is still dumb here too, but we MUST include it alongside our arithmetic code, e.g. 30MB + 10MB. We can get rid of the model by doing it the Hutter Prize way: 0MB + 13MB. The arithmetic code is only a bit bigger, because your model starts off as a baby mind but quickly gets close to the fully trained version.
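The size accounting in that example, using the same hypothetical numbers (30MB model + 10MB code vs. 0MB + 13MB), works out like this:

```python
# Hypothetical sizes (MB) from the example above, not real measurements.
their_model, their_code = 30, 10    # train/test way: must ship the model too
hutter_model, hutter_code = 0, 13   # Hutter Prize way: model rebuilt on decompress

their_total = their_model + their_code    # 40 MB on disk
hutter_total = hutter_model + hutter_code # 13 MB on disk

print(their_total, hutter_total)
```

The Hutter Prize total wins even though its arithmetic code is a few MB larger, because the untrained-to-trained warm-up cost is far smaller than storing the finished model.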
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tc47ee370e3f8bcb6-M558d3a91e891b7a41b5c47e8