Separating your dataset into a training set and a testing set doesn't look so bad at first. It resembles online learning: you try to predict the next letter, train on it, and repeat; a train/test split works the same way, train then predict, except it iterates that loop only once. The goal in both cases is a small total: a small (and fast) program, a small error-correcting arithmetic code, and a small model network.
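To make the "predict the next letter, train on it, repeat" loop concrete, here is a toy sketch of my own (not anything from an actual Hutter Prize entry): an adaptive letter-count model with add-one smoothing, where each letter is charged -log2(p) bits, the ideal arithmetic-code length, before the model trains on it.

```python
import math
from collections import Counter

def online_code_length(text, alphabet):
    """Toy online learner: predict each next letter from counts of the
    letters seen so far (add-one smoothing over `alphabet`), charge
    -log2(p) bits (the ideal arithmetic-code length), then train on the
    letter. One pass yields both the compressed size and the model."""
    counts = Counter()
    bits = 0.0
    for ch in text:
        total = sum(counts.values()) + len(alphabet)
        p = (counts[ch] + 1) / total   # prediction made BEFORE training on ch
        bits += -math.log2(p)          # cost of encoding ch under the current model
        counts[ch] += 1                # now train on it
    return bits

# A skewed stream is learned quickly, so it costs well under the
# 1 bit per letter a fixed two-symbol code would charge:
print(online_code_length("a" * 15 + "b", "ab"))   # ~8.1 bits vs 16
```

The point of the single pass: the compressed size and the trained model come out of the same loop, which is what makes the comparison with a separate train-then-test pipeline meaningful.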
So why is the train/test split worse? First of all: say you trained on the train set and got your model. You throw away the train set and compute the test-set evaluation score; well, now you have to keep that model!

With online learning you get a smaller compressed package. You don't need to "keep" the train set, either directly or as a trained model; you keep ONLY the arithmetic code it produced, the first few letters, and the program. To decompress, you rerun the program, which rebuilds the model exactly the way it was built during compression, and this converges back to a model like the one you had exponentially fast. So you get the model without storing it, because the code stores it instead. Also note the evaluation is scoring a changing model, as it goes from a high learning rate to a very low learning rate.

So you can get a smaller file by doing the test the Hutter Prize way. The other way, you do get a small code, a compression, and the model is itself a compression of the data (just not the exact file), but you needed the train set to build it. Now, to decompress that code and show it is self-contained and representative, you need the model; and to get the model you need either the large train set or the stored model. You can't recover it for free in their style of evaluation.

So my conclusion remains, and it is this: the Hutter Prize method of evaluation is better because it makes a smaller total file, and it checks the size of the program, the size of the error-correcting arithmetic code, how fast compression runs, and how large the model gets at run time (the program is a skeleton that plumps up with weights etc. as it runs).

Actually, I think I get it now. Say we used their method and got a small arithmetic code, but the model was 100MB (all of the enwik8 dataset): that is a dumb AI, it cheated. Now say we had a 30MB model; it is still dumb here too, but we MUST count it along with our arithmetic code, e.g. 30MB+10MB!
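Here is a minimal sketch of why decompression gets the model back for free, assuming only that the update rule is deterministic. (In a real arithmetic coder the decoder obtains each letter by decoding bits; here I feed the known text to both sides purely to show that encoder and decoder models stay bit-identical at every step. All names are my own illustration.)

```python
from collections import Counter

def step(model, alphabet, ch):
    """One deterministic model update; returns the probability the model
    assigned to `ch` just before updating (what the coder would charge)."""
    total = sum(model.values()) + len(alphabet)
    p = (model[ch] + 1) / total
    model[ch] += 1
    return p

alphabet = "abc "
text = "abc abc abc"
enc, dec = Counter(), Counter()
for ch in text:
    p_enc = step(enc, alphabet, ch)  # encoder: predict, emit bits, train
    p_dec = step(dec, alphabet, ch)  # decoder: decode letter, apply the SAME rule
    assert p_enc == p_dec            # bit-exact agreement at every step
assert enc == dec  # decoder ends with the trained model without it ever being stored
```

Because both sides apply the same update to the same letters, the decoder's probabilities match the encoder's exactly, which is the property arithmetic coding needs, and the trained model falls out of decompression as a by-product.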
So we can get rid of that model term by doing it the Hutter Prize way: 0MB+13MB. The arithmetic code is only a bit bigger, because your model starts off as a baby mind but quickly gets close to the fully trained version.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Tc47ee370e3f8bcb6-M558d3a91e891b7a41b5c47e8
