If you take 2GB of diverse domain text and reduce your memory cost as far as 
possible while still being able to losslessly and quickly regenerate/extract it 
all, you learn general patterns, and you can learn more and better ones if you 
find and ingest more data. If you fit the training data so well that you can 
regenerate all of it back perfectly, bit for bit, that is not a bad thing: it 
means you have learnt general patterns well enough to compress it that tightly. 
Of course, if your training data is only 2GB and is all about how to fix a car, 
the result will be optimal for that domain but not very general, until the 
model ingests more data, e.g. via online learning, if you don't train it 
offline on a diverse 800GB right away. So the takeaway is that more diverse 
data is better; the only way to fail at learning general patterns is to have 
too little or too uniform data. Also, as the model ingests data it does not 
always learn new, better patterns right away, hence the 50% success curves 
appear. These 50% curves are themselves made of 50% curves, and so on. I 
expect the error-loss curve to be a fractal as it descends. You can actually 
see this in their graph.
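The compression-as-pattern-learning idea above can be illustrated with an 
off-the-shelf lossless compressor: text full of repeated structure (like a 
narrow car-repair manual) compresses far better than patternless data. A 
minimal sketch, using zlib as a stand-in for any lossless compressor (the 
sample strings are made up for illustration):

```python
import os
import zlib

def ratio(data: bytes) -> float:
    """Compressed size / original size: lower means more pattern was found."""
    return len(zlib.compress(data, level=9)) / len(data)

# Narrow-domain text: one topic, heavily repeated phrasing.
car_manual = b"To fix the car, check the spark plugs and the oil filter. " * 200

# Patternless stand-in: random bytes, so there are no regularities to learn.
noise = os.urandom(len(car_manual))

print(f"narrow domain: {ratio(car_manual):.3f}")  # far below 1.0
print(f"random noise:  {ratio(noise):.3f}")       # near or above 1.0
```

The compressor "overfits" the manual perfectly (it can regenerate every byte), 
yet the tiny compressed size shows it has captured the regularities, which is 
exactly the point: lossless regeneration and pattern learning are not at odds.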
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tc1f2c133ae3e4762-M4eeb30da7d31999a16813769