On Thu, Jul 6, 2023 at 3:51 AM Matt Mahoney <[email protected]> wrote:
>
> I am still on the Hutter prize committee and just recently helped evaluate a
> submission. It uses 1 GB of text because that is how much a human can process
> over a lifetime. We have much larger LLMs, of course. Their knowledge is
> equivalent to thousands or millions of humans, which makes them much more
> useful.
>
> I believe that the Hutter prize, and the Large Text benchmark on which it is
> based, helped establish text prediction using neural networks and massive
> computing power as the path to AGI. The idea was still controversial when I
> started the benchmark in 2006. Most people on this list and elsewhere were
> still pursuing symbolic approaches.
Still pursuing symbolic approaches? Notably OpenCog. Yes, that's been a disaster. The whole Link Grammar thing wasted 10 years.

Did the Hutter Prize move the field? Well, I was drawn to it as a rare data-based benchmark. I just always believed the goal of compression was wrong. I still think that's the case. As a benchmark it will be fine. But the eventual solution will be something which expands new structure. Chaotically, in fact (with implications for consciousness, and "uploading", but not worth going there for now). So the benchmark will remain. Just everything we imagine happening under the hood will change.

I recall Hutter's original goal was to find underlying semantic variables. I wonder how he views that now.

As far as driving change... In the '00s everyone was focused on statistics. The neural network boom was over. Statistics dominated. Marcus Hutter formulated a statistical definition of intelligence. The Hutter Prize was a statistical goal. That's what I remember.

If you think it drove a renewed "neural" revolution, well, it did work with data. But I would be surprised if many people back then characterized their entries as "neural". A "neural" model is in many ways the opposite of a statistical model. To be "neural" is to retain data in unsummarized form. What happened was not smaller models, but "deep" models. Not that anyone thought much about it.

The renewed "neural" revolution (rechristened as "deep"; the term "neural network" never really recovered from the stigma of being old tech in the '00s, so now it's "deep", or "transformer", or just plain LARGE...) is paradoxical, because it retains the statistical idea that it is summarizing, while what actually works repeatedly seems to be the opposite of summarizing.

What is "deep"? It is more structure. Things constantly manifest as getting better when those seeking to summarize summarize less. It's always the pattern: things get better when we allow more structure.

So was it the goal of summarizing better, as per the Hutter Prize, which drove the field? I would say what drove the field was (accidentally) embracing more structure again. Getting bigger. LARGE. That's also how I would characterize "attention": enumerating more structure. So first more structure with "deep", and then more structure with "attention".

But maybe those submitting entries to the Hutter Prize have seamlessly transitioned away from thinking of themselves as seeking small models, and now keep in mind the idea of "deep" or "large", and distributed, while somehow also imagining they are retaining the original goal of small. Possibly they don't think much about it at all. In many ways language models are a theory vacuum. It's all tools. There's very little imaginative conception of why things should be the way they are, or why one tool works better than another.

Constructively, the question which interests me is whether anyone sees any evidence of a ceiling in the number of trained "parameters" (distinct and meaningful patterns) which can be squeezed out of a set of data in an LLM. Chinchilla fascinated me, because it seemed the improvement with more training was in exact proportion to the improvement with simply adding more data, more structure. It seemed like more data and more training were in many ways acting like the same thing. I'm looking for evidence there is any ceiling to that.
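To make the Chinchilla observation concrete, here is a rough sketch of the parametric loss fit that paper reports. The constants below are approximate values recalled from the paper and are only illustrative, not exact; the point is that minimizing the fitted loss under a fixed compute budget (roughly C = 6*N*D training FLOPs) makes the optimal parameter count and the optimal token count grow in near-equal proportion, each close to C**0.5.

# Illustrative sketch of the Chinchilla parametric loss fit.
# Constants are approximate published values, treated here as assumptions.
E, A, B = 1.69, 406.4, 410.7      # irreducible loss and fit coefficients
alpha, beta = 0.34, 0.28          # exponents for parameters (N) and tokens (D)

def loss(N, D):
    """Fitted training loss for N parameters and D training tokens."""
    return E + A / N**alpha + B / D**beta

def compute_optimal(C):
    """Minimize loss(N, D) subject to C = 6*N*D.

    Closed form: N_opt ~ (C/6)**(beta/(alpha+beta)) and
    D_opt ~ (C/6)**(alpha/(alpha+beta)), i.e. both grow roughly as C**0.5,
    so parameters and data scale together."""
    G = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    N_opt = G * (C / 6) ** (beta / (alpha + beta))
    D_opt = (C / 6) / N_opt
    return N_opt, D_opt

for C in (1e21, 1e22, 1e23, 1e24):
    N_opt, D_opt = compute_optimal(C)
    print(f"C={C:.0e}  N_opt={N_opt:.2e}  D_opt={D_opt:.2e}  loss={loss(N_opt, D_opt):.3f}")

In that fit, parameters and data enter as symmetric power-law terms, which is exactly the sense in which more training and more data look like the same thing.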
So, not just the entropy benchmark of the Hutter Prize as such, but evidence of what is happening under the hood in the entrants which work best. Evidence that what works under the hood more resembles an endless expansion of more structure, not less.
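As a footnote on what the entropy benchmark itself measures: up to a small coding overhead, a predictor's total log-loss on enwik9 is the size of the archive an arithmetic coder driven by that predictor would produce, so better prediction and a smaller archive are the same measurement. A minimal sketch, with a toy adaptive order-0 byte model standing in for whatever model an entrant actually uses:

import math
from collections import defaultdict

def ideal_code_length_bits(data: bytes) -> float:
    """Total -log2 p(symbol | counts so far) under a toy adaptive order-0
    model with add-one (Laplace) smoothing. An arithmetic coder driven by
    the same probabilities would produce an archive of about this many bits."""
    counts = defaultdict(lambda: 1)   # Laplace prior: every byte starts at count 1
    total = 256                       # sum of counts over all 256 byte values
    bits = 0.0
    for b in data:
        p = counts[b] / total         # model's predicted probability
        bits += -math.log2(p)         # ideal code length for this symbol
        counts[b] += 1                # adapt the model after coding
        total += 1
    return bits

sample = b"the quick brown fox jumps over the lazy dog " * 100
print(ideal_code_length_bits(sample) / (8 * len(sample)))  # ratio vs. raw bytes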
