On Wednesday, December 25, 2024 at 1:46:51 PM UTC+8 Brent Meeker wrote:
I think that LLM thinking is not like reasoning. It reminds me of my grandfather, who was a cattleman in Texas. I used to go to auction with him, where he would buy calves to raise and auction off ones he had raised. He could do calculations of what to pay, gain and loss, expected prices, cost of feed, all in his head almost instantly. But he couldn't explain how he did it. He could do it with pencil and paper and explain that, but not how he did it in his head. So although the arithmetic would be of the same kind, he couldn't figure insurance rates and payouts, or medical expenses, or home construction costs in his head. The difference with LLMs is that they have absorbed so many examples on every subject, as my grandfather had of auctioning cattle, that they don't have reasoning; they have finely developed intuition, and they have it about every subject. Humans don't have the capacity to develop that level of intuition about more than one or two subjects; beyond that they have to rely on slow, formal reasoning.

Nail on the head. That's an interesting story, and it got me thinking (too much, again). It reminds me of the distinction in Kahneman's 2011 bestseller between "type 1" and "type 2" thinking (aka "system 1" and "system 2"). Although there are problems and controversies with his ideas and evidence, as is almost standard for psychological proposals of this kind, we can, if you indulge me, take them less literally, as describing loose cognitive styles of reasoning: LLMs today (end of 2024), with their vast memory-like parameter sets, are closer to system 1 "intuitive reasoning styles" that appear startlingly broad and context-sensitive, even "fine", as you say, somewhat like your grandfather's style of reasoning in his area(s) of expertise. Yet, in my view, LLMs (not wishing to imply anything about your grandfather) are not performing the slower, more deliberative, rule-based thinking we often associate with system 2.
Take the example of Magnus Carlsen, arguably one of the best-performing chess players in recent history, who relies increasingly on instinctive, "type 1" moves after thousands of hours of deliberate "type 2" calculation (when he was young and ascending the throne, he was better known for his calculation skills). This illustrates how extensive practice can shift certain skills from a laborious, step-by-step process toward fast, experience-based pattern recognition. I've watched many times how, in a time crunch, he glances at the clock and makes decisive, critical, superior, computeresque moves based purely on a gut feeling more precise and finely tuned than his opponent's, especially in recent years, when there is too little time to calculate his way through a position. In streams where he comments on other GMs' play, he often instantly blurts out statements like "that knight just belongs on e6" or "the position is screaming for bishop f8", without a second of thought or a glance at the AI/engine evaluation.

A similar thing happens, perhaps not at a world-class level (though even Magnus makes mistakes, just less often than others), when a person learns to drive: at first, every procedure (shifting gears, 360-degree checks, mirror vigilance, steering) is deliberate and conscious, whereas a practiced driver performs multiple operations fluently and simultaneously, without thinking in a type 2 manner. In most advanced tasks, like playing chess, doing math, or solving puzzles, humans merge system 1 and system 2, relying more on one or the other depending on experience and context. I see current LLMs as occupying a distinct spot: the "intuition" they rely on is not gained through slow, personal experience in one domain, but from the vast breadth of their pretraining across nearly every field for which text data or code exists.
Again, it's not just static memory, and I do recognize the nuance that they are *not just memorizing answers to questions or mere content*. AGI advocates scream: "LLMs interpolate, so it's reasoning!" Unfortunately, they rarely specify what that means. What LLMs primarily memorize, if I understand correctly, is functions/programs, in a certain sense. Programs do generalize to some extent, by mathematical definition. When John questions an LLM via his keyboard, he is *essentially querying a point in program space*, where we can think of the LLM in some domain as a manifold, with each point encoding a function/program. Yes, they do interpolate across these manifolds to combine/compose programs, which implies an infinite number of possible programs that John can select through his keyboard. That is why they *appear* to reason richly and organically, unlike in earlier years, and can help debug code in sophisticated ways, *with human assistance* (they have been trained against compositions that yield false results).

So what we're doing with LLMs is training rich, flexible models to predict the next token. If we had infinite memory capacity, we could simply train them to learn a sort of lookup table. But the reality is more modest: LLMs have only some billions or trillions of parameters. That's why they screw up basic grade-school math problems not in their training set, or fail to remove the single small element John doesn't want in the complex image he's generated with his favorite image generator. *An LLM cannot learn a lookup table for every possible input sequence related to its training data; it is forced to compress. What these models "learn", then, are predictive functions that take the form of vector functions, because an LLM is a curve, and the only thing we can encode with a curve is a set of vector functions. These take elements of the input sequence as inputs and output elements of what follows.*
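The lookup-table-versus-compression point can be made concrete with a toy sketch (plain Python; the corpus and all numbers are hypothetical, nothing like a real LLM): a character-level bigram model is the degenerate case where the "learned function" really is a lookup table over the training data, and a quick count shows why any realistic vocabulary and context length forces compression instead.

```python
from collections import Counter, defaultdict

corpus = "this is the thin thing that the thug thought through"

# Degenerate "lookup table": exact next-character counts for each
# one-character context, read straight off the training text.
table = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    table[cur][nxt] += 1

def predict(context: str) -> str:
    """Most frequent next character after a one-character context."""
    return table[context].most_common(1)[0][0]

print(predict("t"))  # 'h' -- "th" dominates this tiny corpus

# Why the table cannot scale: with a 50,000-token vocabulary and a
# 1,000-token context window, an exact table would need 50_000 ** 1_000
# rows, astronomically more than the ~1e12 parameters of a large model.
# The model is therefore forced to compress contexts into reusable
# functions rather than memorize every input sequence.
rows_needed = 50_000 ** 1_000
print(rows_needed > 10**12)  # True
```

The one-character context is the only case where the lookup table stays small; the point of the arithmetic at the end is that everything beyond toy contexts has to be compressed into shared, reusable functions.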
Say John feeds it the works of Oscar Wilde for the first time, and the LLM has already "learned" a model of the English language. The text John has input is slightly different, yet still English. That's why it's so good at emulating linguistic styles, and elegance or the lack thereof: it's possible to model Oscar Wilde by reusing many of the functions learned in modeling English in general. *It therefore becomes trivial to model Wilde's style by simply deriving a style-transfer function that maps the model of English to Wilde-style texts.* That's how/why people are amazed at its linguistic dexterity in this sense. That's why LLMs appear to have an "intuitive" command of everything (law, biology, programming, philosophy) despite lacking the capacity to handle tasks that need deeper, stepwise reasoning or dynamic re-planning in new contexts. This resonates with your grandfather's cattle-auction case: he could handle his specialized domain through near-instant intuition but needed pencil and paper for general arithmetic outside of it, if I understand correctly. LLMs, similarly, may seem "fluent" in many subjects, yet they cannot truly engage in what we would label "system 2 program synthesis", unless of course the training distribution already covers a close analogue. In short, the reason LLM outputs feel like "type 1 reasoning on steroids" is that these models have memorized so many examples that their combined intuition extends across nearly all known textual domains. But when a problem truly demands formal reasoning steps absent from their training data, LLMs lack a real "type 2" counterpart: no robust self-critique, no internal program writing, and no persistent memory with which to refine their logic. We can therefore liken them to formidable intuition machines without the embedded capacity for system 2, top-down reasoning or architectural self-modification that we see in real human skill acquisition.
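A hedged toy of the style-transfer claim (the word counts are invented for illustration, and real models work over high-dimensional vectors, not word tallies): if a general "English" model is already learned, "Wilde style" can be stored as a small delta composed on top of it, rather than as a second full model.

```python
from collections import Counter

# Hypothetical next-word preferences of a general "English" model.
general = Counter({"good": 10, "bad": 6, "wit": 1, "epigram": 0})

# Hypothetical preferences observed in Wilde's texts.
wilde = Counter({"good": 9, "bad": 5, "wit": 8, "epigram": 7})

# The "style-transfer function" is just the difference from the base
# model: far smaller to store than a full second model, since most
# entries barely change.
delta = {w: wilde[w] - general[w] for w in set(general) | set(wilde)}

def wilde_model(word: str) -> int:
    """Compose the base English model with the Wilde style delta."""
    return general[word] + delta.get(word, 0)

print(wilde_model("wit"))   # 8: the delta supplies the Wilde flavor
print(wilde_model("good"))  # 9: mostly reuses the general model
```

The design point is that `wilde_model` never duplicates `general`; it only composes a correction onto it, which is the sense in which modeling Wilde's style can be "trivial" once English is already modeled.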
My big mistake, of course, is that John would never read Oscar Wilde.

A short note to Russell: yes, they appear to be competent at such assistance for similar reasons. I don't see advancement in the coding use case as purely a function of scaling and compute. The more advanced models can assist in such ways because they have longer histories of having suppressed "wrong" function chains/interpolations.

Thx for the inspo, guys.

--
You received this message because you are subscribed to the Google Groups "Everything List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion visit https://groups.google.com/d/msgid/everything-list/ad612720-63e4-4837-8a48-13540c32b337n%40googlegroups.com.

