On Wednesday, December 25, 2024 at 1:46:51 PM UTC+8 Brent Meeker wrote:


I think that LLM thinking is not like reasoning.  It reminds me of my 
grandfather who was a cattleman in Texas.  I used to go to auction with 
him where he would buy calves to raise and where he would auction off 
ones he had raised.  He could do calculations of what to pay, gain and 
loss, expected prices, cost of feed all in his head almost instantly.  
But he couldn't explain how he did it.  He could do it pencil and paper 
and explain that; but not how he did it in his head.  So although the 
arithmetic would be of the same kind he couldn't figure insurance rates 
and payouts, or medical expenses, or home construction costs in his 
head.  The difference with LLM's is they have absorbed so many examples 
on every subject, as my grandfather had of auctioning cattle, that the 
LLM's don't have reasoning, they have finely developed intuition, and 
they have it about every subject.  Humans don't have the capacity to 
develop that level of intuition about more than one or two subjects; 
beyond that they have to rely on slow, formal reasoning.


Nail on the head. That’s an interesting story, and it got me thinking (too 
much again). It reminds me of the distinction in Kahneman’s 2011 bestseller 
between “type 1” and “type 2” thinking (aka “system 1” and “system 2”). And 
although there are problems and controversies with his ideas and evidence, 
as is almost standard for psychological proposals of this kind, we can, if 
you indulge me, take his ideas less literally, as describing loose 
cognitive styles of reasoning, as follows: 

LLMs today (end of 2024), with their vast memory-like parameter sets, are 
closer to system 1 “intuitive” reasoning styles that appear startlingly 
broad and context-sensitive, even “fine”, as you say, and somewhat akin to 
your grandfather’s style of reasoning within his area(s) of expertise. Yet, 
in my view, LLMs (not wishing to imply anything about your grandfather) are 
not performing the slower, more deliberative, rule-based thinking we often 
associate with system 2.

Take the example of Magnus Carlsen, arguably the best-performing chess 
player in recent history, who relies increasingly on instinctive “type 1” 
moves after thousands of hours of deliberate “type 2” calculation (when he 
was young and ascending the throne, he was better known for his 
calculation skills). This illustrates how extensive practice can shift 
certain skills from a laborious, step-by-step process toward fast, 
experience-based pattern recognition. I’ve observed many times, especially 
in recent years, how in a time crunch he glances at the clock and makes 
decisive, critical, almost computer-like moves, based purely on a gut 
feeling more precise and finely tuned than his opponent’s, with too little 
time left to calculate his way through the position. In streams where he 
comments on other GMs’ play, he often instantly blurts out statements like 
“that knight just belongs on e6” or “the position is screaming for bishop 
f8”, without a second of thought or a glance at the AI/engine evaluation.

A similar thing happens, perhaps not at a world class level (but even 
Magnus makes mistakes, just less often than others), when a person learns 
to drive: at first, every procedure (shifting gears, 360-degree checks, 
mirror vigilance, steering) is deliberate and conscious, whereas a 
practiced driver performs multiple operations fluently and simultaneously, 
without thinking in a type 2 manner. We see this in most advanced tasks, 
like playing chess, doing math, or solving puzzles: humans merge system 1 
and system 2, relying more on one or the other depending on experience and 
context.

I see current LLMs as occupying a distinct spot: the “intuition” they rely 
on is not gained through slow, personal experience in one domain, but 
rather from the vast breadth of their pretraining across nearly every 
field for which text data or code exists. Again, it’s not just static 
memory, and I do recognize the nuance that they are *not just memorizing 
answers to questions or mere content.* AGI advocates scream: “LLMs 
interpolate, so it’s reasoning!” Unfortunately, they rarely specify what 
that means. 

What they primarily memorize, if I understand correctly, is 
functions/programs, in a certain sense. Programs do generalize to some 
extent, by mathematical definition. When John questions an LLM via his 
keyboard, he is *essentially querying a point in program space*, where we 
can think of the LLM in some domain as a manifold with each point encoding 
a function/program. Yes, they do interpolate across these manifolds to 
combine/compose programs, which implies an effectively unbounded number of 
possible programs that John can reach through his keyboard. That is why 
they *appear* to reason richly and organically, unlike in earlier years, 
and can help debug code (as they have been trained against compositions 
that yield false results) in sophisticated ways, *with human assistance.*

So what we’re doing with LLMs is training them as rich, flexible models to 
predict the next token. If we had infinite memory capacity, we could 
simply train them to learn a sort of lookup table. But the reality is more 
modest: LLMs have only some billions or trillions of parameters. That’s 
why they screw up basic grade-school math problems not in their training 
set, or fail to remove a single small element John doesn’t want from the 
complex image he’s generated with his favorite image generator. *An LLM 
therefore cannot learn a lookup table for every possible input sequence 
related to its training data. It is forced to compress. So what these 
programs “learn” are predictive functions that take the form of vector 
functions, because the LLM is a curve… and the only thing we can encode 
with a curve is a set of vector functions. These take elements of the 
input sequence as inputs and output elements of what follows. *
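A minimal sketch of the lookup-table point (again my own toy, not an 
actual LLM mechanism): an explicit next-token table needs one entry per 
distinct context seen in training, while a fixed “parameter budget”, here 
crudely simulated by keeping only one successor per context, forces a 
lossy, compressed summary of the data.

```python
from collections import defaultdict

corpus = "the cat sat on the mat the cat sat".split()

# Approach 1: a literal lookup table from each context word to everything
# that followed it. Its size grows with the number of distinct contexts.
table = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    table[prev].append(nxt)
print(len(table))           # one entry per distinct context word: 5

# Approach 2: a crude stand-in for a fixed parameter budget, keeping only
# the most frequent successor per context and discarding the rest.
compressed = {prev: max(nxts, key=nxts.count) for prev, nxts in table.items()}
print(compressed["the"])    # the lossy summary predicts "cat" after "the"
```

A real model compresses into continuous parameters rather than dropping 
entries, but the pressure is the same: finite capacity rules out rote 
storage of every input sequence.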

Say John feeds it the works of Oscar Wilde for the first time, and the LLM 
has already “learned” a model of the English language. The text John has 
input is slightly different, and yet still the English language. That’s 
why it’s so good at emulating linguistic styles and elegance (or lack 
thereof): it’s possible to model Oscar Wilde by reusing many of the 
functions learned in modeling English in general. *It therefore becomes 
trivial to model Wilde’s style by simply deriving a style-transfer 
function that goes from the model of English to the Wilde texts.* That’s 
how/why people are amazed at its linguistic dexterity in this sense.
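As a throwaway illustration of that “style-transfer function” (entirely my 
own toy; the sample texts and the word-frequency “model” are hypothetical 
stand-ins for a real language model): the transfer function is just the 
difference between a base model and a target-style model, so almost all of 
the base model is reused for free.

```python
from collections import Counter

def freq_model(text):
    # A normalised word-frequency "model" of a text (toy stand-in).
    words = text.lower().split()
    return {w: c / len(words) for w, c in Counter(words).items()}

# Hypothetical stand-ins for "English in general" and a Wilde sample.
base = freq_model("the quick brown fox jumps over the lazy dog the end")
target = freq_model("the truth is rarely pure and never simple")

# The "style-transfer function": the delta that carries the base model
# onto the target style. Everything the two models share is reused.
delta = {w: target.get(w, 0.0) - base.get(w, 0.0)
         for w in set(base) | set(target)}

# Applying the delta to the base recovers the target model exactly.
restyled = {w: base.get(w, 0.0) + delta[w] for w in delta}
assert all(abs(restyled[w] - target.get(w, 0.0)) < 1e-12 for w in restyled)
```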

That’s why they appear to have an “intuitive” command of everything—law, 
biology, programming, philosophy—despite lacking the capacity to handle 
tasks that need deeper, stepwise reasoning or dynamic re-planning in new 
contexts. This resonates with your grandfather’s “cattle auction” case: he 
could handle his specialized domain through near-instant intuition but 
needed pencil-and-paper for general arithmetic outside of it, if I 
understand correctly. LLMs, similarly, may seem “fluent” in many subjects, 
yet they cannot truly engage in what we would label “system 2 program 
synthesis”, unless of course the training distribution already covers a 
close analogue.

In short, the reason LLM outputs feel like “type 1 reasoning on steroids” 
is that these models have memorized so many examples that their combined 
intuition extends across nearly all known textual domains. But when a 
problem truly demands formal reasoning steps absent from their training 
data, LLMs lack a real “type 2” counterpart—no robust self-critique, no 
internal program writing, and no persistent memory to refine their logic. 
We can therefore liken them to formidable intuition machines without the 
same embedded capacity for system 2, top-down reasoning or architectural 
self-modification that we see in real human skill acquisition. My big 
mistake is of course that John would never read Oscar Wilde. 

As a short note to Russell: yes, they appear to be competent at such 
assistance for similar reasons. I don't see advancements in the coding use 
case as a function purely of scaling and compute. The more advanced models 
can assist in such ways because they have longer histories of having 
suppressed "wrong" function chains/interpolation. Thx for the inspo guys.

-- 
You received this message because you are subscribed to the Google Groups 
"Everything List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/everything-list/ad612720-63e4-4837-8a48-13540c32b337n%40googlegroups.com.
