> 
>> 
>>> Over-fitting is less of an issue here because it's trivial to write a 
>>> sentence that's never before been written by any human in history.
>> 
>> That is not enough. A small variation on a standard IQ test is still the 
>> same IQ test for a super powerful pattern detector such as GPT-4.
>> 
>> I have no doubt that GPT-4 can generalize in its domain. It was rigorously 
>> designed and tested for that by people who know what they are doing. My 
>> doubt is that you can give it an IQ test and claim OMG GPT-4 IQ > 140. This 
>> is just silly and it is junk science.
> 
> It's true that once one learns a way to solve problems it becomes easier to 
> reapply that method when you next encounter a related problem.
> 
> But isn't that partly what intelligence is? If a system has read the whole 
> Internet and seen every type of problem we know how to solve, and it can 
> generalize to know what method to use in any situation, that's an incredible 
> level of intelligence which, until now, we haven't had in machine form.

I would say that the important methodological distinction here is between 
learning intelligent behavior and demonstrating intelligent behavior. Obviously 
it is possible to learn and generalize from a dataset, otherwise there would be 
no point in wasting time with ML. But if you want to convince other people that 
you have indeed achieved generalization, then the scientific gold standard is 
to demonstrate this on data that was not used in training, because beyond 
generalization there can also be (and often is) over-fitting. This is not a 
controversial statement. Take any published ML result and apply it to the 
training data, and 99.9999999% of the time it will perform better, often much 
better, than on held-out data, because the model has also learned the little 
details (over-fitting) that guide it towards the correct answer.
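
Just to make that gap concrete (this has nothing to do with GPT-4 
specifically), here is a tiny sketch of what I mean, using Python, 
scikit-learn and made-up synthetic data: an unconstrained model fit on noisy 
data looks near-perfect on its own training data and noticeably worse on 
held-out data.

# Minimal illustration of the train/test gap: an over-parameterized model
# fit on noisy data scores far better on the data it was trained on than
# on held-out data. Synthetic data, purely for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)   # noisy target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeRegressor()        # unconstrained depth -> memorizes the noise
model.fit(X_train, y_train)

print("R^2 on training data:", model.score(X_train, y_train))   # essentially 1.0
print("R^2 on held-out data:", model.score(X_test, y_test))     # clearly lower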

An extreme case of this is stock trading. I am not kidding, and I suspect you 
know this: I can easily produce an ML model that achieves >1000% profit per 
month on the derivatives market, as long as we only test on in-sample data. But I 
will raise the stakes! Are you ready?

I promise I will train my algorithm only on ONE crypto coin from 2020 to 2022. 
Then we will apply it to OTHER crypto coins. I still promise >1000% profit per 
month. Do you want it now?

I understand that GPT-4 is trained on most available text in natural language. 
That is amazing, I love it. But this comes with additional methodological 
challenges. I am pretty sure that the GPT-4 team knows about them, and they 
probably have a rigorously reserved held-out test set to guide their own 
research. Also, I fully believe that they are serious researchers and would 
never embark on this IQ test bullshit.

I am really just insisting on sticking to the scientific attitude. I do not 
understand what I could be saying that is so controversial...

> 
>>> You can tweak the parameters of the problem to guarantee it's a problem it 
>>> has never before been seen, and it can still solve it.
>> 
>> Some yes, some no. Almost one century of computer science still applies.
>> 
>>> You can choose to wait for the academic write ups to come out a few months 
>>> down the line but by then things will have advanced another few levels from 
>>> where we are today.
>> 
>> I am not wanting to wait for anything, I am asking questions that can be 
>> addressed right now:
>> 
>> - Are there IQ tests in the training data of GPT-4? Yes or no?
>> - Can we conceive of human-level intelligence without recurrent connections 
>> or some form of ongoing recursivity / Turing completeness? Yes or no?
> 
> 
> I've been thinking about this a lot.
> 
> 
> My friend with access to GPT-4 asked it: "Does your neural network contain 
> such reflexive loops, or is it strictly feed forward?", below is its reply:
> 
> --------
> 
> "As an AI language model, my underlying architecture is based on the 
> Transformer model, which is primarily feedforward in nature but includes some 
> elements of recurrence or reflexivity, specifically in the self-attention 
> mechanism.
> 
> The Transformer model consists of a multi-layer feedforward neural network 
> with self-attention mechanisms that allow the model to learn complex 
> relationships between input and output sequences. The self-attention 
> mechanism weighs the importance of different input elements relative to each 
> other, effectively capturing long-range dependencies and relationships within 
> the input data. This mechanism introduces a form of reflexivity or 
> recurrence, as it allows the model to consider the context of the entire 
> input sequence when generating outputs.
> 
> However, it is important to note that the self-attention mechanism in the 
> Transformer model is not equivalent to traditional recurrent neural networks 
> (RNNs), which maintain a hidden state that is updated at each time step. The 
> Transformer model processes input sequences in parallel rather than 
> sequentially, which makes it fundamentally different from RNNs.
> 
> In summary, while my neural network architecture is primarily feedforward, it 
> includes some elements of reflexivity in the form of self-attention 
> mechanisms that allow the model to capture complex relationships within input 
> sequences."
> 
> --------
> 
> Is this enough to meet Hofstadter's requirements of recursion? I do not have 
> the expertise to say. But I do see recursion existing in a way no one seems 
> to ever mention:
> 
> The output of the LLM is fed back in, as input to the LLM that produced it. 
> So all the high level processing and operation of the network at the highest 
> level, used to produce a few characters of output, then reaches back down to 
> affect the lowest level of the input layers of the network.
> 
> If you asked the network, where did that input that it sees come from, it 
> would have no other choice but to refer back to itself, as "I". "I generated 
> that text."
> 
> Loops are needed to maintain and modify a persistent state or memory, to 
> create a strange loop of self-reference, and to achieve Turing completeness. 
> But a loop may not exist entirely in the "brain" of an entity, it might 
> offload part of the loop into the environment in which it is operating. I 
> think that is the case for things like thermostats, guided missiles, AlphaGo, 
> and perhaps even ourselves.
> 
> We observe our own actions, they become part of our sensory awareness and 
> input. We cannot say exactly where they came from or how they were done, 
> aside from modeling an "I" who seems to intercede in physics itself, but this 
> is a consequence of being a strange loop. In a sense, our actions do come in 
> from "on high", a higher level of abstraction in the hierarchy of processing, 
> and this seems as if it is a dualistic interaction by a soul in heaven as 
> Descartes described.
> 
> In the case of GPT-4, its own output buffer can act as a scratch pad memory 
> buffer, to which it continuously appends its thoughts. Is this not a form 
> of memory and recursion?
> 
> For one of the problems in John's video, it looked like it solved the Chinese 
> remainder theorem in a series of discrete steps. Each step is written to and 
> saved in its output buffer, which becomes readable as its input buffer.
> 
> Given this, I am not sure we can say that GPT-4, in its current architecture 
> and implementation, is entirely devoid of a memory, or a loop/recursion.
> 
> I am anxious to hear your opinion though.

This is a great answer by GPT-4 and a good point. I agree that the ability to 
re-feed the output buffer back to the language model constitutes a form of 
computational recurrence and is indeed a memory mechanism. One could even 
imagine more sophisticated "tricks", where one explains to GPT-4 how to read 
from / write to some form of database.
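
Just to make concrete what I mean by the output buffer acting as working 
memory, here is a toy sketch. The function llm_step() is a made-up stand-in 
for an actual model call (it is not any real API), and the stop condition is 
invented; the point is only the shape of the loop: the model reads everything 
written so far, including its own earlier output.

# Toy sketch: the model's own output is appended to the context and becomes
# part of the next input, i.e. a scratchpad / working memory realised through
# the input/output buffer rather than inside the network.

def llm_step(context: str) -> str:
    """Placeholder: a real system would call the language model here."""
    return "ANSWER: <placeholder>"

def solve_with_scratchpad(problem: str, max_steps: int = 10) -> str:
    context = problem
    for _ in range(max_steps):
        step = llm_step(context)           # the model sees all prior steps...
        context += "\n" + step             # ...including its own earlier output
        if step.startswith("ANSWER:"):     # invented stop condition
            break
    return context

print(solve_with_scratchpad("Solve x + 3 = 5 step by step."))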

I can imagine several ways forward here:

(1) The amount of input/context that LLMs can receive keeps increasing, and 
eventually it is so large that RLHF can teach LLMs to make use of an 
input/output buffer as a working memory;

(2) Some neuro-symbolic scheme is devised such that the LLM can use APIs to 
extend itself;

(3) True recurrence inside the model is achieved (this requires some new 
learning algorithm that does not suffer from vanishing gradient).

I think that (3) is by far the most scientifically exciting, but it is one of 
those things where it seems hard to estimate when the breakthrough will come. 
Maybe tomorrow, maybe in three decades... So another question is, can we ride 
(1) or (2) all the way to AGI? I don't know...

I suspect that truly integrating all the modalities in a human-like way 
(language, vision, memory formation and access, meta-cognition, etc.) will 
require (3). But I do not have a strong argument. I love coding, so in that 
sense (2) is a bit more exciting :)
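
To give a flavour of what I imagine under (2), here is a toy dispatch loop. 
The command format, the two "tools", and model_step are all invented purely 
for illustration; a real system would of course be far more elaborate.

# Toy neuro-symbolic sketch: the model is prompted to emit commands such as
# "CALL sqrt 2" whenever it needs an external tool; a thin wrapper executes
# the call and appends the result to the context, which the model then reads.
import math

TOOLS = {
    "sqrt": lambda arg: str(math.sqrt(float(arg))),
    "lookup": lambda key: {"capital_of_france": "Paris"}.get(key, "unknown"),
}

def run_with_tools(model_step, prompt: str, max_steps: int = 5) -> str:
    context = prompt
    for _ in range(max_steps):
        out = model_step(context)             # hypothetical model call
        context += "\n" + out
        if out.startswith("CALL "):
            _, name, arg = out.split(maxsplit=2)
            result = TOOLS[name](arg)         # the symbolic side does the work
            context += "\nRESULT " + result   # fed back into the model's input
        else:
            break
    return context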

For me only two things are clear at this point:

- GPT-* is a spectacular, qualitative jump in AI. It can do things that we 
couldn't dream of a couple of years ago. It will almost certainly be a piece of 
the puzzle towards AGI.

- There is still a huge chasm between Human Intelligence (HI) and GPT-4. How 
long will it take to cross that chasm? Who knows...

One thing I wonder is whether the main difference between HI and LLMs lies in 
the utility function more than anything else. We humans have this highly evolved, 
emergent utility function that allows us to be guided by feelings (boredom, 
curiosity, lust, fear, etc) into highly complex behaviors and meta-behaviors. 
We decide to learn things in a certain way for a complicated set of reasons 
towards a long term goal. In classical AI parlance, we are autonomous agents.

One final point about recursion: what I was trying to get at with the chess 
example is that HI can solve problems that are provably more time-complex than 
constant / linear. We can solve polynomial type stuff, and even approximate 
solutions for NP-hard stuff. Playing a game like chess requires expensive 
navigation of a very large tree of possible states. This is true both for 
computers and humans, although they might implement this capability in 
different ways. Grand masters sometimes commit blunders when trying to explore 
the tree further than their cognitive capabilities permit, and they will 
discuss such things (meta-cognition).

GPT-4, as a pure computational environment, lacks the ability to perform 
arbitrary polynomial-time computations: a single forward pass performs a fixed 
amount of work per token, however hard the problem. It "fools" us 
spectacularly by wielding its 
immense domain knowledge of... everything. But this only goes so far. It can 
never defeat a competent chess player with such an architecture. Of course, we 
can integrate GPT-4 with some API and let it call some explore_deep_tree() 
function, but this is not the sort of deep integration that one imagines in 
sophisticated AI. True recurrence would allow for true computational power 
within the model.
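
To make the chess point a bit more concrete, here is a toy game-tree search (a 
trivial take-1-2-or-3-stones game, not chess; the names are mine and purely 
illustrative). The recursion depth and the amount of work grow with the 
position, roughly branching_factor ** depth, whereas a single feedforward pass 
of a fixed network performs a fixed amount of work per token. This is the kind 
of open-ended computation that a hypothetical explore_deep_tree() call would 
have to delegate outside the network.

# Toy minimax over a tiny game: players alternately take 1-3 stones, and
# whoever takes the last stone wins. Solving a position requires exploring
# a tree whose size depends on the position, not a constant amount of work.
from functools import lru_cache

@lru_cache(maxsize=None)
def best_outcome(stones: int, my_turn: bool) -> int:
    """+1 if the original player to move wins with perfect play, -1 otherwise."""
    if stones == 0:
        # The previous move took the last stone, so whoever is now "to move" lost.
        return -1 if my_turn else 1
    outcomes = [best_outcome(stones - take, not my_turn)
                for take in (1, 2, 3) if take <= stones]
    return max(outcomes) if my_turn else min(outcomes)

print(best_outcome(12, True))   # -1: 12 stones is a losing position for the mover
print(best_outcome(13, True))   # +1: taking one stone leaves the opponent at 12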

This is the sort of thing I have been thinking about. I may be missing 
something obvious. I would also love to read your opinion!

Telmo

> Jason 
