Re: [agi] Breaking AIXI-tl

Eliezer S. Yudkowsky Fri, 14 Feb 2003 08:10:04 -0800

Ben Goertzel wrote:

OK.  Rather than responding point by point, I'll try to say something
compact ;)


You're looking at the interesting scenario of an iterated prisoner's dilemma
between two AIXI-tl's, each of which has a blank operating program at the
start of the iterated prisoner's dilemma.  (In parts of my last reply, I was
questioning the blankness of the operating program, but let's accept it for
sake of discussion.)

The theorems about AIXI-tl do not say much about the performance of AIXI-tl
relative to other systems on this task.  Because what the theorems talk
about is

AIXI-tl maximizing reward function R
versus
System X maximizing reward function R

over a long period of time.  Whereas in your case you're asking about

AIXI-tl maximizing reward function R(AIXI_tl)
versus
System X maximizing reward function R(X)

i.e. the reward function is a function of the system in question.  AIXI-tl
and System X (e.g. an uploaded human) are not competing against the same
opponent, they're competing against different opponents (their clones, in
your scenario).

So, unless I'm overlooking something, you're looking at a scenario not
covered by Hutter's theorems.

That is correct. As I said:

"An intuitively fair, physically realizable challenge, with important real-world analogues, formalizable as a computation which can be fed either a tl-bounded uploaded human or an AIXI-tl, for which the human enjoys greater success measured strictly by total reward over time, due to the superior strategy employed by that human as the result of rational reasoning of a type not accessible to AIXI-tl."

Obviously, such a challenge cannot be covered by Hutter's theorems or AIXI-tl would outperform the human. The question is whether Hutter's theorems describe all the realistic physical situations a mind can encounter.
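
To make the difference between R and R(X) concrete, here is a minimal Python sketch that is not part of the original exchange. The payoff values, the three toy strategies, and the names play, reward_fixed and reward_clone are all assumptions chosen for illustration; none of the strategies stands in for AIXI-tl or an uploaded human. It only shows that ranking systems against one fixed opponent and ranking them against their own clones can give different orderings.

```python
# Conventional Prisoner's Dilemma payoffs (an assumed choice, not from the thread):
# (my move, their move) -> my reward. "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def always_defect(my_history, their_history):
    return "D"


def always_cooperate(my_history, their_history):
    return "C"


def tit_for_tat(my_history, their_history):
    # Cooperate first, then copy the opponent's previous move.
    return their_history[-1] if their_history else "C"


def play(player, opponent, rounds=100):
    """Total reward earned by `player` in an iterated PD against `opponent`."""
    mine, theirs, total = [], [], 0
    for _ in range(rounds):
        a = player(mine, theirs)
        b = opponent(theirs, mine)
        total += PAYOFF[(a, b)]
        mine.append(a)
        theirs.append(b)
    return total


def reward_fixed(system, rounds=100):
    # R: every system under test faces the same opponent.
    return play(system, always_cooperate, rounds)


def reward_clone(system, rounds=100):
    # R(X): the opponent is a copy of the system under test.
    return play(system, system, rounds)


for system in (always_defect, tit_for_tat):
    print(system.__name__, reward_fixed(system), reward_clone(system))
```

Under these assumed payoffs, always_defect does better than tit_for_tat against the fixed opponent and worse against its own clone, which is the sense in which the two systems are "competing against different opponents".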

You're stating that a human (System X) can do better in an iterated PD
against other humans, than an AIXItl can do in an iterated PD against other
AIXItl's.

That is correct. Humans (and Friendly AIs) can employ Hofstadterian superrationality as a strategy; AIXI-tl cannot.

I still have problems understanding your reasoning, when you derive this
conclusion.  Maybe I'm just being obtuse; I'm sure I haven't spent as much
time thinking about it as you have.

But, suppose you're right.  What you've done is come up with an interesting
observation (and if you formalize it, an interesting theorem) about (small)
social systems of AIXI-tl's.  This is very nice.

Does this somehow tell you something about the interactions of AIXI-tl's
with humans?  Is that the follow-up point you want to make, regarding AIXItl
Friendliness?

Nope. The two points are, if not completely unrelated, then related only on such a deep level that I wasn't explicitly planning to point it out.

But Hofstadterian superrationality - and certain other generalized challenges - are physically realizable, important, and can be solved by humans because we have superior reflectivity to AIXI-tl.

Your observation is about the behavior of an AIXI/AIXI-tl whose only
life-experience has consisted of a very weird artificial situation.  This
behavior is not going to be the same as the behavior of an AIXI/AIXItl
embedded in a richer environment with a different reward function.  That is
the point I was trying to make with my talk about the "initial operating
program" of the AIXI/AIXItl in your simulation.

Yes. It is a quite irrelevant point to breaking Hutter's theorem. Also, specifying an AIXI-tl embedded in some unspecified prior environment injects entropy into the problem description. I show that AIXI-tl does not learn to recognize its own reflected "way of thinking" (as opposed to its reflected actions, which AIXI-tl *can* recognize) because AIXI-tl cannot, as a human would, remember its own way of thinking or deliberately "place itself into the other human's shoes" and simulate its own way of thinking given different goals, both abilities available at the human level of reflectivity; AIXI-tl can only place itself into the shoes of tl-bounded processes. This prohibits AIXI-tl from using Hofstadterian superrationality to notice that the policies of other entities correlate with its own policies, and prohibits AIXI-tl from choosing the policies of other entities by selecting its own policies based on the knowledge that the policy the other entity chooses will correlate with its own. There are additional Other Mind correlation problems that humans can't solve but seed AIs can because of the seed AI's superior reflectivity; the point is that there's a real kind of intelligence here of which AIXI-tl arguably has quantity zero.

Now, let me get back to my problems understanding your reasoning.  Consider
the problem

PD(Y) =
"System Y plays iterated PD against a clone of System Y"

Clearly, PD(Y) is not a problem at which one would expect more intelligent
systems to necessarily perform better than less intelligent ones!!

True, but given roughly selfish (in the sense of uncorrelated goal patterns) and roughly rational systems of roughly equal computational muscle, performance will depend on a certain kind of reflectivity which I am arguing is a form of genuine and rational intelligence.

Now consider two subproblems

PD1(Y) =
PD(Y), but each System Y knows it's playing a clone of itself

PD2(Y) =
PD(Y), but each System Y is playing a completely unidentified, mysterious
opponent

I'm worried that in your comparison, you have the human upload playing PD1,
but have the AIXI-tl playing PD2.

"Knows it's playing a clone of itself" is, in this context, an anthropomorphism. AIXI-tl simulates all possible tl-bounded humans who know they are playing clones of themselves, which is supposed to place it on equal grounds with any possible experienced mind, given enough time. What I show is that AIXI-tl can simulate abilities that fail to be duplicated by its control function; it can simulate minds that access their own strategies but can never access the true strategy of AIXI-tl, which, facing itself in a cooperative problem, is what it needs to do.

Incidentally, AIXI-tl does learn to handle the Prisoner's Dilemma because the results it sees can be (incorrectly) understood as a tl-bounded computational transform of its own actions. But for more complex cooperative problems with asymmetrical actions, the inputs AIXI-tl sees will not be tl-bounded computational transforms of its own outputs. (Or, if they are, for any round beyond the first round, this is nonobvious.) Humans facing this issue have the ability to turn roughly human computational musclepower toward putting themselves in the other party's shoes, thus recognizing their own "kind of thinking" reflected back at them. AIXI-tl can only put itself in the shoes of tl-bounded processes, which AIXI-tl itself is not. Human reflectivity is not unlimited but it is higher than AIXI-tl's, and this is reflected in superior human performance on an important class of physically realizable challenges.

"Important", because I strongly suspect Hofstadterian superrationality is a *lot* more ubiquitous among transhumans than among us...

If a human is playing PD2, then it has to proceed solely by time series
analysis, and its actions are probably going to meander around chaotically
until settling on some attractor (or maybe they'll just meander around...).
MAYBE the human manages to recognize that the responses of its opponent are
so similar to its own responses that its opponent must be a lot like
itself... and this helps it settle on a beneficial attractor.

If an AIXItl is playing PD2, the situation is pretty much the same as if the
human is doing so, isn't it?  Except can't you argue that an AIXItl is so
smart that in the long run it's more likely than a human to figure out that
its opponent is acting a lot like it is, and make a guess that symmetrical
friendly behavior might be a good thing?

Even if a (grown) human is playing PD2, it outperforms AIXI-tl playing PD2. For one thing I suspect that humans, given their emotional biases, will try cooperative strategies very quickly and recognize them when seen. This does not show, however, that a human can actually realize they are playing their own clone. What I suspect will happen is that the human will look at the other player's cooperative strategy and spot particular choices where he says, "Hey, this guy is thinking *exactly* like me." Humans can make this realization because their memories record many important interim internal results along with final actions, and because humans find it very easy to place themselves in the shoes of other humans with similar expertise and intelligence but different local subgoals. Both the t2^l interim cognitive results and the t2^l strategic ability of AIXI-tl are closed to the tl-bounded processes it uses to predict its environment. That's the proximal cause of failure.

Now you may need to be a transhumanist or Greg Egan fan to make the further realization that you are up against your own *exact* clone, but even in the absence of this background, I think most humans would still try working out strategies for both sides, with themselves, in step 2, and then implementing them.

Humans can recognize a much stronger degree of similarity in human Other Minds than AIXI-tl's internal processes are capable of recognizing in any other AIXI-tl. In any environment where you are likely to run into equals, or in a physical challenge which forces you to confront an equal, humans outperform AIXI-tl.

It is harder to see that this actually represents a qualitative difference in reflectivity and not just a quantitative difference in reflectivity. But as far as I can tell, for two superintelligences with noncorrelated goals to cooperate *at all* in one-shot PD situations they *must* use Hofstadterian superrationality. Again, as far as I can tell, this necessarily requires abstracting over your own internal state and recognizing that the outcome of your own (internal) choices is necessarily reproduced by a similar computation elsewhere. Basically, it requires abstracting over your own halting problem to realize that the final result of your choice is correlated with that of the process simulated, even though you can't fully simulate the causal process producing the correlation in advance. (This doesn't *solve* your own halting problem, but at least it enables you to *understand* the situation you've been put into.) Except that instead of abstracting over your own halting problem, you're abstracting over the process of trying to simulate another mind trying to simulate you trying to simulate it, where the other mind is sufficiently similar to your own. This is a kind of reasoning qualitatively closed to AIXI-tl; its control process goes on abortively trying to simulate the chain of simulations forever, stopping and discarding that prediction as not useful as soon as it exceeds the t-bound. AIXI-tl can simulate tl-bounded minds that have the ability to abstract over their *own* halting problem and employ Hofstadterian superrationality, but that doesn't help AIXI-tl abstract over a t2^l-sized AIXI-tl trying to simulate it trying to simulate it, etc., any more than AIXI-tl can simulate a seed AI and thereby gain the ability to modify its own top-level control process to devote t2^l computations to modeling its opposite self.
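
As a rough illustration of the regress described above, here is a toy Python sketch, again not from the original exchange. It is emphatically not an implementation of AIXI-tl: the step budget, the fallback move, and the names simulating_agent, bounded_predictor and superrational_agent are hypothetical, and the payoffs are the conventional one-shot PD values assumed for the example. It contrasts an agent that tries to predict its clone by simulating it (and runs out of budget) with one that abstracts over the regress by noting that both copies must output the same move.

```python
# Assumed one-shot PD payoffs: (my move, their move) -> my reward.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


class BudgetExceeded(Exception):
    """Raised when the simulation chain exceeds its step budget."""


def best_reply(their_move):
    # Move that maximizes my reward against a known opponent move.
    return max("CD", key=lambda m: PAYOFF[(m, their_move)])


def simulating_agent(budget):
    """Predict the clone by simulating it; the clone does the same to us."""
    if budget <= 0:
        raise BudgetExceeded
    # The clone runs this same procedure, so predicting it means
    # simulating it simulating us simulating it, and so on.
    predicted = simulating_agent(budget - 1)
    return best_reply(predicted)


def bounded_predictor(budget, fallback="D"):
    # Hypothetical control loop: discard the prediction once the budget
    # is exceeded and fall back on the move that dominates against any
    # fixed opponent move (an assumption of this sketch).
    try:
        return simulating_agent(budget)
    except BudgetExceeded:
        return fallback


def superrational_agent():
    # Abstract over the regress: identical computations yield identical
    # moves, so only the diagonal outcomes (C, C) and (D, D) are reachable.
    return max("CD", key=lambda m: PAYOFF[(m, m)])


print(bounded_predictor(50), superrational_agent())
```

In this toy the simulating agent never completes a prediction for any finite budget, whereas the diagonal argument needs no simulation at all. Whether that captures the qualitative gap claimed above is of course the point under dispute.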

Anyway... basically, if you're in a real-world situation where the other intelligence has *any* information about your internal state, not just from direct examination, but from reasoning about your origins, then that also breaks the formalism and now a tl-bounded seed AI can outperform AIXI-tl on the ordinary (non-quined) problem of cooperation with a superintelligence. The environment can't ever *really* be constant and completely separated as Hutter requires. A physical environment that gives rise to an AIXI-tl is different from the environment that gives rise to a tl-bounded seed AI, and the different material implementations of these entities (Lord knows how you'd implement the AIXI-tl) will have different side effects, and so on. All real-world problems break the Cartesian assumption. The questions "But are there any kinds of problems for which that makes a real difference?" and "Does any conceivable kind of mind do any better?" can both be answered affirmatively.

--
Eliezer S. Yudkowsky http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence
