# RE: [agi] Breaking AIXI-tl

OK.  Rather than responding point by point, I'll try to say something
compact ;)

You're looking at the interesting scenario of an iterated prisoners' dilemma
between two AIXI-tl's, each of which has a blank operating program at the
start of the iterated prisoners' dilemma.  (In parts of my last reply, I was
questioning the blankness of the operating program, but let's accept it for
sake of discussion.)

The theorems about AIXI-tl do not say much about the performance of AIXI-tl
relative to other systems on this task, because what the theorems talk
about is

AIXI-tl maximizing reward function R
versus
System X maximizing reward function R

whereas what your scenario involves is

AIXI-tl maximizing reward function R(AIXI_tl)
versus
System X maximizing reward function R(X)

i.e. the reward function is a function of the system in question.  AIXI-tl
and System X (e.g. an uploaded human) are not competing against the same
opponent; they're competing against different opponents (their own clones,
in fact).
So, unless I'm overlooking something, you're looking at a scenario not
covered by Hutter's theorems.
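To make the "different opponents" point concrete, here is a minimal toy sketch.  All names here (`clone_match`, `Cooperator`, etc.) are my own illustrative inventions, not part of Hutter's formalism; the point is only that in the clone setup the environment each system faces is a copy of itself, so each system is playing a different game.

```python
# Standard PD payoffs for (my move, opponent's move), from my point of view.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def clone_match(make_system, rounds=100):
    """Score of one copy when a system plays iterated PD against its own clone."""
    a, b = make_system(), make_system()  # two fresh copies of the SAME system
    score = 0
    for _ in range(rounds):
        ma, mb = a.act(), b.act()
        score += PAYOFF[(ma, mb)]
        a.observe(mb)
        b.observe(ma)
    return score

class Cooperator:
    def act(self): return "C"
    def observe(self, other): pass

class Defector:
    def act(self): return "D"
    def observe(self, other): pass

# The "environment" is a function of the agent: the Cooperator faces a
# cooperator, the Defector faces a defector.  Hutter's optimality theorems
# assume a single environment shared by all agents, which fails here.
print(clone_match(Cooperator))  # 300: mutual cooperation every round
print(clone_match(Defector))    # 100: mutual defection every round
```

Note this toy also shows that playing your own clone does not reward intelligence per se: the unconditional cooperator outscores the unconditional defector.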

You're stating that a human (System X) can do better in an iterated PD
against other humans, than an AIXItl can do in an iterated PD against other
AIXItl's.

I still have problems understanding your reasoning, when you derive this
conclusion.  Maybe I'm just being obtuse; I'm sure I haven't spent as much
time thinking about it as you have.

But, suppose you're right.  What you've done is come up with an interesting
observation (and if you formalize it, an interesting theorem) about (small)
social systems of AIXI-tl's.  This is very nice.

Does this somehow tell you something about the interactions of AIXI-tl's
with humans?  Is that the follow-up point you want to make, regarding AIXItl
Friendliness?

One more point: in your scenario, the AIXI/AIXItl's entire life-experience
has consisted of a very weird artificial situation.  This
behavior is not going to be the same as the behavior of an AIXI/AIXItl
embedded in a richer environment with a different reward function.  That is
the point I was trying to make with my talk about the "initial operating
program" of the AIXI/AIXItl in your simulation.

Now, let me get back to my problems understanding your reasoning.  Consider
the problem

PD(Y) =
"System Y plays iterated PD against a clone of System Y"

Clearly, PD(Y) is not a problem at which one would expect more intelligent
systems to necessarily perform better than less intelligent ones!!

Now consider two subproblems

PD1(Y) =
PD(Y), but each System Y knows it's playing a clone of itself

PD2(Y) =
PD(Y), but each System Y is playing a completely unidentified, mysterious
opponent
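In PD1, the "knows it's playing a clone of itself" condition collapses the problem: whatever policy you pick, your clone picks the same one, so you need only evaluate each candidate policy against itself.  A minimal sketch (the names are mine, purely illustrative):

```python
# PD payoffs for (my move, opponent's move), from my point of view.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def best_symmetric_policy(candidates):
    """In PD1 my clone's policy equals mine by assumption, so each candidate
    policy need only be scored against itself."""
    return max(candidates, key=lambda move: PAYOFF[(move, move)])

print(best_symmetric_policy(["C", "D"]))  # "C": (C,C) pays 3, (D,D) pays 1
```

PD2 offers no such shortcut: there the opponent is just an unlabeled time series of moves.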

I'm worried that in your comparison, you have the human upload playing PD1,
but have the AIXI-tl playing PD2.

PD1 is easier, but PD1 doesn't seem to be your scenario, because it requires
the AIXItl not to be starting with a blank operating program.

Or do you have them both playing PD2?

If a human is playing PD2, then it has to proceed solely by time series
analysis, and its actions are probably going to meander around chaotically
until settling on some attractor (or maybe they'll just meander around...).
MAYBE the human manages to recognize that the responses of its opponent are
so similar to its own responses that its opponent must be a lot like
itself... and this helps it settle on a beneficial attractor.

If an AIXItl is playing PD2, the situation is pretty much the same as if the
human is doing so, isn't it?  Except can't you argue that an AIXItl is so
smart that in the long run it's more likely than a human to figure out that
its opponent is acting a lot like it is, and make a guess that symmetrical
friendly behavior might be a good thing?
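The PD2 dynamics described above can be sketched in a toy simulation (again, all names are my own illustrative inventions).  `MirrorGuesser` meanders randomly until it notices that the opponent's move history exactly mirrors its own, then bets that it is facing something like itself and settles on the cooperative attractor.  Giving both clones the same random seed is the toy's stand-in for "playing an exact copy of yourself":

```python
import random

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

class MirrorGuesser:
    """Plays PD2 by pure time-series analysis of the opponent's moves."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.mine = []    # my own past moves
        self.theirs = []  # opponent's past moves

    def act(self):
        if len(self.mine) >= 10 and self.mine == self.theirs:
            move = "C"                    # symmetry detected: cooperate
        else:
            move = self.rng.choice("CD")  # meander around chaotically
        self.mine.append(move)
        return move

    def observe(self, other_move):
        self.theirs.append(other_move)

def play(seed, rounds=50):
    # Same seed => identical random streams, the toy analogue of a clone.
    a, b = MirrorGuesser(seed), MirrorGuesser(seed)
    score = 0
    for _ in range(rounds):
        ma, mb = a.act(), b.act()
        score += PAYOFF[(ma, mb)]
        a.observe(mb)
        b.observe(ma)
    return score

# After ~10 rounds the clones notice the mirrored histories and lock into
# mutual cooperation for the remaining rounds.
print(play(0))
```

Because the clones' histories mirror each other from round one, the score is at least 40 rounds of mutual cooperation (120 points) plus whatever the first 10 chaotic rounds yield.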

-- Ben

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On
> Behalf Of Eliezer S. Yudkowsky
> Sent: Friday, February 14, 2003 1:45 AM
> To: [EMAIL PROTECTED]
> Subject: Re: [agi] Breaking AIXI-tl
>
>
> Ben Goertzel wrote:
>  >
>  >> Because AIXI-tl is not an entity deliberately allocating computing
>  >> power; its control process is fixed.  AIXI-tl will model a process
>  >> that proves theorems about AIXI-tl only if that process is the best
>  >> predictor of the environmental information seen so far.
>  >
>  > Well... a human's control process is fixed too, in a way.  We cannot
>  > rewire our brains, our biological motivators.  And a human will
>  > accurately model other humans only if its fixed motivators have
>  > (directly or indirectly) led it to do so...
>
> I think you're anthropomorphizing AIXI.  (I think you're anthropomorphizing
> Novamente as well, but only AIXI is formally specified, so only there can I
> actually show your intuitions to be wrong.)  You said that AIXI-tl *could
> in theory* model something.
>  AIXI-tl
> *would not in fact* model that thing, given its control process.  While
> humans *would in fact* model that thing, given theirs.  I am not arguing
> about fixed versus unfixed control processes but pointing out that the
> specific human control process is superior to AIXI-tl.
>
> (For those of you on the list who are not aware that I am not an AI
> skeptic, this is not a Penrosian argument against the computational
> implementation of intelligence, it's an argument against the AIXI-tl
> Cartesian formalism for intelligence.)
>
> You are "anthropomorphizing" AIXI in the sense that you expect AIXI to do
> what you would do given AIXI's raw capabilities, but it's
> possible to look
> at AIXI's control process and see that it does not, in fact, do that.
>
> This is a critical class of problem for would-be implementors of
> Friendliness.  If all AIs, regardless of their foundations, did sort of
> what humans would do, given that AI's capabilities, the whole world would
> be a *lot* safer.
>
>  > Of course, humans are very different from AIXI-tl, because in humans
>  > there is a gradation from totally hard-wired to totally
>  > ephemeral/flexible, whereas in AIXI-tl there's a rigid dichotomy
>  > between the hard-wired control program and the ephemeral operating
>  > program.
>  >
>  > In this way Novamentes will be more like humans, but with the
>  > flexibility to change their hard-wired motivators as well, if they
>  > REALLY want to...
>
> And what they do with that flexibility will be totally unlike what you
> would do in that situation, unless you understand the sensitive
> dependencies between a mind's foundations and how that mind behaves.
>
> You expect AIXI to behave like Novamente, and you expect both to behave
> like a human mind.  You are mistaken with respect to both AIXI and
> Novamente, but I can only demonstrate it for AIXI.  (Please don't reply
> with a list of differences you perceive between AIXI/Novamente/humans; I
> know you perceive *some* differences.)
>
>  >> Lee Corbin can work out his entire policy in step (2), before step
>  >> (3) occurs, knowing that his synchronized other self - whichever one
>  >> he is - is doing the same.
>  >
>  > OK -- now, if AIXItl were starting out with the right program, it could
>  > do this too, because the program could reason "that other AIXItl is
>  > gonna do the same thing as me, so based on this knowledge, what should
>  > I do...."
>
> It *could* do this but it *doesn't* do this.  Its control process is such
> that it follows an iterative trajectory through chaos which is forbidden
> to arrive at a truthful solution, though it may converge to a stable
> attractor.
>
>  > But you seem to be assuming that
>  >
>  > a) the Lee Corbin starts out with a head full of knowledge achieved
>  > through experience
>
> That is correct.  AIXI-tl is supposed to equal or surpass *any* specific
> tl-bounded program given enough time.  I could give Lee Corbin a computer
> implant.  I could put AIXI-tl up against a tl-bounded superintelligence.
> AIXI-tl is still supposed to win.  You are applying anthropomorphic
> reasoning ("a head full of knowledge achieved through experience") to a
> formally specified problem.
>
>  > b) the AIXItl starts out without a reasonable operating program, and
>  > has to learn everything from scratch during the experiment
>
> That is not formally a problem if the experiment lasts long
> enough.  Also,
> please note that being armed with the capability to simulate 2^l programs
> tl-bounded to THE SIZE OF AN ENTIRE HUMAN MIND is, anthropomorphically
> speaking, one HELL of a capability.  This capability is supposed to equal
> or overwhelm Corbin's regardless of what "knowledge achieved through
> experience" is stuffed into Corbin's head.
>
>  > What if you used, for the competition a Lee Corbin with a tabula rasa
>  > brain, an infant Lee Corbin.  It wouldn't perform very well, as it
>  > wouldn't even understand the competition.
>
> Again: anthropomorphic reasoning about a formally specifiable problem.
> Lee Corbin is tl-bounded therefore the contest is fair.  If the contest
> goes on long enough AIXI-tl should win or, at worst, lose by a
> bounded amount.
>
>  > Of course, if you put a knowledgeable human up against a new baby
>  > AIXI-tl, the knowledgeable human can win an intelligence contest.  You
>  > don't need the Prisoner's Dilemma to prove this.  Just ask them both
>  > what 2+2 equals. The baby AIXI-tl will have no way to know.
>
> That is why the challenge is not one-shot but iterated, to give AIXI-tl
> time to learn, which it should do *very* fast given its *enormous*
> computing capabilities.
>
>  > Now, if you give the AIXI-tl enough time and experience to learn about
>  > Prisoners Dilemma situations -- or, to learn about selves and minds and
>  >  computer systems -- then it will evolve an operating program that
>  > knows how to reason somewhat like a human does, with concepts like
>  > "that other AIXI-tl is just like me, so it will think and act like I
>  > do."
>
> AIXI-tl does not, in fact, develop such an operating program.  It
> would be
> nice if AIXI-tl did so, but in fact, it doesn't.  That's why it's
> important, for the purposes of this discussion, that AIXI is a *formally*
> specified system.  If you look at an incompletely specified AI system
> there is unlimited room for anthropomorphism simply because the space of
> things the AI "could" do is so enormous.  If what you've specified so far
> *does not in fact* do it, why, it'll undoubtedly be a consequence of the
> parts that haven't been specified yet.
>
> It is possible that if AIXI-tl met the challenge somehow having already
> developed a probability distribution that *exactly* imitates Lee
> Corbin in
> the sense that Lee Corbin's actions are predicted to be optimal, then the
> resulting actions are self-confirming with respect to the
> Corbin-sustaining predictions.  However, as AIXI-tl is defined, I think
> the result might be that AIXI-tl would try to defect against itself and
> then be caught in a degenerating spiral.  If so it would be a property of
> the control process, not the Corbinlike content.
>
> That, however, is not the point; obviously if AIXI-tl is allowed
> to arrive
> at the problem already containing an optimal policy, it can match any
> tl-bounded program!  The point is that AIXI-tl cannot *learn* to do what
> Corbin does.  You can talk about AIXI-tl having previously encountered
> some unspecified set of environmental conditions that teaches it to do
> exactly what Corbin does; but this is just injecting enough entropy back
> into the problem specification that you can anthropomorphize it again.
> When you look at AIXI-tl encountering the *specific* physical challenge,
> it does *not* learn to behave like Corbin.  And it's supposed to; for any
> challenge that does *not* break the formalism AIXI-tl *will* learn to
> outperform Corbin, even starting from scratch.
>
>  >> The major point is as follows:  AIXI-tl is unable to arrive at a
>  >> valid predictive model of reality because the sequence of inputs it
>  >> sees, on successive rounds, are being produced by AIXI-tl trying to
>  >> model the inputs using tl-bounded programs, while in fact those
>  >> inputs are really the outputs of the non-tl-bounded AIXI-tl.  If a
>  >> tl-bounded program correctly predicts the inputs seen so far, it will
>  >> be using some inaccurate model of the actual reality, since no
>  >> tl-bounded program can model the actual computational process AIXI-tl
>  >> uses to select outputs.
>  >
>  > Yah, but Lee Corbin can't model (in perfect detail) the actual
>  > computational process the other Lee Corbin uses to select outputs,
>  > either.  So what?
>
> Actually, during step (2), Lee Corbin can model in perfect detail the
> other Lee Corbin's selected actions, even though Lee Corbin can't
> model in
> perfect detail the actual brain processes producing those actions.  Lee
> Corbin has perfect access to the result of a process even though he does
> not have perfect access to the process itself.  Moreover Lee Corbin can
> actually *choose* the other Lee Corbin's actions during step (2);
> each Lee
> Corbin has the subjective experience of *choosing* actions for both Lee
> Corbins, about which they will always be in perfect agreement.  In step
> (3) the symmetry is broken but Lee Corbin can still carry out the
> agreement between his selves.  This is correlation between
> *policies*, not
> just actions; it is correlation performed on certain internal
> choices that
> are not part of the final actions taken; AIXI-tl formally does
> not perform
> such correlation as part of how its control process estimates successful
> predictions to arrive at an environmental model.  For that matter, given
> the control process, AIXI-tl would not even attempt to invent a policy
> until step (4), thus breaking the symmetry.  And this is not
> modifiable by
> an internal program that "wants" to invent a policy at step 2; AIXI-tl
> simply doesn't work that way.  It doesn't need any policy at step 2
> because there are no actions to be taken, and even if AIXI-tl invented a
> policy at that point, it would be discarded on the next incremental step
> of AIXI-tl.
>
> Again, you can say that an AIXI-tl might theoretically arrive at a model
> of reality that would cause it to
> implement a cooperative policy which is then self-reinforcing and,
> somehow, stable in terms of being seen as offering maximum reward given
> that model of reality.  You can say that despite any policy being
> formulated at step 2 being discarded afterward, with no opportunity to
> preserve state because there are no actions to be taken, the new policy
> created at step 4 would manage to "pretend" that step 3 hasn't happened
> yet in perfect synchrony with its other self, despite the policy having
> been selected on the basis of data that include step 3.
>
> But again, allowing AIXI-tl to have unspecified past experiences
> that have
> somehow already taught it optimal solutions is injecting entropy into the
> problem specification to allow for anthropomorphism.  If you look at what
> AIXI-tl actually does on this particular challenge, it does not
> *learn* to
> solve it no matter how much time is allowed.  Learning to solve arbitrary
> problems, starting from scratch, given enough time, is what
> AIXI-tl is all
> about.  If AIXI-tl were just something that *could* solve problems given
> the right previous history, it would be no better than a tl-bounded UTM
> (which can solve any tl-bounded problem, providing someone has already
> programmed it in just the right way).
>
>  >> Humans can use a naturalistic representation of a reality in which
>  >> they are embedded, rather than being forced like AIXI-tl to reason
>  >> about a separated environment; consequently humans are capable of
>  >> rationally reasoning about correlations between their internal mental
>  >> processes and other parts of reality, which is the key to the complex
>  >> cooperation problem with your own clone - the realization that you
>  >> can actually *decide* your clone's actions in step (2), if you make
>  >> the right agreements with yourself and keep them.
>  >
>  > I don't see why an AIXI-tl with a clever operating program coming into
>  > the competition couldn't make the same realization that a human does.
>
> One, AIXI-tl is not allowed to enter with a clever operating program.
> That misses the whole point of AIXI-tl.  That's like letting a UTM enter
> with a clever operating program.
>
> Two, how *exactly* would an AIXI process make that realization?  What
> specifically would happen?  I have extrapolated forward the control
> process of AIXI-tl and it appears to follow a certain specific trajectory
> which is not the anthropomorphic trajectory you postulate.
>
>  > So your argument is that a human baby mind exposed ONLY to prisoners'
>  > dilemma interactions as its environment would somehow learn to "realize
>  > it can decide its clone's actions", whereas a baby AIXI-tl mind exposed
>  > only to these interactions cannot carry out this learning?
>
> Again, anthropomorphism; "baby" has no place in a formal problem
> specification.  All AIXI-tls are specified to begin as babies and
> nonetheless equal or outperform any tl-bounded process given enough
> iterations of the problem.  Moreover, an AIXI-tl that simulates 2^l
> processes the size of a human upload is not a "baby".  Finally, I am not
> arguing about design superiority, or arguing against AI, I am breaking a
> formalism.
>
>  >> (b)  This happens because of a hidden assumption built into the
>  >> formalism, wherein AIXI devises a Cartesian model of a separated
>  >> environmental theatre, rather than devising a model of a naturalistic
>  >> reality that includes AIXI.
>  >
>  > It seems to me this has to do with the nature of AIXI-tl's operating
>  > program.
>
> AIXI-tl can devise programs that, as part of their predictive process,
> implicitly model a sort of AIXI-tl, provided that their modeling is
> tl-bounded.  This is a *very* sharp bound given that they're modeling a
> process which is "t2^l times a really huuuuge constant"; it would be very
> hard for the model to successfully guess *any* of AIXI-tl's internals,
> except the record of AIXI-tl's past actions, which are supplied.  Humans
> have much more self-knowledge than that.  Our map of ourselves can never
> actually be as large as ourselves, but at least it includes more than a
> record of our past actions.  AIXI-tl creating a predictive model of
> reality can only attempt to correlate reality against its *actual
> actions*
> - it can't say "Aha, these are the actions I would have taken in that
> situation, I must be facing a clone of myself" because "the actions a
> AIXI-tl would have taken in that situation" are specific data that were
> computed using t2^l operations and hence can't be recomputed by the
> tl-bounded internal programs, which have access not to AIXI-tl's internal
> state, only to the record of AIXI-tl's past actions.  AIXI-tl's
> internal programs can reason abstractly about "an AIXI-tl of whom I am a
> policy" but they can't ever access the information computed by
> the AIXI-tl
> control program, except for a record of past actions.  They can't figure
> out what AIXI-tl "would have done" in a particular situation, not unless
> the answer happens to be tl-bounded.  They can't make AIXI-tl figure out
> an entire *policy* in synchrony with a distant self in step (2), and have
> no way to preserve that information if they did.  The Primary and
> Secondary will, by their decision processes, compute actions only in step
> 4 and will compute only the specific actions they need.
>
>  > With the right operating program, AIXI-tl would model reality in a way
>  > that included AIXI-tl.  It would do so, only if this operating program
>  > were useful to it....
>
> Again, AIXI-tl cannot do this in the way that humans do, because while
> humans cannot access or choose all of our internal states, we can access
> and choose a lot more of ourselves than AIXI's tl-bounded programs are
> allowed to access or choose.
>
>  > For example, if you wrapped up AIXI-tl in a body with skin and
>  > actuators and sensors, it would find that modeling the world as
>  > containing AIXI-tl was a very useful strategy.  Just as baby humans
>  > find that modeling the world as containing baby humans is a very useful
>  > strategy...
>
> Policies can recognize their own future actions, but can't recognize "the
> action AIXI-tl would have taken in this situation in the past", or make
> predictions using AIXI-tl's past internal mindstate as data, which is why
> such policies can't make the successful predictions that would promote
> them in AIXI's eyes.  Furthermore, there's only a very brief period in
> which AIXI can stumble across a good policy even in theory; once the
> Primary has a set of input information corresponding to the Secondary's
> past guesses, the environment has turned fundamentally chaotic from
> AIXI-tl's perspective (beyond the ability of tl-bounded processes to
> predict very well).  AIXI-tl can model AIXI-tl's chaos better
> than, say, a
> tl-bounded Corbin... but Corbin facing the Clone challenge doesn't
> *create* chaos that requires t2^l instructions to compute.
>
>  >> (c)  There's no obvious way to repair the formalism.  It's been
>  >> diagonalized, and diagonalization is usually fatal.  The AIXI
>  >> homunculus relies on perfectly modeling the environment shown on its
>  >> Cartesian theatre; a naturalistic model includes the agent itself
>  >> embedded in reality, but the reflective part of the model is
>  >> necessarily imperfect (halting problem).
>  >
>  > But the reflective part of the human mind is ALSO necessarily
>  > imperfect... I don't see how you've shown AIXI-tl to have a deficiency
>  > not also shared by the human mind's learning algorithms...
>
> It's a lot *less* imperfect.  The human mind doesn't model its own entire
> self, but it has direct access to certain parts of itself; AIXI-tl's
> policies can't access that data directly, and the data is too expensive
> for tl-bounded programs to compute indirectly.
>
>  >> (d)  It seems very likely (though I have not actually proven it) that
>  >> in addition to breaking the formalism, the physical challenge
>  >> actually breaks AIXI-tl in the sense that a tl-bounded human
>  >> outperforms it on complex cooperation problems.
>  >
>  > I am very unconvinced of this.
>
> Duly noted.
>
>  >> (e)  This conjectured outperformance reflects the human use of a type
>  >> of rational (Bayesian) reasoning apparently closed to AIXI, in that
>  >> humans can reason about correlations between their internal processes
>  >>  and distant elements of reality, as a consequence of (b) above.
>  >
>  > It seems to me that AIXI-tl can reason about correlations between its
>  > internal processes and other elements of reality -- especially if it is
>  > given a "codic modality", i.e. the ability to sense its own internal
>  > processes and reason about them.
>
> It's a formal system, Ben, you can't "give it a codic modality".  The
> programs AIXI-tl selects from have access to the record of AIXI-tl's past
> actions, nothing more.
>
> This is like taking the Principia Mathematica after Godel diagonalized it
> and saying that a patched PM could probably perceive the truth of Godel's
> (original) statement if the patched PM had a goal system and the ability
> to rewrite its own source code.
>
> (Also, you're talking about a reflective modality, not a codic modality.
> You can have a "reflective codic modality", but what AIXI specifically
> doesn't have, in this case, is a "reflective modality"; it has one
> hell of a codic modality.)
>
> In this vein, note that you can create generalized Clone challenges that
> break humans because of their inadequate reflective modalities, which a
> Friendly AI can handle successfully.  This reflects the foundational
> difference of a Friendly AI having access to its own source code.  I
> conjecture that *no possible* mind is reflective enough to perform
> optimally on all possible generalized Clone challenges.  (What I really
> want to call these are Golden challenges, but I haven't defined that,
> so...)  There's still a difference in kind between AIXI and a human/FAI,
> though; AIXI has *zero* reflectivity.
>
> You might say - it's really rather fascinating - that humans are
> one JOOTS
> ahead of AIXI-tl, but a Friendly AI is one JOOTS ahead of humans.
>  In that
> sense the Clone challenge and its generalizations don't just break
> AIXI-tl, but demonstrate that an infinite sequence of patches and
> breakages exist, just as humans are a step ahead of Principia Mathematica
> but a superintelligence is a step ahead of humans.  That's why the AIXI
> formalism can never be repaired back to the level of generality that was
> originally claimed for it.
>
>  > It seems like you are arguing that there are problems an embodied,
>  > experienced mind can solve better than a tabula rasa, unembodied mind.
>  > This has nothing to do with the comparison of AIXI-tl to other learning
>  > algorithms, though.
>
> It does, though.  AIXI-tl is supposed to be able to equal or outperform
> any tl-bounded mind, no matter how experienced, given time to learn.  Its
> equivalent of experience is the ability to massively simulate all
> possible
> tl-bounded experienced minds in parallel, which you must admit is an
> impressive ability to match against any amount of experience.
> Unfortunately AIXI-tl's control process does not, in fact, react to the
> challenge of dealing with another AIXI-tl by simulating Lee
> Corbin dealing
> with himself.  That it has internal mechanisms which another mind *could*
> use to do this, does not change that AIXI-tl's control process *doesn't*
> do it.  You might as well say that a naked UTM is as good as Lee Corbin;
> it "could" simulate Corbin, the question is whether it does.
>
> The task of AGI is not to see that the computers in front of us
> "could" do
> something, but to figure out what are the key differences that we must
> choose among to make them actually do it.  This holds for Friendliness as
> well.  That's why I worry when you see Friendliness in AIXI that isn't
> there.  AIXI "could" be Friendly, in the sense that it is capable of
> simulating Friendly minds; and it's possible to toss off a loose argument
> that AIXI's control process will arrive at Friendliness.  But AIXI will
> not end up being Friendly, no matter what the pattern of inputs and
> rewards.  And what I'm afraid of is that neither will Novamente.
>
> --
> Eliezer S. Yudkowsky                          http://singinst.org/
> Research Fellow, Singularity Institute for Artificial Intelligence
>