Ben Goertzel wrote:
>
>> AIXI-tl *cannot* figure this out because its control process is not
>> capable of recognizing tl-computable transforms of its own policies
>> and strategic abilities, *only* tl-computable transforms of its own
>> direct actions. Yes, it simulates entities who know this; it also
>> simulates every possible other kind of tl-bounded entity. The
>> question is whether that internal knowledge appears as an advantage
>> recognized by the control process, and given AIXI-tl's formal
>> definition, it does not appear to do so.
>
> I don't understand how you're deriving the conclusion in your final
> sentence.
>
> How do you know the circumstances in which AIXItl would be led to adopt
> operating programs involving modeling its own policies and strategic
> abilities?

Because AIXI-tl is a completely specified system, and it is therefore
possible to derive certain bounds on its ability to model itself. It has
no *direct* reflectivity except for its memory of its own actions, and its
indirect reflectivity is limited by the ability of a tl-bounded process to
simulate the conclusions of a t2^l process. (Note that under ordinary
circumstances AIXI-tl never needs this ability in order to *outperform* a
tl-bounded process; its internal tl-bounded processes will always model an
AIXI-tl as well as any tl-bounded process could.)

We need to distinguish between abstract properties of AIXI-tl's policies
that an internal process can understand, and specific outputs of AIXI-tl
that the internal process can predict. AIXI-tl simulates all possible
tl-bounded semimeasures; some of those semimeasures will attempt to assign
a probability to sense data based on the abstract theory "You are facing
another AIXI-tl", but this abstract theory will not be enough to actually
predict that AIXI-tl's *specific* outputs in order to assign them a high
probability.

The design structure of the cooperative strategy task (note that it is not
the Prisoner's Dilemma but a complex cooperation problem) is such that
each AIXI-tl will choose a different tl-bounded policy (using t2^l
operations to do so). Given that the abstract theory contained in the
tl-bounded probability semimeasure cannot access the tl-bounded policy of
either AIXI, nor itself utilize the t2^l process used to select among
policies, how is the semimeasure supposed to predict which actions the
*other* AIXI will take? Even if such semimeasures succeed in bubbling to
the top of the probability distribution, how will a tl-bounded policy in
step 4 know which policy the other AIXI selected in order to coordinate
strategies? There's no guarantee that the two policies are even
approximately the same - AIXI-tl's policy is in the same dilemma as a
human trying to work out the strategy in step 4 instead of step 2.
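A toy sketch of the simulation bound, for concreteness. Everything here is a stand-in I am assuming for illustration: iterated hashing plays the role of the t2^l policy-selection computation, and `budget` plays the role of the tl bound. The point is that a process can know its target's exact algorithm (the abstract theory) and still fail to produce the specific output, because it cannot afford the compute:

```python
import hashlib

def policy_selection(seed: bytes, steps: int) -> bytes:
    # Stand-in for AIXI-tl's t2^l policy-selection computation:
    # iterate a hash `steps` times.
    h = seed
    for _ in range(steps):
        h = hashlib.sha256(h).digest()
    return h

def bounded_model(seed: bytes, budget: int, steps: int) -> bytes:
    # A "tl-bounded" internal model: it knows the exact algorithm,
    # but can only afford `budget` iterations of it.
    return policy_selection(seed, min(budget, steps))

seed = b"challenge C"
target = policy_selection(seed, 10_000)   # the t2^l process
guess = bounded_model(seed, 100, 10_000)  # the tl-bounded simulation
assert guess != target  # abstract knowledge, wrong specific output
```

With a budget equal to the full step count, the model reproduces the target exactly; the failure is purely a matter of resources, not of knowledge.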

If you "still don't see it", could you please say *which* step in the
above reasoning first strikes you as a non sequitur?

> You may well be right that PD2 is not such a circumstance, but that
> doesn't mean there are no such circumstances, or that such
> circumstances wouldn't be common in the hypothetical life of a real
> embodied AIXItl

Breaking a universality claim only requires one counterexample. Of course
there are at least some circumstances where AIXI-tl can outperform a human!

>> AIXI-tl learns vision *instantly*. The Kolmogorov complexity of a
>> visual field is much less than its raw string, and the compact
>> representation can be computed by a tl-bounded process. It develops
>> a visual cortex on the same round it sees its first color picture.
>
> Yes, but that "visual cortex" would not be useful for anything. It
> would take some time for an embodied AIXItl to figure out how to
> recognize visual patterns in a way that was useful to it in
> coordinating its actions. Unless it had a priori knowledge to guide
> it, this would be a substantial process of trial and error learning.

Okay, we discount the early trials as part of the bounded loss. Standard
Operating Procedure for Hutter's proof.
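The compressibility claim in the quoted paragraph is easy to illustrate with a toy. Here zlib is a crude stand-in for a tl-bounded compressor (a proxy, not the real resource-bounded Kolmogorov complexity), and a smooth gradient stands in for a structured visual field:

```python
import random
import zlib

# A toy "visual field": a smooth 64x64 gradient, standing in for a
# structured color picture.
width, height = 64, 64
structured = bytes((x + y) % 256 for y in range(height) for x in range(width))

# Incompressible noise of the same length, standing in for a raw
# patternless string.
random.seed(0)
noise = bytes(random.randrange(256) for _ in range(width * height))

compact = len(zlib.compress(structured))
raw = len(zlib.compress(noise))
assert compact < raw  # the structured field compresses far better
```

The compact representation is found by a cheap (tl-bounded) process in a single pass, which is the sense in which the "visual cortex" appears on the first round.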

>> Because it is physically or computationally impossible for a
>> tl-bounded program to access or internally reproduce the previously
>> computed policies or t2^l strategic ability of AIXI-tl.
>
> Yes, but why can't it learn patterns that let it approximately predict
> the strategies of AIXI-tl?

Uh... I honestly thought I just *said* why. I'll try expanding; let me
know if the expansion still doesn't help.

AIXI-tl, trying to predict itself, steadily adds more and more tl-bounded
Kolmogorov complexity to the sensory inputs it needs to predict. The true
Kolmogorov complexity of AIXI-tl never exceeds the length of the AIXI-tl
program plus the challenge computation C, which is actually pretty small
change. However, the tl-bounded Kolmogorov complexity keeps rising unless
AIXI-tl is lucky enough to stumble on a probability distribution model
which, in the Secondary, advises actions that, when seen as inputs,
confirm the probability distribution model in the Primary.

In the first round, the Secondary AIXI-tl takes a set of complex actions
A1, based on a policy P1 that was selected using an opaque
(non-tl-boundedly predictable) control procedure.

Inputs seen by Primary AIXI-tl at end of round: A1

TL-bounded Kolmogorov complexity of inputs: K(A1) < K(P1)

Real Kolmogorov complexity of inputs: K(AIXI-tl) + K(C)

In the second round, the Secondary AIXI-tl takes a set of complex actions
A2 based on a policy P2 that was selected based on a predictive model that
includes A1. The predictive model will be biased toward the simplest
semimeasures that predict A1; these semimeasures will probably strongly
resemble P1.

Inputs seen by Primary AIXI-tl at end of round: A1 A2

TL-bounded Kolmogorov complexity of inputs: K(A1) + K(A2) < K(P1) + K(P2)

Real Kolmogorov complexity of inputs: K(AIXI-tl) + K(C)

In the third round, the Secondary AIXI-tl takes a set of complex actions
A3 based on a policy P3 that was selected based on the Bayesian updating
of all possible semimeasures depending on how strongly they predicted the
input sequence A1 + A2. There is no guarantee that *any* tl-bounded
process is capable of predicting A1 + A2, except that if we are actually
simulating an entire human upload and not a simpler system, we probably
won't run out of "l" to just predict the entire flat string, ever.

However, there is no guarantee that A1 + A2 contain any compressible
regularity *from the perspective of a tl-bounded process*. Seeing A1 and
A2 as a *sequence* may require t2^l computing power because the process
that computes A1 and A2 *uses* t2^l computing power. At most, it may be
possible to determine that A1 and A2 are attempted "optimal strategies of
some kind", *if* the cooperative game is one where the optimal strategy
doesn't depend completely on the predicted actions of the other player.

Some complex cooperative games may lead to compressible regularity in the
action sequence when played by AIXI-tl in this way, but basic
no-free-lunch rules lead me to believe (pardon me; intuit) that *most*
such games will lead to a minimum of compressible regularity in the
tl-bounded perspective on a t2^l process.

In essence, AIXI-tl goes on adding tl-incompressible complexity to the
series of actions the Secondary takes and that the Primary sees as inputs.
Even if the simplest successful semimeasures succeed in ruling out most
possible inputs as "This wouldn't be the output of an AIXI-tl", thus
eventually promoting themselves as models, they would still be unable to
predict which *actual* outputs the other AIXI-tl will use, preventing
cooperative optimization.
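The round-by-round accounting above can be summarized in a toy model. The numbers here are made up; only the shapes matter: the true complexity of the history is constant, while the tl-bounded view grows by roughly one round's worth of incompressible data per round:

```python
# Illustrative accounting only -- all constants are assumed for the sketch.
K_AIXI_TL = 400    # assumed length of the AIXI-tl program
K_C = 50           # assumed size of the challenge computation C
K_ROUND = 120      # assumed tl-incompressible complexity added per round

def true_complexity(rounds: int) -> int:
    # One fixed program generates the entire history, so its real
    # Kolmogorov complexity never exceeds K(AIXI-tl) + K(C), no matter
    # how many rounds are played.  (`rounds` is deliberately unused.)
    return K_AIXI_TL + K_C

def tl_bounded_complexity(rounds: int) -> int:
    # A tl-bounded observer cannot rerun the t2^l policy selection, so
    # each round's actions A_i look like fresh incompressible data.
    return rounds * K_ROUND

assert true_complexity(100) == true_complexity(1)
assert tl_bounded_complexity(100) > true_complexity(100)
```

The divergence between the two curves is the sense in which AIXI-tl "adds tl-incompressible complexity" to its own input stream.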

The worst case from AIXI-tl's perspective would probably be a cooperative
game where the optimum response is completely dependent on the other
player's action, say with an N-to-N mapping of possible Secondary actions
and optimum Primary responses (and vice versa!). If for all N possible
responses there's a standard set of payoffs from 0 to R, with an easily
computable dependency on the other party's action, with a selfishly
unstable maximum cooperative payoff of 2R/3 for both parties (basic PD
condition), then my guess would be that AIXI-tl's control process *never*
succeeds in predicting the actions of its counterpart. Again, for an
uploaded human who actually remembers policies and remembers interim steps
in computation, this game is very easy to (a) recognize and (b) solve.
Just select an arbitrary pair of inputs with 2R/3 payoffs during step 2.
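A hypothetical instance of such a game, for concreteness. The mapping `(a + 1) % N` and the payoff values are assumptions of the sketch, not anything from Hutter; the point is that a player that can inspect its own policy coordinates trivially by a shared rule, while a policy-blind predictor can only guess:

```python
import random

N, R = 8, 9.0

def payoff(primary: int, secondary: int) -> float:
    # The Primary's optimum response is a one-to-one function of the
    # Secondary's action; coordinated play earns the selfishly unstable
    # 2R/3, anything else earns less.
    return 2 * R / 3 if primary == (secondary + 1) % N else R / 3

# A player that remembers its own policy (the human upload in step 2)
# coordinates with its clone via an arbitrary shared deterministic rule:
secondary_action = 0                          # "lowest-numbered action"
primary_action = (secondary_action + 1) % N
assert payoff(primary_action, secondary_action) == 2 * R / 3

# A predictor with no access to the other player's selected policy can
# only guess among N actions, so its average payoff falls well short:
random.seed(1)
guesses = [payoff(random.randrange(N), secondary_action)
           for _ in range(1000)]
assert sum(guesses) / len(guesses) < 2 * R / 3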

If AIXI's control process has the right properties... at this point my
visualization fails me, though... it might be possible to show that at
each additional step in the above game, the next step in the game will be
minimally compressible as AIXI-tl tries to outpredict itself.

> What I was saying was that ANY reinforcement learning system, if its
> only life experience is playing Prisoners Dilemma against its clone, is
> going to be unlikely to develop a generally intelligent mind.

AIXI-tl isn't really general intelligence as we know it... as I know it,
anyway... no matter which way you peer quizzically at it. It's not
supposed to be GIAWKI, it's just supposed to be better than any tl-bounded
process at anything. (Also, PD is simple enough that AIXI-tl solves it
almost immediately, albeit based on the flawed premise that it's seeing a
tl-computable transform of its own actions reflected back at it.) As for
the whole life experience thing... argh, what does this have to do with
the formalism? The question isn't whether AIXI-tl develops "general
intelligence" but whether AIXI-tl develops whatever it needs to solve the
problem. If AIXI-tl needs general intelligence but fails to develop
general intelligence to solve the complex cooperation problem, while
humans starting out with general intelligence do solve the problem, then
AIXI-tl has been broken. That's the *whole point*. Okay, your intuitions
say you could develop humanlike cooperative general intelligence in
AIXI-tl in advance by specially selecting inputs; my intuitions say
otherwise, btw, but either way it's totally irrelevant.

>> Measured in computing cycles, yes. Measured in rounds of information
>> required, no. AIXI-tl is defined to run on a very VERY fast
>> computer. Marcus Hutter has formally proved your intuitions about the
>> requirement of a rich environment or prior training to be wrong; I am
>> trying to show that your intuitions about what AIXI-tl is capable of
>> learning are wrong.

>
> Again, you seem to have misunderstood my statements.
>
> Hutter did not show that a rich environment is unnecessary for
> producing a mind.
>
> For instance, suppose you take an AIXI or AIXItl system and
> systematically reward it every time it produces a "1" on its output
> screen, and punish it every time it does something else. Given this
> impoverished environment, the AIXI/AIXItl system is not going to adopt
> an operating program that is "mindlike" in any real sense. It's going
> to adopt an operating program such as "Print '1'".
>
> The Prisoners Dilemma environment is not as impoverished as the "Reward
> only printing 1" environment. But it's still pretty impoverished, and
> hence still pretty unlikely to lead to an AIXI/AIXItl system with a
> mindlike operating program.

But we aren't *talking* about whether AIXI-tl has a mindlike operating
program. We're talking about whether the physically realizable challenge,
which definitely breaks the formalism, also breaks AIXI-tl in practice.
That's what I originally stated, that's what you originally said you
didn't believe, and that's all I'm trying to demonstrate.

> Hutter's theorem says that AIXItl will equal or outperform any other
> system at getting rewarded, where rewards are determined by any
> computable function, so long as AIXItl gets a lot more resources than
> the other system. But if the reward function is too simplistic then
> AIXItl will never learn an interesting operating program...

We're not talking about how interesting AIXI-tl's probability distribution
or selected action policies are; we're talking about whether AIXI-tl
actually outperforms, regardless of how it does so internally. AIXI-tl is
a blackbox formalism.

--
Eliezer S. Yudkowsky                          http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence

