On Mon, Apr 28, 2014 at 9:48 AM, Alex Miller <[email protected]> wrote:
> (Please let me know if this isn't the best forum for this kind of question)

If you have technical questions about AIXI, you may get better answers on the dedicated mailing list, which covers mathematical AGI, AIXI, and Solomonoff Induction in particular: https://groups.google.com/forum/?fromgroups#!forum/magic-list

> Suppose we give a long sample from the Bernoulli distribution to AIXI.
> I would expect an intelligent agent to figure out that it cannot see
> any patterns there and be able to continue the sequence accordingly.
>
> However, I think the shortest programs that reproduce such a sequence
> (the ones that should dominate the hypothesis space, according to
> AIXI) would be the ones that copy the given sequence and then produce
> something highly regular, e.g. all zeros. Is this correct?

Let's consider Solomonoff Induction (SI), which is at the heart of AIXI and is made just for prediction (AIXI is made for choosing actions optimally). SI does not consider only the simplest deterministic program; otherwise it would have the behavior you predict, which is not good enough. What you describe is called Minimum Description Length (MDL), which is a restriction of SI and is mainstream in machine learning.

Instead, SI considers *all* computable programs that are consistent with the observed sequence. Since a coin-flip sequence has no particular (deterministic) structure, some of the consistent programs will predict that the next bit is 0, and some will predict that it is 1. Now, since SI makes a mixture (weighted sum) of the predictions of all these consistent programs, it will in general predict 0 with probability more or less 1/2. It's not a perfect 0.5, though, because there are fluctuations in the set of consistent programs, so sometimes it will be more than 0.5, sometimes less. (In particular, if there have been more 0s than 1s in the past, 0 will in general have a higher probability.)
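The mixture prediction can be sketched in a few lines. This is only a toy illustration, not real (incomputable) SI: each "program" is reduced to its description length in bits and the bit it predicts next, and the mixture is the 2^-length-weighted vote of the programs consistent with the history.

```python
from fractions import Fraction

def mixture_prob_next_bit(programs):
    """Toy Solomonoff-style mixture prediction.

    programs: list of (description_length_bits, predicted_next_bit) for
    the programs consistent with the observed history. Each gets prior
    weight 2^-length; the result is the normalized probability of bit 1.
    """
    total = Fraction(0)
    vote_for_1 = Fraction(0)
    for length, bit in programs:
        weight = Fraction(1, 2 ** length)
        total += weight
        if bit == 1:
            vote_for_1 += weight
    return vote_for_1 / total

# Two programs of equal length, one continuing with 0 and one with 1
# (as for a structureless coin-flip history), give exactly 1/2:
print(mixture_prob_next_bit([(20, 0), (20, 1)]))  # 1/2
```

With asymmetric lengths the fluctuations mentioned above appear: `mixture_prob_next_bit([(1, 0), (2, 1)])` gives 1/3, i.e. the shorter 0-predicting program pulls the mixture below 0.5.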
A bit more detail: say we are at time T and the current observed sequence generated by the coin-flip process is x=010010101011. For a program p to be consistent with x, the execution of p must produce the sequence x, and possibly more. Also, the weight (in the mixture) of the program p is 2^(-length(p)), so the weight is lower if the length (in bits) of p is larger.

Now, consider the program p0 that always outputs 0. This program is not consistent with the x above, so it is not considered in the mixture. However, we can make a specific program p0' that outputs exactly x, and then always outputs 0. Since there is no structure in x (we say that x is random), the best we can do to output x is the program "print 010010101011". So the length of p0' is roughly equal to the length of x (which is T) plus the length of "then print 0 forever". p0' is consistent with x, but its weight is small compared to that of p0: it's actually roughly a factor 2^-T smaller.

Now, if the next bit we receive from the coin-flip generator at time T+1 is 1, p0' becomes inconsistent and is removed from the mixture. But again, there is a program p0'' that is like p0' but outputs 1 instead of 0 at time T+1. The weight of p0'' is smaller than that of p0' (roughly 2^-1 = 1/2 of it) because it looks like "print 010010101011*1* then print 0 forever". You can easily imagine programs p1, p1', p1'', etc. that output 1 instead of 0. Their weights are very close (if not equal) to those of p0, p0' and p0'', and this is partly why 0 and 1 will each have probability roughly 1/2.

There is actually an /equivalent/ definition of SI in terms of *stochastic* programs (instead of deterministic programs), a class that contains the Bernoulli distribution (and indeed all computable distributions).
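The weight arithmetic above can be checked directly. In this sketch the constant c (the fixed overhead in bits of "then print b forever") is an illustrative assumption; the ratios printed below are what matter, and they do not depend on c.

```python
T = 12  # length of the observed sequence x = 010010101011
c = 8   # assumed fixed overhead (in bits) of "then print 0 forever"

len_p0 = c                    # p0  : "print 0 forever" (inconsistent with x)
len_p0_prime = T + c          # p0' : "print x, then 0 forever"
len_p0_dprime = (T + 1) + c   # p0'': "print x then 1, then 0 forever"

def weight(n_bits):
    """Prior weight of a program of the given description length."""
    return 2.0 ** (-n_bits)

# p0' is roughly a factor 2^-T less probable a priori than p0:
print(weight(len_p0_prime) / weight(len_p0))        # 2^-12 ~ 0.000244
# each extra hard-coded bit halves the weight (p0'' vs p0'):
print(weight(len_p0_dprime) / weight(len_p0_prime))  # 0.5
```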
It is easy to show that SI converges to optimal prediction after roughly K(q) prediction errors, where q is the generating process and K(q) is the size of the smallest program that is equivalent to q (which is very small if the sequence to predict is not random!), even if the generator is stochastic. It is not easy to grok why the deterministic and stochastic definitions are equivalent (the proof is hard to read), but hopefully the example above gives you some insight.

If you want more details on Solomonoff Induction and AIXI, I recommend Shane Legg's thesis, which is a nice introduction: http://www.vetta.org/documents/Machine_Super_Intelligence.pdf

Hope this helps,
Laurent

-------------------------------------------
AGI Archives: https://www.listbox.com/member/archive/303/=now
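For reference, the convergence result alluded to above is usually stated as a bound on the cumulative expected squared prediction error of the Solomonoff predictor M relative to the true computable generator mu; up to the exact constant, which varies between presentations, it reads:

```latex
\sum_{t=1}^{\infty} \mathbf{E}_{\mu}\!\left[ \bigl( M(x_t = 1 \mid x_{<t}) - \mu(x_t = 1 \mid x_{<t}) \bigr)^2 \right] \;\le\; \frac{\ln 2}{2}\, K(\mu)
```

Since the total error is finite and bounded by a multiple of K(mu), the per-step error must vanish, which is the sense in which SI "converges after roughly K(q) errors".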
