Hi Ed,

> So is the real significance of the universal prior not its probability
> value in a given probability space (which seems relatively unimportant,
> provided it is not one or close to zero), but rather the fact that it
> can model almost any kind of probability space?
It just takes a binary string as input. If you can express your problem as
one in which a binary string represents what has been observed so far, and a
continuation of this string represents what happens next, then Solomonoff
induction can deal with it. So you don't have to "pick the space". You do,
however, have to take your problem and represent it as binary data and feed
it in, just as you do when you put any kind of data into a computer.

The power of the universal prior comes from the fact that it takes all
computable distributions into account. In a sense it contains all
well-defined hypotheses about what the structure in the string could be.
This is a point that is worth contemplating for a while. If there is any
structure in there, and this structure can be described by a program on a
computer, even a probabilistic one, then it's already factored into the
universal prior and the Solomonoff predictor is already taking it into
account.

> How does the Kolmogorov complexity help deal with this problem?

The key thing that Kolmogorov complexity provides is that it assigns a
weighting to each hypothesis in the universal prior that decreases
exponentially with the complexity of the hypothesis. This means that the
Solomonoff predictor respects, in some sense, the principle of Occam's
razor: a priori, simpler hypotheses are considered more likely than complex
ones.

> ED> Shane, what are the major ways programs are used in a Solomonoff
> machine? Are they used for generating and matching patterns? Are they
> used for generating and creating context-specific instantiations of
> behavioral patterns?

Keep in mind that Solomonoff induction is not computable. It is not an
algorithm. The role that programs play is that they are used to "construct"
the universal prior. Once this is done, the Solomonoff predictor just takes
the prior and conditions on the observed string so far to work out the
distribution over the next bit. That's all.
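For reference, the standard definitions behind these points can be sketched as follows (the notation U, M, K is mine, not from the email; this follows the usual Solomonoff/Levin presentation with a universal prefix machine U):

```latex
% Universal prior: sum over all programs p that make the universal
% prefix machine U output a string beginning with x.
M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-|p|}

% Prediction: condition the prior on the string observed so far.
P(b \mid x) \;=\; \frac{M(xb)}{M(x)}

% Occam weighting: a hypothesis h of Kolmogorov complexity K(h)
% enters the mixture with weight roughly 2^{-K(h)}, so simpler
% hypotheses dominate a priori.
```

Note that the weighting 2^{-K(h)} falls off exponentially in complexity, which is the precise sense in which the razor is built in.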
> Lukasz> The programs are generally required to exactly match in AIXI
> (but not in AIXItl, I think).
>
> ED> Shane, could you please give us an assist on this one? Is exact
> matching required? And if so, is this something that could be loosened
> in a real machine?

Exact pattern matching is required in the sense that if a hypothesis says
that something cannot happen, and it does, then that hypothesis is
effectively discarded. A real machine might have to loosen this, and many
other things. Note that nobody I know is trying to build a real AGI machine
based on Solomonoff's model.

> Isn't there a large similarity between a Solomonoff machine that could
> learn a hierarchy of pattern-representing programs and Jeff Hawkins's
> hierarchical learning (as represented in the Serre paper)? One could
> consider the patterns at each level of the hierarchy as subroutines. The
> system is designed to increase its representational efficiency by having
> representational subroutines available for use by multiple different
> patterns at higher compositional levels. To the extent that a MOSES-type
> evolutionary system could be set to work making such representations more
> compact, it would become clear how semi-Solomonoff machines could be made
> to work in the practical world.

I think the point is that if you can do really, really good general
sequence prediction (via something impractical like Solomonoff induction,
or practical like the cortex) then you're a long way towards being able to
build a pretty impressive AGI. Some of Hutter's students are interested in
the latter.

> The def of Solomonoff induction on the web, and even in Shane Legg's
> paper "Solomonoff induction", make it sound like it is merely Bayesian
> induction, using the picking of priors based on Kolmogorov complexity.

Yes, that's all it is.
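A toy finite analogue of that Bayesian mixture may make the "exact matching" point concrete. This is my own illustrative sketch, not anything from AIXI itself: three hand-picked deterministic hypotheses with a uniform prior stand in for the infinite class of programs weighted by 2^-K. A hypothesis that assigns probability zero to an observed bit is contradicted and its posterior weight drops to zero for good:

```python
# Toy Bayesian mixture over deterministic hypotheses about a bit
# sequence. In Solomonoff induction the hypothesis class is all
# programs and the prior weight is ~2^-K(h); here it is three
# hand-written rules with a uniform prior, purely for illustration.

hypotheses = {
    "all zeros": lambda n: 0,        # predicts bit 0 at every position
    "all ones":  lambda n: 1,        # predicts bit 1 at every position
    "alternate": lambda n: n % 2,    # predicts 0, 1, 0, 1, ...
}
weights = {name: 1.0 for name in hypotheses}  # uniform prior

observed = [0, 1, 0, 1]
for n, bit in enumerate(observed):
    for name, h in hypotheses.items():
        if h(n) != bit:          # hypothesis said this bit cannot occur
            weights[name] = 0.0  # so it is discarded outright

total = sum(weights.values())
posterior = {name: w / total for name, w in weights.items()}
print(posterior)  # only "alternate" survives with posterior weight 1.0
```

Loosening exact matching, as a real machine might, would amount to shrinking a contradicted hypothesis's weight rather than zeroing it.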
> But statements made by Shane and Lukasz appear to imply that a Solomonoff
> machine uses programming and program size as a tool for pattern
> representation, generalization, learning, inference, and more.

All these programs are weighted into that universal prior.

> So I think (but I could well be wrong) I know what that means.
> Unfortunately I am a little fuzzy about whether NCD would take "what"
> information, "what-with-what" or binding information, or frequency
> information sufficiently into account to be an optimal measure of
> similarity. Is this correct?

NCD is just a computable approximation. The universal similarity metric
(in the Li and Vitanyi book that I cited) gives the pure incomputable
version. The pure version basically takes all effective similarity metrics
into account when working out how similar two things are. So if you have
some concept of similarity that you're interested in that can be
programmed, it's already factoring this in.

Cheers,
Shane

-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?member_id=8660244&id_secret=63958284-e6bb79
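As an appendix to the NCD point: the computable approximation Shane mentions replaces the incomputable Kolmogorov complexity C(x) with the length of a real compressor's output. A minimal sketch using zlib as the compressor (the variable names and test strings are mine; the formula is the standard one from Li and Vitanyi):

```python
import zlib

def C(data: bytes) -> int:
    """Approximate Kolmogorov complexity by compressed length."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance (Li & Vitanyi):
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    Values near 0 mean similar; values near 1 mean unrelated."""
    cx, cy, cxy = C(x), C(y), C(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

text1 = b"the quick brown fox jumps over the lazy dog " * 10
text2 = b"the quick brown fox jumps over the lazy dog " * 10
junk = bytes(range(256)) * 2  # structurally unrelated byte data

print(ncd(text1, text2) < ncd(text1, junk))  # True: identical texts score as closer
```

How well NCD captures "what-with-what" or frequency structure depends entirely on how well the chosen compressor exploits that structure, which is exactly the gap between NCD and the pure incomputable metric.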
