Russell Standish recently mentioned his paper "Why Occam's Razor" which can be found at http://parallel.hpc.unsw.edu.au/rks/docs/occam/ . Among other things he aims to derive quantum mechanics from a Schmidhuber type ensemble. I have tried to read this paper but never really understood it. Here I will try to ask some questions, taking it slowly.
On this page, http://parallel.hpc.unsw.edu.au/rks/docs/occam/node2.html , things get started. Russell describes a set of infinite bit strings he calls "descriptions". He writes: "By contrast to Schmidhuber, I assume a uniform measure over these descriptions -- no particular string is more likely than any other." This surprises me. I thought that Schmidhuber assumed a uniform measure over bit strings considered as programs for his universal computer. So what is the "contrast" to his work? It seems that the greater contrast is that while Schmidhuber assumed that the bit strings would be fed into a computer that would produce outputs, Russell is taking the bit strings directly as raw data. But I am confused about their role. "Since some of these descriptions describe self aware substructures..." Whoa! This is a big leap for me. First, I am not too happy that mere bit strings have been elevated with the title "descriptions". A bit string on its own doesn't seem to have the inherent meaning necessary for it to be considered a description. And now we find not only that the bit string is a description, but it is a complex enough description to describe SAS's? How does that work? It's especially confusing to read the introductory word "since" as though this is all quite obvious and need not be explained. To me it is very confusing. The page goes on to identify these SAS's as observers. Now they are mappings, or equivalently Turing Machines, which map finite bit strings to integers. These integers are the "meanings" of the bit strings. I believe the idea here is that the bit strings are taken as prefixes of the "description" bit strings in the ensemble. It is as though the "observers" are observing the "descriptions" a bit at a time, and mapping them to a sequence of integer "meanings". Is that correct? So here is another confusion about the role of the "description" bit strings in the model. Are they things that "observer" TM's observe and map to integers? Or are they places where observers live, as suggested by the "Since" line quoted above? Or both? Now it gets a little more complicated: "Under the mapping O(x), some descriptions encode for identical meanings as other descriptions, so one should equivalence class the descriptions." The problem I have is, O takes only finite bit strings. So technically a "description", which is an infinite bit string, does not encode a meaning. What I think is meant here, though, is that two "descriptions" (i.e. infinite bit strings) will be considered equivalent if for every finite prefix of the strings, the O() mapping is the same. So if we think of O as "observing" the "description" bit strings one by one, it will go through precisely the same sequence of integer "meanings" in each case. Is that right? "In particular, strings where the bits after some bit number n are ``don't care'' bits, are in fact equivalence classes of all strings that share the first n bits in common." I think what this considers is a special O() and a special string prefix such that if O sees that particular n-bit prefix, all extensions of that prefix get mapped to the same "meaning" integer. In that case the condition described in my previous paragraph would be met, and all strings with this n-bit prefix would be equivalent. "One can see that the size of the equivalence class drops off exponentially with the amount of information encoded by the string." That seems a little questionable because the size of the equivalence class is infinite in all cases. However I think Russell means to use a uniform measure where the collection of all strings with a particular n-bit prefix have a measure of 1/2^n. It's not clear how well this measure really works or whether it applies to all sets of infinite strings. "Under O(x), the amount of information is not necessarily equal to the length of the string, as some of the bits may be redundant." Now we have this new concept of "the amount of information" which has not previously been defined. This sentence is really hard for me. What does it mean for bits to be redundant? We just discussed strings where all those after bit n are "don't care", but this sentence seems to be envisioning other kinds of redundancies. "The sum P_O(s) = [sum over p such that O(p)=s of] 2^(-|p|) where |p| means the number of bits of p consumed by O in returning s, gives the size of the equivalence class of all descriptions having meaning s." Boy, that's a tough one now. We consider all bit strings p such that O(p) = s. Now, is this supposed to just be those cases described earlier where the bits after |p| are "don't care" bits? Or is it all strings p such that O(p) = s, including all extensions of a string where we "don't care" after some point? If the latter, it seems problematical because there would be an uncountably infinite number of such p's. Also I think the sum would not converge in that case. So I think it is probably the former. But then, what if there is a string p such that O(p)=s, but if we append anything to p, either p+0 or p+1, then O of those strings is not s? Does that p add to the equivalence class or not? And is it the case that for all O and for every "description" bit string, there will be some prefix beyond which all the bits are "don't care"? Would that follow from the finiteness of O? If not, how do these "descriptions" feed into the measure? And I'm a little concerned about the appearance of 2^(-|p|) here without much motivation. "The quantity C_O(x) = - log2( P_O( O(x) ) ) is a measure of the information content, or complexity of a description x." Now I am becoming confused about whether O() is supposed to apply to the infinite bit strings that are "descriptions" or only to their finite prefixes. I thought it was the latter. I don't see how a TM can come up with a meaningful value on an infinite bit string. It can never read in the whole input if it is infinite in size. "If only the first n bits of the string are significant, with no redundancy, then it is easy to see C_O(x)=n." That part does make sense, if in fact the definition of P_O(s) was only with reference to "description" strings which were "don't care" after some number of bits. And I think the part about "no redundancy" means there are no other bit string prefixes which O maps to the same value as it maps the n-bit prefix of x. The page then goes on to make some comments about measure applied to universes. Here again I am confused about how to relate it to all that has been descibed. What are the analogs of universes, in this model? Is it "descriptions", the infinite bit strings? From what has been presented so far, I don't understand how to relate our experience of reality to this model. That's it for the first page. Hopefully these questions will help to show where I am getting confused. Hal Finney