On Fri, Jun 03, 2005 at 04:22:07PM -0700, "Hal Finney" wrote: > Russell Standish recently mentioned his paper "Why Occam's Razor" which > can be found at http://parallel.hpc.unsw.edu.au/rks/docs/occam/ . Among > other things he aims to derive quantum mechanics from a Schmidhuber type > ensemble. I have tried to read this paper but never really understood it. > Here I will try to ask some questions, taking it slowly. > > On this page, http://parallel.hpc.unsw.edu.au/rks/docs/occam/node2.html , > things get started. Russell describes a set of infinite bit strings he > calls "descriptions". He writes: > > "By contrast to Schmidhuber, I assume a uniform measure over these > descriptions -- no particular string is more likely than any other." > > This surprises me. I thought that Schmidhuber assumed a uniform measure > over bit strings considered as programs for his universal computer. So > what is the "contrast" to his work?
Nowhere in Schmidhuber (1997) does he propose a measure over the "input programs". What he does is is justify the appearance of a universal prior in the set of descriptions by passing the raw data through a reference UTM. Presumably the set of descriptions is sampled uniformly by the observer. In Schmidhuber (2000), the set of descriptions is generated by a machine which has resource bounds. This leads to the notion of "speed prior" which differs from the universal prior in several important respects. I sometimes refer to the two different ensembles as Schmidhuber I and Schmidhuber II. I am beginning to regret calling the all descriptions ensemble with uniform measure a Schmidhuber ensemble. I think what I meant was that it could be generated by a standard dovetailer algorithm, running for 2^\aleph_0 timesteps. However, as the cardinality of "my" ensemble is actually "c" (cardinality of the real numbers), it is quite probably a completely different beast. It is also not generated by a program, Schmidhuber style, it simply is (in the sense of being the simplest set - equivalent to nothing). > > It seems that the greater contrast is that while Schmidhuber assumed that > the bit strings would be fed into a computer that would produce outputs, > Russell is taking the bit strings directly as raw data. > Quite true. > But I am confused about their role. > > "Since some of these descriptions describe self aware substructures..." > > Whoa! This is a big leap for me. First, I am not too happy that mere bit > strings have been elevated with the title "descriptions". A bit string on > its own doesn't seem to have the inherent meaning necessary for it to be > considered a description. Many apologies for deploying terminology in a different way to you expect. A description (in my terminology) does not necessarily have meaning. It is simply data. This is in accord with how I use the term in casual English usage too - a description is simply a string of letters, and may or may not be meaningful. Meaning is attached by an observer. Now an observer will expect to find a SAS in one of the descriptions as a corrolory of the anthropic principle, which is explicitly stated as one of the assumptions in this work. I make no bones about this - I consider the anthropic principle a mystery, not self-evident like many people. Why should an observer expect to see a token of erself embedded in reality? That is the mystery of the AP. > And now we find not only that the bit string is > a description, but it is a complex enough description to describe SAS's? > How does that work? > The bitstrings are infinite in length. By reading enough bits, they can have arbitrarily complex meanings attached to them. > It's especially confusing to read the introductory word "since" as though > this is all quite obvious and need not be explained. To me it is very > confusing. > Sorry for not going slow enough. The habits of concise expression are hard to shake. > The page goes on to identify these SAS's as observers. Now they are mappings, > or equivalently Turing Machines, which map finite bit strings to integers. > These integers are the "meanings" of the bit strings. Not equivalently. Not all maps can be represented by a Turing machine, only computable ones. > > I believe the idea here is that the bit strings are taken as prefixes > of the "description" bit strings in the ensemble. It is as though the > "observers" are observing the "descriptions" a bit at a time, and mapping > them to a sequence of integer "meanings". Is that correct? > Indeed that is one interpretation. The most important point is that the observer map is a "prefix" map, in the sense of prefix machines of Algorithmic Information Theory. In reading a bit string one bit at a time, once a meaning is attached to the string, that is the meaning for evermore - the observer cannot change er mind after reading a few more bits. Schmidhuber (2000) deals with machines that do "change their mind", so perhaps there is some extension possible in this direction. > So here is another confusion about the role of the "description" bit > strings in the model. Are they things that "observer" TM's observe and > map to integers? Or are they places where observers live, as suggested > by the "Since" line quoted above? Or both? > All that is discussed in this paper is appearances - we only try to explain the phenomenon (things as they appear). No attempt is made to explain the noumenon (things as they are), nor do we need to assume that there is a noumenon. Bruno Marchal has a detailed discussion on this in his thesis, and concludes that he "has no need for this hypothesis" (what he calls the extravagant hypothesis). So the former statement is true : things that "observer" TM's observe and map to integers. It is also true that descriptions of self aware observers will appear within the description by the Anthropic Principle. The phenomenon of observerhood is included. However where the observers actually live is not a meaningful question in this framework. > Now it gets a little more complicated: "Under the mapping O(x), some > descriptions encode for identical meanings as other descriptions, so > one should equivalence class the descriptions." > > The problem I have is, O takes only finite bit strings. Only a finite number of bits are significant. They still operate on infinite bitstrings. > So technically > a "description", which is an infinite bit string, does not encode a > meaning. Yes. > What I think is meant here, though, is that two "descriptions" > (i.e. infinite bit strings) will be considered equivalent if for every > finite prefix of the strings, the O() mapping is the same. Yes > So if we > think of O as "observing" the "description" bit strings one by one, > it will go through precisely the same sequence of integer "meanings" > in each case. Is that right? > I think you're saying that O(x) has unique value for any given value of x. In which case you're right. > "In particular, strings where the bits after some bit number n are > ``don't care'' bits, are in fact equivalence classes of all strings that > share the first n bits in common." > Sets of strings corresponding to a given meaning are an equivalence class of descriptions, yes. > I think what this considers is a special O() and a special string prefix > such that if O sees that particular n-bit prefix, all extensions of > that prefix get mapped to the same "meaning" integer. In that case the > condition described in my previous paragraph would be met, and all > strings with this n-bit prefix would be equivalent. > yes. This is the condition that O(x) is a prefix map. > "One can see that the size of the equivalence class drops off > exponentially with the amount of information encoded by the string." > > That seems a little questionable because the size of the equivalence class > is infinite in all cases. However I think Russell means to use a uniform > measure where the collection of all strings with a particular n-bit > prefix have a measure of 1/2^n. It's not clear how well this measure > really works or whether it applies to all sets of infinite strings. > Maybe I'm being sloppy. Size here means measure, not cardinality. I can appreciate your confusion. The uniform measure works, because the set of all descriptions is one-to-one mappable to the real interval [0,1], except for a set of points of measure zero (rational numbers with power of 2 denominators) where 2 descriptions map to the same real number. Just as the uniform measure works with the reals (and allows integral calculus to work), the uniform measure works with descriptions. The measure of equivalence classes O^{-1}(n) also follows in a straight forward fashion from the uniform measure. > "Under O(x), the amount of information is not necessarily equal to the > length of the string, as some of the bits may be redundant." > > Now we have this new concept of "the amount of information" which has > not previously been defined. This sentence is really hard for me. > What does it mean for bits to be redundant? This sentence is part of the hand-waving "furniture" - its meant to make things easier for the reader, not harder. Most people have an intuitive notion of what redundancy means. Please ignore it if it makes you feel better. Later on I do define amount of information, or complexity to use its proper name as {\cal C}_O(x) = -\log_2 P_O(O(x)), and appeal to the coding theorem as a way of relating it to Kolmogorov complexity K(x). > We just discussed strings > where all those after bit n are "don't care", but this sentence seems > to be envisioning other kinds of redundancies. > Indeed. > "The sum P_O(s) = [sum over p such that O(p)=s of] 2^(-|p|) > where |p| means the number of bits of p consumed by O in returning s, > gives the size of the equivalence class of all descriptions having > meaning s." > > Boy, that's a tough one now. We consider all bit strings p such that O(p) > = s. Now, is this supposed to just be those cases described earlier where > the bits after |p| are "don't care" bits? Or is it all strings p such > that O(p) = s, including all extensions of a string where we "don't care" > after some point? If the latter, it seems problematical because there > would be an uncountably infinite number of such p's. Also I think the sum > would not converge in that case. So I think it is probably the former. > You have indeed hoisted me here. Well done! Of course the mistake is not really serious, and can be patched up by taking the former meaning you suggest, or merely to note that the measure of the set with with common prefix of length |p| is precisely 2^{-|p|}, and then the sum is over all such subsets of descriptions. > But then, what if there is a string p such that O(p)=s, but if we append > anything to p, either p+0 or p+1, then O of those strings is not s? > Does that p add to the equivalence class or not? > Remember O(x) is a prefix map. That cannot happen. > And is it the case that for all O and for every "description" bit string, > there will be some prefix beyond which all the bits are "don't care"? > Would that follow from the finiteness of O? If not, how do these > "descriptions" feed into the measure? > If there are no "don't care" bits, the description contributes zero measure. This is not a problem, so long as there is at least one class of strings with don't care bits. > And I'm a little concerned about the appearance of 2^(-|p|) here without > much motivation. Perhaps making the summation index p run over subsets rather than descriptions would help here, as I men tioned above. The subsets correpond to finite length bitstrings (the "prefix"), and the |p| corresponds to the length of that prefix. > > "The quantity C_O(x) = - log2( P_O( O(x) ) ) > is a measure of the information content, or complexity of a description x." > > Now I am becoming confused about whether O() is supposed to apply > to the infinite bit strings that are "descriptions" or only to their > finite prefixes. I thought it was the latter. I don't see how a TM > can come up with a meaningful value on an infinite bit string. It can > never read in the whole input if it is infinite in size. > It does not need to. > "If only the first n bits of the string are significant, with no > redundancy, then it is easy to see C_O(x)=n." > > That part does make sense, if in fact the definition of P_O(s) was > only with reference to "description" strings which were "don't care" > after some number of bits. And I think the part about "no redundancy" > means there are no other bit string prefixes which O maps to the same > value as it maps the n-bit prefix of x. > > The page then goes on to make some comments about measure applied to > universes. Here again I am confused about how to relate it to all that > has been descibed. What are the analogs of universes, in this model? > Is it "descriptions", the infinite bit strings? From what has been > presented so far, I don't understand how to relate our experience of > reality to this model. > Each description is a possible universe, composed of an infinite amount of information. Any observer will of course only comprehend a finite amount information, and hence be in a superposition of a subset of universes corresponding to that finite information. Admittedly the usage of the term "universe" is slightly strange here. Alterantively, one could talk about "observer moments" as corresponding to the equivalence classes of descriptions. This interpretation would be more natural to many here on this list. In section 3 of the paper, I now introduce a temporal dimension, with the observer repeatedly sampling the set of all descriptions, with the proviso that successor states can only differ by a finite number of bits. > That's it for the first page. Hopefully these questions will help to > show where I am getting confused. > > Hal Finney Cheers -- *PS: A number of people ask me about the attachment to my email, which is of type "application/pgp-signature". Don't worry, it is not a virus. It is an electronic signature, that may be used to verify this email came from me if you have PGP or GPG installed. Otherwise, you may safely ignore this attachment. ---------------------------------------------------------------------------- A/Prof Russell Standish Phone 8308 3119 (mobile) Mathematics 0425 253119 (") UNSW SYDNEY 2052 [EMAIL PROTECTED] Australia http://parallel.hpc.unsw.edu.au/rks International prefix +612, Interstate prefix 02 ----------------------------------------------------------------------------
pgpmq9rHN7Y4P.pgp
Description: PGP signature