I want to describe in more detail how I see the Universal Distribution (UDist) applying to the measure of observers and observer moments. I apologize in advance for the length of this message; someday I will collect this and my others on this topic into a set of web pages.
To briefly reiterate, in this model every information pattern or object is said to exist in a Platonic sense. These are the only things that exist. Further, these objects are associated with a measure defined by the Universal Distribution or UDist, which is defined with respect to a given Universal Turing Machine (UTM). Basically, the measure of an object is the probability that a random program run through that UTM produces that object. Another way to think of the measure is as the fraction of all programs which produce that object. You can also imagine an infinite array of UTMs, each one working on a distinct input program and producing outputs, where the measure of an object is proportional to the fraction of the outputs that match that object pattern.

To apply this concept to observers, we first need to think of an observer as an information pattern. I adopt a block universe perspective and think of time as a dimension. Then we can see the dynamic activity that is part of an observer's thinking as producing a pattern in space and time.

Let us consider the human brain. Our current theories are that thinking and perception can be thought of as reflecting the activity of neural cells. These cells fire off electrical impulses at various rates and times, and are hooked up into an elaborate and complex network of interconnections, whose properties change over time. It seems reasonable that a complete record of the neural activity and dynamics of the interconnection map over a period of time would be an effective and accurate representation of the information associated with the mental activity during that time period. In fact this is probably more detailed than is needed; clearly much neural processing is unconscious, and further, small variations in individual cell activity will not produce perceptual changes in consciousness. But at this point we cannot be certain about what parts could be simplified or eliminated.
No doubt with future study of brain function we will gradually gain a greater understanding of consciousness and we could reduce the information content needed to fully specify an observer's experiences even further.

It is an instructive exercise to try to estimate how much information is needed to specify this representation of an observer as it exists within some window of time. Obviously this will be very approximate. I'll take 1E10 (10^10 or 10 billion) as the number of neurons and 1E5 (100 thousand) as the number of connections per neuron. Representing this map takes 1E10 * 1E5 * 33 bits per connection or about 3E16 bits. Then we need to specify the connection strength for each connection; perhaps 20 bits is about right, which gives 1 part in a million accuracy. This gets us up to 6E17 bits.

This is a static representation. As the brain thinks, these connection strengths change. My understanding is that they don't change very fast. Some estimates I have seen suggest that long term memory can only record 1 bit per second. I know that sounds amazingly slow but it reflects how much we forget. Even if this is low by 10 orders of magnitude it still means that the information needed to record changes in the interconnection map is insignificant next to the information needed to specify the map in the first place, which I estimated as 6E17 bits. So I will not count this information contribution, not because it doesn't happen, but because it is so small compared to the information in the static network.

Next we need to record the actual neural firing patterns. Neurons fire at a maximum rate of about 1000/sec, but we need to record more than the firing rates. Relative timing of the neurons is important as well, both because the brain seems to record information in that timing, and because whether a neuron fires or not depends on the relative timing of the various impulses it receives.
Let's suppose we record the neural firing activity at the microsecond level; not just a boolean value about whether it fired, but some indication of the neuron's level of activation and its recovery rate. We will use a 20 bit value to represent this, and record it every microsecond. This takes 1E10 * 20 * 1E6 bits, times the number of seconds, or 2E17 times the number of seconds.

The bottom line from all this estimation, which is obviously very very rough, is that it takes (6+2s) times 10^17 bits to record the pattern associated with s seconds of brain activity, in enough detail that we could plausibly claim to have fully captured the essence of those seconds of consciousness. That's 10^18 bits for 2 seconds of consciousness. 10^18 bits is about 100 million gigabytes. For comparison, typical new computer disks today are about 100-500 gigabytes. So it is a pretty big chunk of data.

What is the point of this calculation? Effectively, it gives us a lower bound on the measure of a possible set of observer moments. Even if nothing else works, we could write a program which would output this particular pattern simply by embedding that pattern in the program itself. We could do the UTM equivalent of "print 1001011110010..." to print out whatever pattern we want. The size of that program will be roughly the size of the information pattern itself, in this case roughly 10^18 bits. And the measure of that information pattern will be 1/2 to this power, or 1/2^(10^18). This is an astronomically low measure. In effect, this measure estimates the probability of a given moment of consciousness appearing purely at random, out of random noise.

This answers the question sometimes posed of whether the random vibrations in the atoms of air or a rock crystal are conscious, since we could select a subset of them and match them against the information pattern of consciousness described above.
The answer is that the only program which could do so would embed the entire information pattern within it, and the measure of such a program is the tiny figure quoted above. Assuming that there are non-infinitesimal sources of measure for a given consciousness, the contribution from such random sources is undetectable by comparison.

So what other sources of measure might there be? In other words, how could we write a much shorter program which could output this same information pattern? One thing we could do is to try to compress the redundancy out of the pattern. As I have defined it, even though I tried not to be terribly excessive in the number of bits I allocated to the various terms, there is certainly some redundancy. For one thing, I proposed to record each neuron's firing pattern independently, even though in principle it should be possible to compute a neuron's firing from its inputs and the interconnection matrix, both of which are recorded. In that case we might get by with only storing the inputs to the brain from the perceptual nerves, and use the stored interconnection matrix to compute everything else. This could greatly reduce the per-second information requirement.

However I don't think this will work as well as it sounds, because I suspect that the brain is chaotic in nature and so small changes in initial conditions will lead ultimately to totally different output. Even though I stored the interconnections with a precision of 1 in a million, that may not be enough to keep the reconstructed brain pattern identical to the recorded one. Likewise, my decision not to store the changes in brain interconnections will probably not be a successful strategy if they have to be kept accurate in order to keep the simulation in sync with the reality. I would not be surprised if ceasing to record the precise internal firing patterns, and trying to reconstruct them, will require additional information to be stored which will largely eliminate the savings.
And besides, at least for relatively brief observer moments of less than a few seconds, the interconnection network took more room than the neural firing patterns, so you aren't going to save that much.

No, we need a much more radical strategy to shrink the program. And here is where it is so remarkable that instead of 10^18 bits, or even perhaps 10^15 bits if we could compress the data by a factor of 1000, I estimate that only about 10^5 bits would be enough to encode this information, perhaps even 10^4. The contribution to measure from such a short program is so much larger than the "brute force" 10^18 version that the latter can be completely neglected.

So how do we record this brain pattern in so little data? The answer is that we adopt a completely different approach. Instead of specifying all of the information, we instead specify the natural laws and initial conditions for a universe which is suitable for the evolution of life and intelligence. The program runs and creates that universe, and then outputs the brain pattern which one of the observers in that universe is experiencing. I will show below that the information content of such a program is plausibly of the order I claimed.

This requires an implicit assumption that the brain pattern in question can in fact be produced by an observer who evolved naturally and is experiencing the events in a plausible universe. In other words, this will not work for all brain patterns that we could ever imagine. But if our own brain patterns represent actual experiences in a real universe, a universe which is not too implausible, then it should work for them. So I will assume that this is the case.

The program to output the brain pattern can be conceptually divided into two parts. One part computes the universe, and the other outputs the observer moments in question, in the format described above, as an interconnection matrix and record of neural firings.

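To make the gap between the brute-force program and the short program concrete, here is a minimal sketch in Python. It assumes, as in the text, that a program of n bits contributes a measure of 2^-n, and it takes the 6E17 and 2E17 figures as quoted rather than re-deriving them. The function names are purely illustrative. Because 2^-(10^18) underflows to zero in floating point, the comparison is done on logarithms.

```python
import math

# Rough figures quoted in the text (estimates, not measurements).
STATIC_MAP_BITS = 6 * 10**17         # interconnection map plus strengths
FIRING_BITS_PER_SECOND = 2 * 10**17  # 1E10 neurons * 20 bits * 1E6 samples/s


def brute_force_bits(seconds: int) -> int:
    """Size of a program that simply embeds s seconds of brain pattern."""
    return STATIC_MAP_BITS + FIRING_BITS_PER_SECOND * seconds


def log10_measure(program_bits: int) -> float:
    """log10 of 2**-program_bits, the measure from one program of that size."""
    return -program_bits * math.log10(2)


def log10_advantage(short_bits: int, long_bits: int) -> float:
    """log10 of (measure of the short program / measure of the long one)."""
    return log10_measure(short_bits) - log10_measure(long_bits)


brute = brute_force_bits(2)  # two seconds of consciousness
for short in (10**4, 10**5):
    print(f"{short}-bit program: measure ~ 10^{log10_measure(short):.0f}, "
          f"ahead of brute force by a factor of ~ 10^"
          f"{log10_advantage(short, brute):.3g}")
```

Even at the pessimistic 10^5-bit end, the exponent of the advantage is itself of order 10^17, which is why the brute-force contribution can be dropped.
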
We can estimate the size of the program by the sum of the estimated sizes of the two parts. How big a program is needed to create our universe? Nobody knows, but the answer seems to be, not that big. One of the most striking properties of the natural laws which have been discovered so far is that they are mathematically simple. Explaining why this is so is considered a major philosophical puzzle, and answering this question is one of the biggest successes of the UDist principle.

We don't yet have a complete theory of the laws of physics, but given what is known, quantum theory and relativity, and prospective new models like string theory and loop quantum gravity, it certainly doesn't appear that they will be very big. Wolfram gave an estimate that the universe could be completely modelled using 5 lines of Mathematica code. Now, Mathematica has an extensive mathematical library but I doubt that most of it is used. 5 lines of code times 70 characters per line times 8 bits per character is about 3E3 bits. Let's assume that the math functions triple the size and we get about 1E4 bits.

We also need to specify the initial conditions, but there again the information content seems to be low. Last week I quoted Tegmark as saying that the information content of the big bang is "close to zero". I don't know how to translate this into a specific number of bits but I will assume that it is substantially less than 1E4 bits and not increase the estimate so far.

So we have it that a program of about 1E4, 10 thousand, bits should be able to re-create our universe in all its glory, including observers like ourselves (in fact, exactly like ourselves). That is our estimate for the size of the first part of the program which will output the information pattern for a given sequence of observer moments.

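The arithmetic behind the 1E4 figure is short enough to spell out. This is only a sketch of the estimate as stated above; the factor of three for the math library is a guess, and the constant names are illustrative.

```python
import math

LINES_OF_CODE = 5    # Wolfram's estimate, as quoted in the text
CHARS_PER_LINE = 70
BITS_PER_CHAR = 8
LIBRARY_FACTOR = 3   # assumed overhead for the math functions actually used

source_bits = LINES_OF_CODE * CHARS_PER_LINE * BITS_PER_CHAR
universe_program_bits = source_bits * LIBRARY_FACTOR

print(f"source text: {source_bits} bits")
print(f"with library: {universe_program_bits} bits, "
      f"order of magnitude 1E{round(math.log10(universe_program_bits))}")
```
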
For the size of the second part, we are given the output of the first part, a complete representation of the state of nature at every point and every instant in the universe, and we need to output a nicely ordered data structure representing an observer's mental activity as described above.

There are several problems here. One is that the universe is very big, and potentially has many observers who have many observer-moments. We need to select a particular starting moment of a particular observer so as to output the pattern above. A related problem is just identifying where the observers are in that vast expanse of space-time. Another problem is that the "natural" way of expressing the universe's state as output by the first program may not be very similar to what we perceive as the universe around us. It appears, based on our understanding of physics, that it is going to be expressed as a theory of what we understand as the very small, at the Planck scale, 1E-35 meters and 1E-44 seconds. It is likely, then, that the output of the universe program is going to be expressed at that scale. From that perspective, the activity of a neuron is both enormous and insubstantial, effectively just a very large scale averaging of the much more dynamic activity at this, the natural scale of physics.

This last problem, the difference in scale, can probably be dealt with very simply. It may be enough simply to average the state over a large region in order to get a good, macroscopic (in this context, meaning at the scale of atoms and molecules) view of what is going on; or it may be that subatomic particles like electrons and quarks will be clearly represented even at the Planck scale and we can use them to identify atoms and molecules at the larger scale.

The next problem is to locate and localize the observer within the entire framework of spacetime. The way I envision this being done is that the position of the observer's brain in space and time is hard-coded into the program.
The position can be expressed as a fraction of the size of the universe, to the required resolution. I would think that for a program which is going to analyze neural structure, nanometer resolution is enough to localize the brain. The typical synaptic gap between neurons is about 10-20 nm. In terms of time, since I sought microsecond resolution for the neural state, localizing the observer to microsecond resolution should be adequate. The size of the universe is unknown, but let us for convenience work with the size of the visible universe, about 3E10 (30 billion) light years. This is 3E35 nanometers. It takes about 118 bits to represent a number of that size, and we have 3 dimensions, so it takes about 350 bits to fully localize a brain in a space the size of the visible universe. In terms of time, the universe is about 1E10 (10 billion) years old, which is 3E23 microseconds. 80 bits is enough for that. Putting these together, 350+80 or about 430 bits will give us a starting point accurate to a microsecond and a nanometer for producing a description of a brain.

The actual analysis software should be straightforward. We need to locate all the neurons, record their interconnection patterns, and then their firing rates and activity levels. All of these have relatively simple physical correlates given that you can analyze matter at an arbitrarily fine scale. Locating the neurons can be done by tracing their outer membranes. The interconnection patterns should be determined by the amount of area they have in common, the number and distribution of vesicles and receptors in the area, and basic chemistry as to whether the connection is inhibitory or excitatory. This is a matter of simple geometry and counting. Likewise, the activity level is a function of the concentration of various chemicals inside vs outside the neural membrane and can be calculated very simply.

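The localization figures above can be checked with a few lines of Python. This is a rough sketch: the light-year and year conversions are standard values, the universe size and age are the round numbers used in the text, and `bits_needed` is just an illustrative helper.

```python
import math

METERS_PER_LIGHT_YEAR = 9.4607e15
NM_PER_METER = 1e9
MICROSECONDS_PER_YEAR = 365.25 * 24 * 3600 * 1e6


def bits_needed(distinct_values: float) -> int:
    """Bits required to index one of `distinct_values` positions."""
    return math.ceil(math.log2(distinct_values))


# Round figures from the text: 3E10 light years across, 1E10 years old.
universe_nm = 3e10 * METERS_PER_LIGHT_YEAR * NM_PER_METER
universe_us = 1e10 * MICROSECONDS_PER_YEAR

bits_per_axis = bits_needed(universe_nm)
space_bits = 3 * bits_per_axis        # three spatial dimensions
time_bits = bits_needed(universe_us)
total_bits = space_bits + time_bits

print(f"size: {universe_nm:.1e} nm -> {bits_per_axis} bits per axis, "
      f"{space_bits} bits for space")
print(f"age: {universe_us:.1e} microseconds -> {time_bits} bits")
print(f"total localization cost: {total_bits} bits")
```

The exact count lands within a few bits of the rounded 350 + 80 = 430 used above, and in either case it stays well under 10^3 bits.
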
This level of software is adequate to create the data structure defined above for completely specifying the neural activity which corresponds to a given set of observer moments. It amounts to simple counting, area calculation, and averaging. My guess again is that 10^4 to 10^5 bits is fully adequate to perform these tasks. Adding the < 10^3 bits needed to localize the observer still keeps it within this range. Combining the software to create the universe, perhaps 10^4 bits, and the software to output the observer description, about 10^4 to 10^5 bits, we get the size proposed above, 10^4 to 10^5 bits for a self-contained program which will output the observer description in question. On this basis we can use a number like 1/2^(10^4) as an estimate for the measure of such a set of observer moments.

Hopefully this explanation will clarify how we can apply the UDist model to calculate the measure of observer moments as well as other information structures. It also illustrates how far we are from the scientific knowledge necessary to come up with more precise estimates for the information content of conscious entities.

Nevertheless, even with the crude level of knowledge available today, we can make many powerful predictions from this kind of model. One case described above is the paradox of whether conscious entities exist all around us due to vibrations in air molecules, which this analysis lets us reject in a quantitative sense. Hans Moravec in particular has argued that such entities have a reality equal to our own, which is clearly wrong. A similar analysis disposes of the long-standing philosophical debate over whether a clock implements every finite state machine (and hence every conscious entity). Other puzzles, such as the impact on measure of replays and duplicates, can also be addressed and solved in this framework. I have described other predictions and solutions in my earlier messages on this topic.

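As a rough tally of the pieces estimated above, the following Python sketch adds up the component sizes. The low and high values are the guesses from the text, not measured quantities, and the dictionary layout is only for illustration.

```python
# (low, high) size guesses in bits for each part of the program.
COMPONENT_BITS = {
    "universe program": (10**4, 10**4),
    "observer extraction": (10**4, 10**5),
    "localization": (430, 430),
}

low_total = sum(low for low, _ in COMPONENT_BITS.values())
high_total = sum(high for _, high in COMPONENT_BITS.values())

for name, (low, high) in COMPONENT_BITS.items():
    print(f"{name:>20}: {low} to {high} bits")
print(f"{'total':>20}: {low_total} to {high_total} bits")
print(f"measure roughly between 2^-{high_total} and 2^-{low_total}")
```

The total stays within the same order-of-magnitude range, and the localization term barely registers next to the other two.
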
Again, I hope that laying out my calculations in this much detail will help people to see somewhat concretely how the Universal Distribution works and how you can analyze measure using actual software engineering concepts. It makes the UDist much more real as a useful tool for understanding measure and making predictions.

Hal Finney

