Jim, see http://www.scholarpedia.org/article/Algorithmic_probability
I think this answers your questions.

 -- Matt Mahoney, matmaho...@yahoo.com




________________________________
From: Jim Bromer <jimbro...@gmail.com>
To: agi <agi@v2.listbox.com>
Sent: Fri, August 6, 2010 2:18:09 PM
Subject: Re: [agi] Comments On My Skepticism of Solomonoff Induction


I meant:
Did Solomonoff's original idea use randomization to determine the bits of the
programs that are used to produce the prior probabilities?  I think the answer
to that is obviously no.  The randomization of the next bit would be used in
the test of the prior probabilities, as done with random sampling.  He probably
found that students with some familiarity with statistics would initially
assume that the prior probability was based on some subset of possible
programs, as would be expected from a typical sample, so he gave this
statistics-style definition to emphasize the full extent of what he had in mind.
 
I asked this question just to make sure that I understood what Solomonoff
Induction was, because Abram had made some statement indicating that I really
didn't know.  Remember, this particular branch of the discussion was originally
centered around the question of whether Solomonoff Induction would be
convergent, even given a way around the incomputability of finding only those
programs that halted.  So while the random testing of the prior probabilities
is of interest to me, I wanted to make sure that there is no evidence that
Solomonoff Induction is convergent.  I am not being petty about this; I also
needed to make sure that I understood what Solomonoff Induction is.
 
I am interested in hearing your ideas about your variation of 
Solomonoff Induction because your convergent series, in this context, was 
interesting.
Jim Bromer


On Fri, Aug 6, 2010 at 6:50 AM, Jim Bromer <jimbro...@gmail.com> wrote:

Jim: So, did Solomonoff's original idea involve randomizing whether the next
bit would be a 1 or a 0 in the program?

Abram: Yep. 

I meant: did Solomonoff's original idea involve randomizing whether the next
bit would be a 1 or a 0 in the programs that are originally used to produce the
prior probabilities?  I have not been able to find any evidence that it did.  I
thought that my question was clear, but on second thought I guess it wasn't.  I
think that the part about the coin flips was only a method to express that he
was interested in the probability that a particular string would be produced
from all possible programs, so that when actually testing the prior probability
of a particular string, the program to be run would have to be randomly
generated.
Jim Bromer
 
 

 
On Wed, Aug 4, 2010 at 10:27 PM, Abram Demski <abramdem...@gmail.com> wrote:

Jim,
>
>
>Your function may be convergent but it is not a probability. 
>>
>
>True! All the possibilities sum to less than 1. There are ways of addressing
>this (i.e., multiply by a normalizing constant, which must also be
>approximated in a convergent manner), but for the most part adherents of
>Solomonoff induction don't worry about this too much. What we care about,
>mostly, is comparing different hypotheses to decide which to favor. The
>normalizing constant doesn't help us here, so it usually isn't mentioned.
>
>
>
>
>You said that Solomonoff's original construction involved flipping a coin for 
>the next bit.  What good does that do?
>
>Your intuition is that running totally random programs to get predictions
>will just produce garbage, and that is fine. The idea of Solomonoff induction,
>though, is that it will produce systematically less garbage than just flipping
>coins to get predictions. Most of the garbage programs will be knocked out of
>the running by the data itself. This is supposed to be the least garbage we
>can manage without domain-specific knowledge.
>
>This is backed up with the proof of dominance, which we haven't talked about 
>yet, but which is really the key argument for the optimality of Solomonoff 
>induction. 
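[Editor's note: the filtering effect described above can be sketched with a toy
hypothesis space. Everything here is an illustrative assumption, not anything
from the thread: "programs" are repeating bit patterns, weighted by 2^-length
as a stand-in for a length-based prior, and the data knocks out the patterns
that fail to reproduce it.]

```python
import itertools

def pattern_output(pattern, n):
    """Output of a toy 'program': repeat the pattern out to n bits."""
    return (pattern * (n // len(pattern) + 1))[:n]

def predict_next(data, max_len=8):
    """Weight each pattern by 2^-length, keep only those that reproduce
    the observed data, and return the weighted probability that the
    next bit is '1'."""
    total, ones = 0.0, 0.0
    for length in range(1, max_len + 1):
        for bits in itertools.product("01", repeat=length):
            pattern = "".join(bits)
            if pattern_output(pattern, len(data)) != data:
                continue  # knocked out of the running by the data itself
            weight = 2.0 ** (-length)
            total += weight
            if pattern_output(pattern, len(data) + 1)[-1] == "1":
                ones += weight
    return ones / total

# After a regular string, the surviving hypotheses all continue the
# pattern, so the prediction is far from coin-flipping:
print(predict_next("01010101"))  # -> 0.0: every survivor predicts '0'
```

The point of the sketch is only that conditioning on the data concentrates the
weight on regular hypotheses; it is not a universal machine.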
>
>
>
>
>And how does that prove that his original idea was convergent?
>
>The proofs of equivalence between all the different formulations of Solomonoff 
>induction are something I haven't cared to look into too deeply. 
>
>
>
>
>Since his idea is incomputable, there are no algorithms that can be run to 
>demonstrate what he was talking about so the basic idea is papered with all 
>sorts of unverifiable approximations.
>
>I gave you a proof of convergence for one such approximation, and if you wish 
>I 
>can modify it to include a normalizing constant to ensure that it is a 
>probability measure. It would be helpful to me if your criticisms were more 
>specific to that proof.
>
>
>
>So, did Solomonoff's original idea involve randomizing whether the next bit 
>would be a 1 or a 0 in the program? 
>
>>
>
>Yep. 
>
>
>
>Even ignoring the halting problem what kind of result would that give?
>>

Well, the general idea is this. An even distribution intuitively represents
lack of knowledge. An even distribution over possible data fails horribly,
however, predicting white noise. We want to represent the idea that we are
very ignorant of what the data might be, but not *that* ignorant. To capture
the idea of regularity, i.e., similarity between past and future, we instead
take an even distribution over *descriptions* of the data. (The distribution
in the 2nd version of Solomonoff induction that I gave, the one in which the
space of possible programs is represented as a continuum, is an even
distribution.) This appears to provide a good amount of regularity without
too much.
>
>--Abram
> 
>On Wed, Aug 4, 2010 at 8:10 PM, Jim Bromer <jimbro...@gmail.com> wrote:
>
>Abram,
>>Thanks for the explanation.  I still don't get it.  Your function may be
>>convergent but it is not a probability.  You said that Solomonoff's original
>>construction involved flipping a coin for the next bit.  What good does that
>>do?  And how does that prove that his original idea was convergent?  The
>>thing that is wrong with these explanations is that they are not coherent.
>>Since his idea is incomputable, there are no algorithms that can be run to
>>demonstrate what he was talking about, so the basic idea is papered with all
>>sorts of unverifiable approximations.
>> 
>>So, did Solomonoff's original idea involve randomizing whether the next bit
>>would be a 1 or a 0 in the program?  Even ignoring the halting problem, what
>>kind of result would that give?  Have you ever solved the problem for some
>>strings, and have you ever tested the solutions using a simulation?
>> Jim Bromer 
>>
>> 
>>On Mon, Aug 2, 2010 at 5:12 PM, Abram Demski <abramdem...@gmail.com> wrote:
>>
>>Jim,
>>>
>>>Interestingly, the formalization of Solomonoff induction I'm most familiar
>>>with uses a construction that relates the space of programs with the real
>>>numbers just as you say. This formulation may be due to Solomonoff, or
>>>perhaps Hutter... not sure. I re-formulated it to "gloss over" that in order
>>>to make it simpler; I'm pretty sure the version I gave is equivalent in the
>>>relevant aspects. However, some notes on the original construction.
>>>
>>>Programs are created by flipping coins to come up with the 1s and 0s. We
>>>are to think of it like this: whenever the computer reaches the end of the
>>>program and tries to continue on, we flip a coin to decide what the next bit
>>>of the program will be. We keep doing this for as long as the computer wants
>>>more bits of instruction.
>>>
>>>This framework makes room for infinitely long programs, but makes them
>>>infinitely improbable-- formally, they have probability 0. (We could alter
>>>the setup to allow them an infinitesimal probability.) Intuitively, this
>>>means that if we keep flipping a coin to tell the computer what to do,
>>>eventually we will either create an infinite loop-back (so the computer
>>>keeps executing the already-written parts of the program and never asks for
>>>more) or write out the "HALT" command. Avoiding doing one or the other
>>>forever is just too improbable.
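[Editor's note: the coin-flipping construction can be sketched with a toy
halting rule, an assumed stand-in for a real universal machine: here the
machine keeps requesting random bits and stops as soon as it has seen the
two-bit sequence "00". A length-n program then has probability 2^-n, and
flipping coins forever without ever halting has probability 0, mirroring the
measure-zero point above.]

```python
import random

def sample_program(rng, max_bits=10_000):
    """Flip coins for program bits until the toy machine stops asking.

    Toy halting rule (illustrative assumption, not a real universal
    machine): the machine stops requesting bits once the program
    contains "00", which plays the role of the HALT command.
    """
    bits = []
    while len(bits) < max_bits:
        bits.append(rng.choice("01"))
        if bits[-2:] == ["0", "0"]:
            return "".join(bits)
    raise RuntimeError("astronomically unlikely under this halting rule")

rng = random.Random(0)
samples = [sample_program(rng) for _ in range(10_000)]
# Every sampled program halted, ending with the toy HALT marker "00";
# the expected program length under this rule is 6 bits.
print(all(s.endswith("00") for s in samples))
```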
>>>
>>>This also means all real numbers are output by some program! It just may be
>>>one which is infinitely long.
>>>
>>>However, all the programs that "slip past" my time bound as T increases to
>>>infinity will have measure 0, meaning they don't add anything to the sum.
>>>This means the convergence is unaffected.
>>>
>>>Note: in this construction, program space is *still* a well-defined entity.
>>>
>>>--Abram
>>>
>>>
>>>On Sun, Aug 1, 2010 at 9:05 PM, Jim Bromer <jimbro...@gmail.com> wrote:
>>>
>>>Abram,
>>>>This is a very interesting function.  I have spent a lot of time thinking
>>>>about it.  However, I do not believe that it does, in any way, prove or
>>>>indicate that Solomonoff Induction is convergent.  I want to discuss the
>>>>function, but I need to take more time to study some stuff and to work
>>>>various details out.  (Although I have thought a lot about it, I am writing
>>>>this under a sense of deadline, so it may not be well composed.)
>>>> 
>>>>My argument was that Solomonoff's conjecture, which was based (as far as I
>>>>can tell) on 'all possible programs', was fundamentally flawed because the
>>>>idea of 'all possible programs' is not a programmable definition.  All
>>>>possible programs is a domain, not a class of programs that can be feasibly
>>>>defined in the form of an algorithm that could 'reach' all the programs.
>>>> 
>>>>The domain of all possible programs is trans-infinite, just as the domain
>>>>of irrational numbers is.  Why do I believe this?  Because if we imagine
>>>>that infinite algorithms are computable, then we could compute irrational
>>>>numbers.  That is, there are programs that, given infinite resources, could
>>>>compute irrational numbers.  We can use the binomial theorem, for example,
>>>>to compute the square root of 2.  And we can use trial-and-error methods to
>>>>compute the nth root of any number.  So that proves that there are
>>>>infinitely many irrational numbers that can be computed by algorithms that
>>>>run for infinity.
>>>> 
>>>>So what does this have to do with Solomonoff's conjecture of all possible
>>>>programs?  Well, if I could prove that any individual irrational number
>>>>could be computed (with programs that ran through infinity), then I might
>>>>be able to prove that there are trans-infinite programs.  If I could prove
>>>>that some trans-infinite subset of irrational numbers could be computed,
>>>>then I might be able to prove that 'all possible programs' is a
>>>>trans-infinite class.
>>>> 
>>>>Now Abram said that since his sum, based on runtime and program length, is
>>>>convergent, it can prove that Solomonoff Induction is convergent.  Even
>>>>assuming that his convergent sum method could be fixed up a little, I
>>>>suspect that this time-length bound is misleading.  Since a Turing Machine
>>>>allows for erasures, this means that a program could last longer than his
>>>>time parameter and still produce an output string that matches the given
>>>>string.  And if 'all possible programs' is a trans-infinite class, then
>>>>there are programs that you are going to miss.  Your encoding method will
>>>>miss trans-infinite programs (unless you have trans-cended the
>>>>trans-infinite.)
>>>>
>>>> 
>>>>However, I want to study the function and some other ideas related to this
>>>>kind of thing a little more.  It is an interesting function.  Unfortunately
>>>>I also have to get back to other-worldly things.
>>>>Jim Bromer
>>>>
>>>>
>>>>On Mon, Jul 26, 2010 at 2:54 PM, Abram Demski <abramdem...@gmail.com> wrote:
>>>>
>>>>Jim,
>>>>>
>>>>>I'll argue that Solomonoff probabilities are in fact like pi, that is,
>>>>>computable in the limit.
>>>>>
>>>>>I still do not understand why you think these combinations are necessary.
>>>>>It is not necessary to make some sort of ordering of the sum to get it to
>>>>>converge: ordering only matters for infinite sums which include negative
>>>>>numbers. (Perhaps that's where you're getting the idea?)
>>>>>
>>>>>Here's my proof, rewritten from an earlier post, using the properties of 
>>>>>infinite sums of non-negative numbers.
>>>>>
>>>>>(preliminaries)
>>>>>
>>>>>Define the computation as follows: we start with a string S which we want
>>>>>to know the Solomonoff probability of. We are given a time-limit T. We
>>>>>start with P=0, where P is a real number with precision 2*log_4(T) or
>>>>>more. We use some binary encoding for programs which (unlike normal
>>>>>programming languages) does not contain syntactically invalid programs,
>>>>>but still will (of course) contain infinite loops and so on. We run
>>>>>programs "0" and "1" for T/4 each; "00", "01", "10" and "11" for T/16
>>>>>each; and in general run each program of length N for floor[T/(4^N)],
>>>>>until T/(4^N) is less than 1. Each time we run a program and the result
>>>>>is S, we add 1/(4^N) to P.
>>>>>
>>>>>(assertion)
>>>>>
>>>>>P converges to some value as T is increased.
>>>>>
>>>>>(proof)
>>>>>
>>>>>If every single program were to output S, then P would converge to 1/4 +
>>>>>1/4 + 1/16 + 1/16 + 1/16 + 1/16 + ..., that is, 2*(1/(4^1)) + 4*(1/(4^2))
>>>>>+ 8*(1/(4^3)) + ..., which comes to 1/2 + 1/4 + 1/8 + 1/16 + ..., i.e.
>>>>>1/(2^1) + 1/(2^2) + 1/(2^3) + ...; it is well-known that this sequence
>>>>>converges to 1. Thus 1 is an upper bound for P: it could only get that
>>>>>high if every single program were to output S.
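[Editor's note: the series manipulation above checks out numerically; a quick
sanity check, not part of the original argument. All 2^N programs of length N
contribute 1/(4^N) each, i.e. (2^N)/(4^N) = (1/2)^N per length, and the
partial sums approach 1 from below.]

```python
# Worst case from the proof: every program of every length outputs S.
# Summing (2**n) * (1/4**n) over lengths n gives the geometric series
# 1/2 + 1/4 + 1/8 + ..., whose partial sums approach 1 from below.
partial = sum((2 ** n) * (1.0 / 4 ** n) for n in range(1, 50))
print(partial)  # just below 1, confirming the upper bound on P
```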
>>>>>
>>>>>0 is a lower bound, since we start there and never subtract anything. In
>>>>>fact, every time we add more to P, we have a new lower bound: P will never
>>>>>go below a number once it reaches it. The sum can only increase. Infinite
>>>>>sums with this property must either converge to a finite number, or go to
>>>>>infinity. However, we already know that 1 is an upper bound for P; so, it
>>>>>cannot go to infinity. Hence, it must converge.
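[Editor's note: the computation defined in the proof can be sketched directly.
A real universal machine is out of scope here, so the `run` function below is
a toy stand-in, an assumption for illustration only: a "program" takes one
step per bit and then outputs its own bits. The time-budget scheme and the
1/(4^N) weights follow the description above.]

```python
def run(program, budget):
    """Toy stand-in for a universal machine: the program needs
    len(program) steps to halt, then outputs its own bits.
    Returns None if the time budget is exhausted first."""
    if budget >= len(program):
        return program
    return None

def approx_probability(s, t):
    """Approximate the prior of string s with time limit t: run every
    program of length n for floor(t / 4**n) steps, adding 1/(4**n) to
    p whenever the output equals s, stopping once t / 4**n < 1."""
    p = 0.0
    n = 1
    while t // 4 ** n >= 1:
        budget = t // 4 ** n
        for i in range(2 ** n):
            program = format(i, "b").zfill(n)
            if run(program, budget) == s:
                p += 1.0 / 4 ** n
        n += 1
    return p

# As the proof says: p never decreases as t grows, and 1 bounds it.
estimates = [approx_probability("101", t) for t in (64, 256, 1024, 4096)]
print(estimates)
```

Under the toy semantics, only the program "101" ever outputs "101", and only
once its budget reaches 3 steps, so the estimate rises once and then stays put.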
>>>>>
>>>>>--Abram
>>>>>
>>>>> 
>>>>>On Mon, Jul 26, 2010 at 9:14 AM, Jim Bromer <jimbro...@gmail.com> wrote:
>>>>>
>>>>>As far as I can tell right now, my theories that Solomonoff Induction is
>>>>>trans-infinite were wrong.  Now that I realize that the mathematics does
>>>>>not support these conjectures, I have to acknowledge that I would not be
>>>>>able to prove or even offer a sketch of a proof of my theories.  Although
>>>>>I did not use rigorous mathematics while trying to make an assessment of
>>>>>the Solomonoff method, the first principle of rigorous mathematics is to
>>>>>acknowledge that the mathematics does not support your supposition when
>>>>>it doesn't.
>>>>>> 
>>>>>>Solomonoff Induction isn't a mathematical theory because the desired
>>>>>>results are not computable.  As I mentioned before, there are a great
>>>>>>many functions that are not computable but which are useful and important
>>>>>>because they tend toward a limit which can be seen with a reasonable
>>>>>>number of calculations using the methods available.  Pi is one such case.
>>>>>>(I am presuming that pi would require an infinite expansion, which seems
>>>>>>right.)
>>>>>> 
>>>>>>I have explained, and I think it is a correct explanation, that there is
>>>>>>no way that you could make an a priori computation of all possible
>>>>>>combinations taken from infinite values.  So you could not even come up
>>>>>>with a theoretical construct that could take account of that level of
>>>>>>complexity.  It is true that you could come up with a theoretical
>>>>>>computational method that could take account of any finite number of
>>>>>>values, and that is what we are talking about when we talk about the
>>>>>>infinite, but in this context it only points to a theoretical paradox.
>>>>>>Your theoretical solution could not take the final step of computing a
>>>>>>probability for a string until it had run through the infinite
>>>>>>combinations, and this is impossible.  The same problem does not occur
>>>>>>for pi, because the function that produces it tends toward a limit.
>>>>>> 
>>>>>>The reason I thought Solomonoff Induction was trans-infinite was because
>>>>>>I thought it used the Bayesian notion to compute the probability using
>>>>>>all possible programs that produced a particular substring following a
>>>>>>given prefix.  Now I understand that the desired function is the
>>>>>>computation of only the probability of a particular string (not all
>>>>>>possible substrings that are identical to the string) following a given
>>>>>>prefix.  I want to study the method a little further during the next few
>>>>>>weeks, but I just wanted to make it clear that, as far as I now
>>>>>>understand the program, I do not think that it is trans-infinite.
>>>>>> 
>>>>>>Jim Bromer


-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244&id_secret=8660244-6e7fb59c
Powered by Listbox: http://www.listbox.com
