Jim,

From the article Matt linked to, specifically see the line:

"As p is itself a binary string, we can define the discrete
universal a priori probability, m(x), to be the probability that
the output of a universal prefix Turing machine U is x
when provided with fair coin flips on the input tape."
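To make the quoted definition concrete, here is a small sketch. The machine below is a toy stand-in for the universal prefix machine U (a real U is far more involved); the point is only that the program bits are consumed as fair coin flips, and m(x) is the chance the output is exactly x.

```python
import random

def toy_prefix_machine(coin):
    """Toy stand-in for a universal prefix machine U: it repeatedly
    reads a continue/halt bit from the input tape; on 'continue' it
    reads one more bit and appends it to the output. The program is
    self-delimiting: the machine itself decides when to stop reading."""
    out = []
    while coin() == 1:               # 1 = keep reading, 0 = halt
        out.append(str(coin()))      # next output bit
    return "".join(out)

def estimate_m(x, trials=200_000, seed=0):
    """Monte Carlo estimate of m(x): the probability that the machine's
    output is exactly x when the input tape is fair coin flips."""
    rng = random.Random(seed)
    coin = lambda: rng.randint(0, 1)
    hits = sum(toy_prefix_machine(coin) == x for _ in range(trials))
    return hits / trials

print(estimate_m("1"))   # exact value for this toy machine is (1/4)*(1/2) = 1/8
```

For this toy machine the estimate should land near 0.125; the real m(x) differs only in that U is universal, so programs for every computable process contribute.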

--Abram

On Fri, Aug 6, 2010 at 3:38 PM, Matt Mahoney <[email protected]> wrote:

> Jim, see http://www.scholarpedia.org/article/Algorithmic_probability
> I think this answers your questions.
>
>
> -- Matt Mahoney, [email protected]
>
>
> ------------------------------
> *From:* Jim Bromer <[email protected]>
> *To:* agi <[email protected]>
> *Sent:* Fri, August 6, 2010 2:18:09 PM
>
> *Subject:* Re: [agi] Comments On My Skepticism of Solomonoff Induction
>
> I meant:
> Did Solomonoff's original idea use randomization to determine the bits of
> the programs that are used to produce the *prior probabilities*?  I think
> that the answer to that is obviously no.  The randomization of the next bit
> would be used in testing the prior probabilities, as in a random
> sampling.  He probably found that students who had some familiarity with
> statistics would initially assume that the prior probability was based on
> some subset of possible programs, as would be expected from a typical sample,
> so he gave this statistics-style definition to emphasize the extent of
> what he had in mind.
>
> I asked this question just to make sure that I understood what Solomonoff
> Induction was, because Abram had made some statement indicating that I
> really didn't know.  Remember, this particular branch of the discussion was
> originally centered around the question of whether Solomonoff
> Induction would be convergent, even given a way around the incomputability
> of finding only those programs that halted.  So while the random testing of
> the prior probabilities is of interest to me, I wanted to make sure that
> there is no evidence that Solomonoff Induction is convergent. I am not being
> petty about this, but I also needed to make sure that I understood what
> Solomonoff Induction is.
>
> I am interested in hearing your ideas about your variation of
> Solomonoff Induction because your convergent series, in this context, was
> interesting.
> Jim Bromer
>
> On Fri, Aug 6, 2010 at 6:50 AM, Jim Bromer <[email protected]> wrote:
>
>> Jim: So, did Solomonoff's original idea involve randomizing whether the
>>> next bit would be a 1 or a 0 in the program?
>>
>> Abram: Yep.
>> I meant: did Solomonoff's original idea involve randomizing whether the
>> next bit would be a 1 or a 0 in the programs that are originally used to
>> produce the *prior probabilities*?  I have not been able to find any
>> evidence that it was.
>> I thought that my question was clear but on second thought I guess it
>> wasn't. I think that the part about the coin flips was only a method to
>> express that he was interested in the probability that a particular string
>> would be produced from all possible programs, so that when actually testing
>> the prior probability of a particular string the program that was to be run
>> would have to be randomly generated.
>> Jim Bromer
>>
>>
>>
>>
>> On Wed, Aug 4, 2010 at 10:27 PM, Abram Demski <[email protected]> wrote:
>>
>>> Jim,
>>>
>>>  Your function may be convergent but it is not a probability.
>>>>
>>>
>>> True! All the possibilities sum to less than 1. There are ways of
>>> addressing this (e.g., multiply by a normalizing constant which must also be
>>> approximated in a convergent manner), but for the most part adherents of
>>> Solomonoff induction don't worry about this too much. What we care about,
>>> mostly, is comparing different hypotheses to decide which to favor. The
>>> normalizing constant doesn't help us here, so it usually isn't mentioned.
>>>
>>>
>>> You said that Solomonoff's original construction involved flipping a coin
>>>> for the next bit.  What good does that do?
>>>
>>>
>>> Your intuition is that running totally random programs to get predictions
>>> will just produce garbage, and that is fine. The idea of Solomonoff
>>> induction, though, is that it will produce systematically less garbage than
>>> just flipping coins to get predictions. Most of the garbage programs will be
>>> knocked out of the running by the data itself. This is supposed to be the
>>> least garbage we can manage without domain-specific knowledge.
>>>
>>> This is backed up with the proof of dominance, which we haven't talked
>>> about yet, but which is really the key argument for the optimality of
>>> Solomonoff induction.
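A toy version of this "knocking out" can be simulated. To be clear, this is only an illustration, not Solomonoff induction proper: the hypothesis class here is just "repeat a fixed bit string forever", weighted by a 2^-length prior, and every hypothesis whose output disagrees with the observed data loses all its weight.

```python
from itertools import product

def solomonoff_style_predict(observed, max_len=10):
    """Toy illustration: each bit string p is a 'program' meaning
    'output p repeated forever', with prior weight 2**-len(p).
    Hypotheses inconsistent with the observed data are knocked out;
    the survivors vote on the next bit, weighted by their priors.
    Returns the predicted probability that the next bit is 1."""
    weight = {0: 0.0, 1: 0.0}
    n = len(observed)
    for L in range(1, max_len + 1):
        for p in product("01", repeat=L):
            stream = "".join(p) * (n // L + 2)   # enough repetitions
            if stream.startswith(observed):       # survived the data
                weight[int(stream[n])] += 2.0 ** (-L)
    return weight[1] / (weight[0] + weight[1])

print(solomonoff_style_predict("010101"))   # 0.08: next bit very likely '0'
```

Random predictors would give 0.5 here; conditioning on the data concentrates the mass on the short alternating programs, which is the "systematically less garbage" point.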
>>>
>>>
>>> And how does that prove that his original idea was convergent?
>>>
>>>
>>> The proofs of equivalence between all the different formulations of
>>> Solomonoff induction are something I haven't cared to look into too deeply.
>>>
>>> Since his idea is incomputable, there are no algorithms that can be run
>>>> to demonstrate what he was talking about so the basic idea is papered with
>>>> all sorts of unverifiable approximations.
>>>
>>>
>>> I gave you a proof of convergence for one such approximation, and if you
>>> wish I can modify it to include a normalizing constant to ensure that it is
>>> a probability measure. It would be helpful to me if your criticisms were
>>> more specific to that proof.
>>>
>>> So, did Solomonoff's original idea involve randomizing whether the next
>>>> bit would be a 1 or a 0 in the program?
>>>>
>>>
>>> Yep.
>>>
>>> Even ignoring the halting problem what kind of result would that give?
>>>>
>>>
>>> Well, the general idea is this. An even distribution intuitively
>>> represents lack of knowledge. An even distribution over possible data fails
>>> horribly, however, predicting white noise. We want to represent the idea
>>> that we are very ignorant of what the data might be, but not *that*
>>> ignorant. To capture the idea of regularity, i.e., similarity between past and
>>> future, we instead take an even distribution over *descriptions* of the
>>> data. (The distribution in the 2nd version of Solomonoff induction that I
>>> gave, the one in which the space of possible programs is represented as a
>>> continuum, is an even distribution.) This appears to provide a good amount
>>> of regularity without too much.
>>>
>>> --Abram
>>>
>>> On Wed, Aug 4, 2010 at 8:10 PM, Jim Bromer <[email protected]> wrote:
>>>
>>>> Abram,
>>>> Thanks for the explanation.  I still don't get it.  Your function may be
>>>> convergent but it is not a probability.  You said that Solomonoff's 
>>>> original
>>>> construction involved flipping a coin for the next bit.  What good does 
>>>> that
>>>> do?  And how does that prove that his original idea was convergent?  The
>>>> thing that is wrong with these explanations is that they are not coherent.
>>>> Since his idea is incomputable, there are no algorithms that can be run to
>>>> demonstrate what he was talking about so the basic idea is papered with all
>>>> sorts of unverifiable approximations.
>>>>
>>>> So, did Solomonoff's original idea involve randomizing whether the next
>>>> bit would be a 1 or a 0 in the program?  Even ignoring the halting
>>>> problem what kind of result would that give?  Have you ever solved the
>>>> problem for some strings and have you ever tested the solutions using a
>>>> simulation?
>>>>
>>>> Jim Bromer
>>>>
>>>> On Mon, Aug 2, 2010 at 5:12 PM, Abram Demski <[email protected]> wrote:
>>>>
>>>>> Jim,
>>>>>
>>>>> Interestingly, the formalization of Solomonoff induction I'm most
>>>>> familiar with uses a construction that relates the space of programs with
>>>>> the real numbers just as you say. This formulation may be due to 
>>>>> Solomonoff,
>>>>> or perhaps Hutter... not sure. I re-formulated it to "gloss over" that in
>>>>> order to make it simpler; I'm pretty sure the version I gave is equivalent
>>>>> in the relevant aspects. However, some notes on the original construction.
>>>>>
>>>>> Programs are created by flipping coins to come up with the 1s and 0s.
>>>>> We are to think of it like this: whenever the computer reaches the end of
>>>>> the program and tries to continue on, we flip a coin to decide what the 
>>>>> next
>>>>> bit of the program will be. We keep doing this for as long as the computer
>>>>> wants more bits of instruction.
>>>>>
>>>>> This framework makes room for infinitely long programs, but makes them
>>>>> infinitely improbable-- formally, they have probability 0. (We could alter
>>>>> the setup to allow them an infinitesimal probability.) Intuitively, this
>>>>> means that if we keep flipping a coin to tell the computer what to do,
>>>>> eventually we will either create an infinite loop-back (so the computer
>>>>> keeps executing the already-written parts of the program and never asks 
>>>>> for
>>>>> more) or write out the "HALT" command. Avoiding doing one or the other
>>>>> forever is just too improbable.
>>>>>
>>>>> This also means all real numbers are output by some program! It just
>>>>> may be one which is infinitely long.
>>>>>
>>>>> However, all the programs that "slip past" my time bound as T increases
>>>>> to infinity will have measure 0, meaning they don't add anything to the 
>>>>> sum.
>>>>> This means the convergence is unaffected.
>>>>>
>>>>> Note: in this construction, program space is *still* a well-defined
>>>>> entity.
>>>>>
>>>>> --Abram
>>>>>
>>>>> On Sun, Aug 1, 2010 at 9:05 PM, Jim Bromer <[email protected]> wrote:
>>>>>
>>>>>> Abram,
>>>>>>
>>>>>> This is a very interesting function.  I have spent a lot of time
>>>>>> thinking about it.  However, I do not believe that does, in any way,
>>>>>> prove or indicate that Solomonoff Induction is convergent.  I want to
>>>>>> discuss the function but I need to take more time to study some stuff 
>>>>>> and to
>>>>>> work various details out.  (Although I have thought a lot about it, I am
>>>>>> writing this under a sense of deadline, so it may not be well composed.)
>>>>>>
>>>>>>
>>>>>>
>>>>>> My argument was that Solomonoff's conjecture, which was based (as far
>>>>>> as I can tell) on 'all possible programs', was fundamentally flawed 
>>>>>> because
>>>>>> the idea of 'all possible programs' is not a programmable definition.
>>>>>> All possible programs is a domain, not a class of programs that can be
>>>>>> feasibly defined in the form of an algorithm that could 'reach' all
>>>>>> the programs.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The domain of all possible programs is trans-infinite, just as the
>>>>>> domain of irrational numbers is.  Why do I believe this?  Because if
>>>>>> we imagine that infinite algorithms are computable, then we could compute
>>>>>> irrational numbers.  That is, there are programs that, given infinite
>>>>>> resources, could compute irrational numbers.  We can use the binomial
>>>>>> theorem, for example to compute the square root of 2.  And we can use
>>>>>> trial and error methods to compute the nth root of any number.  So
>>>>>> that proves that there are infinitely many irrational numbers that can be
>>>>>> computed by algorithms that run for infinity.
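The "computable in the limit" point about roots is easy to illustrate. The sketch below uses Newton's method rather than the binomial theorem (just for brevity); exact rational arithmetic makes it clear that every step is a finite computation, yet no finite number of steps reaches sqrt(2) itself.

```python
from fractions import Fraction

# Newton iteration for sqrt(2): x_{k+1} = (x_k + 2/x_k) / 2.
# Each step roughly doubles the number of correct digits, but the
# exact value is only ever reached "at infinity".
x = Fraction(1)
for _ in range(6):
    x = (x + 2 / x) / 2
print(float(x))   # 1.4142135623730951
```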
>>>>>>
>>>>>>
>>>>>>
>>>>>> So what does this have to do with Solomonoff's conjecture of all
>>>>>> possible programs?  Well, if I could prove that any individual
>>>>>> irrational number could be computed (with programs that ran through
>>>>>> infinity) then I might be able to prove that there are trans-infinite
>>>>>> programs.  If I could prove that some trans-infinite subset of
>>>>>> irrational numbers could be computed then I might be able to prove that 
>>>>>> 'all
>>>>>> possible programs' is a trans-infinite class.
>>>>>>
>>>>>>
>>>>>> Now Abram said that since his sum, based on runtime and program
>>>>>> length, is convergent it can prove that Solomonoff Induction is 
>>>>>> convergent.
>>>>>> Even assuming that his convergent sum method could be fixed up a
>>>>>> little, I suspect that this time-length bound is misleading.  Since a
>>>>>> Turing Machine allows for erasures this means that a program could last
>>>>>> longer than his time parameter and still produce an output string that
>>>>>> matches the given string.  And if 'all possible programs' is a
>>>>>> trans-infinite class then there are programs that you are going to miss.
>>>>>> Your encoding method will miss trans-infinite programs (unless you have
>>>>>> trans-cended the trans-infinite.)
>>>>>>
>>>>>> However, I want to study the function and some other ideas related to
>>>>>> this kind of thing a little more.  It is an interesting function.
>>>>>> Unfortunately I also have to get back to other-worldly things.
>>>>>>
>>>>>> Jim Bromer
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 26, 2010 at 2:54 PM, Abram Demski <[email protected]> wrote:
>>>>>>
>>>>>>> Jim,
>>>>>>>
>>>>>>> I'll argue that solomonoff probabilities are in fact like Pi, that
>>>>>>> is, computable in the limit.
>>>>>>>
>>>>>>> I still do not understand why you think these combinations are
>>>>>>> necessary. It is not necessary to make some sort of ordering of the sum 
>>>>>>> to
>>>>>>> get it to converge: ordering only matters for infinite sums which 
>>>>>>> include
>>>>>>> negative numbers. (Perhaps that's where you're getting the idea?)
>>>>>>>
>>>>>>> Here's my proof, rewritten from an earlier post, using the properties
>>>>>>> of infinite sums of non-negative numbers.
>>>>>>>
>>>>>>> (preliminaries)
>>>>>>>
>>>>>>> Define the computation as follows: we start with a string S which we
>>>>>>> want to know the Solomonoff probability of. We are given a time-limit 
>>>>>>> T. We
>>>>>>> start with P=0, where P is a real number with precision 2*log_4(T) or 
>>>>>>> more.
>>>>>>> We use some binary encoding for programs which (unlike normal 
>>>>>>> programming
>>>>>>> languages) does not contain syntactically invalid programs, but still 
>>>>>>> will
>>>>>>> (of course) contain infinite loops and so on. We run program "0" and 
>>>>>>> "1" for
>>>>>>> T/4 each, "00", "01", "10" and "11" for T/16 each, and in general run 
>>>>>>> each
>>>>>>> program of length N for floor[T/(4^N)] steps, until T/(4^N) is less than 1. 
>>>>>>> Each
>>>>>>> time we run a program and the result is S, we add 1/(4^N) to P.
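This scheme can be written out directly. The interpreter below is a hypothetical toy (each "program" simply computes its own bits and needs quadratically many steps to finish), since running real machine codes is beside the point here; the budget-splitting and the 1/(4^N) weighting follow the description above exactly.

```python
from itertools import product

def toy_run(program, steps):
    """Hypothetical stand-in for a step-limited interpreter: the program
    'computes' its own bits and needs len(program)**2 steps to halt;
    otherwise it times out. Any real step-bounded interpreter could be
    dropped in here."""
    return program if steps >= len(program) ** 2 else None

def approx_P(S, T):
    """Run each program of length N for floor(T / 4**N) steps, and add
    1/(4**N) to P every time the output equals S, as described above."""
    P, N = 0.0, 1
    while T // 4 ** N >= 1:
        budget = T // 4 ** N
        for bits in product("01", repeat=N):
            if toy_run("".join(bits), budget) == S:
                P += 4.0 ** (-N)
        N += 1
    return P

for T in (10, 100, 10_000, 1_000_000):
    print(T, approx_P("01", T))   # P never decreases as T increases
```

With the toy semantics, only the program "01" ever outputs "01", so P climbs from 0 to 1/16 once T is large enough and stays there, which is the monotone-convergence behavior the proof below relies on.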
>>>>>>>
>>>>>>> (assertion)
>>>>>>>
>>>>>>> P converges to some value as T is increased.
>>>>>>>
>>>>>>> (proof)
>>>>>>>
>>>>>>> If every single program were to output S, then P would converge to
>>>>>>> 1/4 + 1/4 + 1/16 + 1/16 + 1/16 + 1/16 + ... that is, 2*(1/(4^1)) +
>>>>>>> 4*(1/(4^2)) + 8*(1/(4^3)) + ... which comes to 1/2 + 1/4 + 1/8 + 1/16 + ...
>>>>>>> i.e. 1/(2^1) + 1/(2^2) + 1/(2^3) + ... ; it is well-known that this series
>>>>>>> converges to 1. Thus 1 is an upper bound for P: it could only get that
>>>>>>> high if every single program were to output S.
>>>>>>>
>>>>>>> 0 is a lower bound, since we start there and never subtract anything.
>>>>>>> In fact, every time we add more to P, we have a new lower bound: P will
>>>>>>> never go below a number once it reaches it. The sum can only increase.
>>>>>>> Infinite sums with this property must either converge to a finite 
>>>>>>> number, or
>>>>>>> go to infinity. However, we already know that 1 is an upper bound for 
>>>>>>> P; so,
>>>>>>> it cannot go to infinity. Hence, it must converge.
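The geometric bound in the proof is easy to check numerically: there are 2^N programs of length N, each contributing at most 4^-N.

```python
# Worst-case mass: 2**N programs of length N, weight 4**-N each,
# so the total is sum over N of 2**N * 4**-N = sum over N of 2**-N.
partial = sum(2 ** N * 4.0 ** (-N) for N in range(1, 60))
print(partial)   # partial sums approach the upper bound 1
```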
>>>>>>>
>>>>>>> --Abram
>>>>>>>
>>>>>>> On Mon, Jul 26, 2010 at 9:14 AM, Jim Bromer <[email protected]> wrote:
>>>>>>>
>>>>>>>> As far as I can tell right now, my theories that Solomonoff
>>>>>>>> Induction is trans-infinite were wrong.  Now that I realize that the
>>>>>>>> mathematics does not support these conjectures, I have to acknowledge that I
>>>>>>>> would not be able to prove or even offer a sketch of a proof of my
>>>>>>>> theories.  Although I did not use rigorous mathematics while I tried
>>>>>>>> to make an assessment of the Solomonoff method, the first principle of
>>>>>>>> rigorous mathematics is to acknowledge that the mathematics does not
>>>>>>>> support your supposition when it doesn't.
>>>>>>>>
>>>>>>>> Solomonoff Induction isn't a mathematical theory because the desired
>>>>>>>> results are not computable.  As I mentioned before, there are a great 
>>>>>>>> many
>>>>>>>> functions that are not computable but which are useful and important 
>>>>>>>> because
>>>>>>>> they tend toward a limit which can be seen with a reasonable number of
>>>>>>>> calculations using the methods available.  Pi is one such function.  (I am
>>>>>>>> (I am
>>>>>>>> presuming that pi would require an infinite expansion which seems 
>>>>>>>> right.)
>>>>>>>>
>>>>>>>> I have explained, and I think it is a correct explanation, that
>>>>>>>> there is no way that you could make an apriori computation of all 
>>>>>>>> possible
>>>>>>>> combinations taken from infinite values.  So you could not even come 
>>>>>>>> up with
>>>>>>>> a theoretical construct that could take account of that level of
>>>>>>>> complexity.  It is true that you could come up with a theoretical
>>>>>>>> computational method that could take account of any finite number of 
>>>>>>>> values,
>>>>>>>> and that is what we are talking about when we talk about the infinite, 
>>>>>>>> but
>>>>>>>> in this context it only points to a theoretical paradox.  Your 
>>>>>>>> theoretical
>>>>>>>> solution could not take the final step of computing a probability for a
>>>>>>>> string until it had run through the infinite combinations and this is
>>>>>>>> impossible.  The same problem does not occur for pi because the 
>>>>>>>> function
>>>>>>>> that produces it tends toward a limit.
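For contrast, a limit-tending computation like the one gestured at for pi (here the Leibniz series, one common choice among many) gets arbitrarily close using only finitely many terms:

```python
# Leibniz series: pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...
# Slowly convergent, but every partial sum is a finite computation,
# and the error after n terms is bounded by roughly 4/(2n+1).
approx = 4 * sum((-1) ** k / (2 * k + 1) for k in range(200_000))
print(approx)   # within about 1e-5 of pi
```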
>>>>>>>>
>>>>>>>> The reason I thought Solomonoff Induction was trans-infinite was
>>>>>>>> because I thought it used the Bayesian notion to compute the
>>>>>>>> probability using all possible programs that produced a particular 
>>>>>>>> substring
>>>>>>>> following a given prefix.  Now I understand that the desired function 
>>>>>>>> is the
>>>>>>>> computation of only the probability of a particular string (not all 
>>>>>>>> possible
>>>>>>>> substrings that are identical to the string) following a given prefix. 
>>>>>>>>  I
>>>>>>>> want to study the method a little further during the next few weeks, 
>>>>>>>> but I
>>>>>>>> just wanted to make it clear that, as far as I now understand the
>>>>>>>> program, I do not think that it is trans-infinite.
>>>>>>>>
>>>>>>>> Jim Bromer
>>>>>>>>
>>>>>>>
>



-- 
Abram Demski
http://lo-tho.blogspot.com/
http://groups.google.com/group/one-logic



-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244&id_secret=8660244-6e7fb59c
Powered by Listbox: http://www.listbox.com
