I meant:
Did Solomonoff's original idea use randomization to determine the bits of
the programs that are used to produce the *prior probabilities*?  I think
the answer to that is obviously no.  The randomization of the next bit
would be used in testing the prior probabilities, as is done with random
sampling.  He probably found that students who had some familiarity with
statistics would initially assume that the prior probability was based on
some subset of possible programs, as would be expected from a typical
sample, so he gave this statistics-style definition to emphasize the full
extent of what he had in mind.

I asked this question just to make sure that I understood what Solomonoff
Induction was, because Abram had made some statements indicating that I
really didn't.  Remember, this particular branch of the discussion
originally centered on the question of whether Solomonoff Induction would
be convergent, even given a way around the incomputability of finding only
those programs that halted.  So while the random testing of the prior
probabilities is of interest to me, I wanted to make sure that there is no
evidence that Solomonoff Induction is convergent.  I am not being petty
about this, but I also needed to make sure that I understood what
Solomonoff Induction is.

I am interested in hearing your ideas about your variation of
Solomonoff Induction because your convergent series, in this context, was
interesting.
Jim Bromer

On Fri, Aug 6, 2010 at 6:50 AM, Jim Bromer <[email protected]> wrote:

> Jim: So, did Solomonoff's original idea involve randomizing whether the
> next bit would be a 1 or a 0 in the program?
>
> Abram: Yep.
> I meant: did Solomonoff's original idea involve randomizing whether the
> next bit would be a 1 or a 0 in the programs that are originally used to
> produce the *prior probabilities*?  I have not been able to find any
> evidence that it did.
> I thought that my question was clear but on second thought I guess it
> wasn't.  I think that the part about the coin flips was only a way of
> expressing that he was interested in the probability that a particular
> string would be produced from all possible programs, so that when actually
> testing the prior probability of a particular string, the program to be
> run would have to be randomly generated.
> Jim Bromer
>
>
>
>
> On Wed, Aug 4, 2010 at 10:27 PM, Abram Demski <[email protected]> wrote:
>
>> Jim,
>>
>>>  Your function may be convergent but it is not a probability.
>>>
>>
>> True! All the possibilities sum to less than 1. There are ways of
>> addressing this (i.e., multiply by a normalizing constant, which must
>> itself be approximated in a convergent manner), but for the most part
>> adherents of Solomonoff induction don't worry about this too much. What
>> we care about, mostly, is comparing different hypotheses to decide which
>> to favor. The normalizing constant doesn't help us here, so it usually
>> isn't mentioned.
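>>
>> As a toy numeric illustration of that point (the weights below are made
>> up, purely for the sake of the example):
>>
>>     # unnormalized weights assigned to two competing hypotheses
>>     w = {"h1": 0.0300, "h2": 0.0075}
>>     Z = sum(w.values())                       # normalizing constant
>>     posterior = {h: v / Z for h, v in w.items()}
>>     # the ranking and the ratio between hypotheses are unchanged by Z:
>>     assert max(w, key=w.get) == max(posterior, key=posterior.get)
>>     assert abs(w["h1"] / w["h2"] - posterior["h1"] / posterior["h2"]) < 1e-9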
>>
>>
>>> You said that Solomonoff's original construction involved flipping a coin
>>> for the next bit.  What good does that do?
>>
>>
>> Your intuition is that running totally random programs to get predictions
>> will just produce garbage, and that is fine. The idea of Solomonoff
>> induction, though, is that it will produce systematically less garbage than
>> just flipping coins to get predictions. Most of the garbage programs will be
>> knocked out of the running by the data itself. This is supposed to be the
>> least garbage we can manage without domain-specific knowledge.
>>
>> This is backed up with the proof of dominance, which we haven't talked
>> about yet, but which is really the key argument for the optimality of
>> Solomonoff induction.
>>
>>
>>> And how does that prove that his original idea was convergent?
>>
>>
>> The proofs of equivalence between all the different formulations of
>> Solomonoff induction are something I haven't cared to look into too deeply.
>>
>>> Since his idea is incomputable, there are no algorithms that can be run to
>>> demonstrate what he was talking about so the basic idea is papered with all
>>> sorts of unverifiable approximations.
>>
>>
>> I gave you a proof of convergence for one such approximation, and if you
>> wish I can modify it to include a normalizing constant to ensure that it is
>> a probability measure. It would be helpful to me if your criticisms were
>> more specific to that proof.
>>
>>> So, did Solomonoff's original idea involve randomizing whether the next
>>> bit would be a 1 or a 0 in the program?
>>>
>>
>> Yep.
>>
>>> Even ignoring the halting problem what kind of result would that give?
>>>
>>
>> Well, the general idea is this. An even distribution intuitively
>> represents lack of knowledge. An even distribution over the possible data
>> fails horribly, however: it predicts white noise. We want to represent
>> the idea that we are very ignorant of what the data might be, but not
>> *that* ignorant. To capture the idea of regularity, i.e., similarity
>> between past and future, we instead take an even distribution over
>> *descriptions* of the data. (The distribution in the 2nd version of
>> Solomonoff induction that I gave, the one in which the space of possible
>> programs is represented as a continuum, is an even distribution.) This
>> appears to provide a good amount of regularity without too much.
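>>
>> To make this concrete, here is a very rough sketch (the "computer" below
>> is a stand-in toy rather than a universal machine, toy_run is invented
>> purely for this illustration, and prefix issues are ignored):
>>
>>     from itertools import product
>>
>>     def toy_run(program_bits):
>>         # stand-in "computer": repeat the program's bits out to 8 characters
>>         return (program_bits * 8)[:8]
>>
>>     # even weighting over descriptions: each length-L program gets 2^-L
>>     prior = {}
>>     for L in range(1, 6):
>>         for bits in product("01", repeat=L):
>>             out = toy_run("".join(bits))
>>             prior[out] = prior.get(out, 0.0) + 2.0 ** -L
>>
>>     # a regular string ends up with far more weight than an irregular one
>>     print(prior.get("01010101", 0.0), prior.get("01101001", 0.0))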
>>
>> --Abram
>>
>> On Wed, Aug 4, 2010 at 8:10 PM, Jim Bromer <[email protected]> wrote:
>>
>>> Abram,
>>> Thanks for the explanation.  I still don't get it.  Your function may be
>>> convergent but it is not a probability.  You said that Solomonoff's original
>>> construction involved flipping a coin for the next bit.  What good does that
>>> do?  And how does that prove that his original idea was convergent?  The
>>> thing that is wrong with these explanations is that they are not coherent.
>>> Since his idea is incomputable, there are no algorithms that can be run to
>>> demonstrate what he was talking about so the basic idea is papered with all
>>> sorts of unverifiable approximations.
>>>
>>> So, did Solomonoff's original idea involve randomizing whether the next
>>> bit would be a 1 or a 0 in the program?  Even ignoring the halting
>>> problem what kind of result would that give?  Have you ever solved the
>>> problem for some strings and have you ever tested the solutions using a
>>> simulation?
>>>
>>> Jim Bromer
>>>
>>> On Mon, Aug 2, 2010 at 5:12 PM, Abram Demski <[email protected]> wrote:
>>>
>>>> Jim,
>>>>
>>>> Interestingly, the formalization of Solomonoff induction I'm most
>>>> familiar with uses a construction that relates the space of programs
>>>> with the real numbers just as you say. This formulation may be due to
>>>> Solomonoff, or perhaps Hutter... not sure. I re-formulated it to "gloss
>>>> over" that in order to make it simpler; I'm pretty sure the version I
>>>> gave is equivalent in the relevant aspects. However, some notes on the
>>>> original construction.
>>>>
>>>> Programs are created by flipping coins to come up with the 1s and 0s.
>>>> We are to think of it like this: whenever the computer reaches the end
>>>> of the program and tries to continue on, we flip a coin to decide what
>>>> the next bit of the program will be. We keep doing this for as long as
>>>> the computer wants more bits of instruction.
>>>>
>>>> This framework makes room for infinitely long programs, but makes them
>>>> infinitely improbable-- formally, they have probability 0. (We could alter
>>>> the setup to allow them an infinitesimal probability.) Intuitively, this
>>>> means that if we keep flipping a coin to tell the computer what to do,
>>>> eventually we will either create an infinite loop-back (so the computer
>>>> keeps executing the already-written parts of the program and never asks for
>>>> more) or write out the "HALT" command. Avoiding doing one or the other
>>>> forever is just too improbable.
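>>>>
>>>> As a back-of-the-envelope sketch of this coin-flipping picture (the
>>>> "halt marker" below is an invented stand-in for whatever makes the
>>>> machine stop asking for more bits):
>>>>
>>>>     import random
>>>>
>>>>     def flip_a_program(halt_marker="111", cap=10_000):
>>>>         # flip coins for program bits until the machine stops asking
>>>>         bits = ""
>>>>         while not bits.endswith(halt_marker) and len(bits) < cap:
>>>>             bits += random.choice("01")
>>>>         return bits
>>>>
>>>>     # any particular length-L prefix comes up with probability 2^-L,
>>>>     # and a program that never stops asking for bits has probability 0;
>>>>     # in this toy the average program is about 14 bits long
>>>>     print(sum(len(flip_a_program()) for _ in range(10_000)) / 10_000)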
>>>>
>>>> This also means all real numbers are output by some program! It just may
>>>> be one which is infinitely long.
>>>>
>>>> However, all the programs that "slip past" my time bound as T increases
>>>> to infinity will have measure 0, meaning they don't add anything to the
>>>> sum. This means the convergence is unaffected.
>>>>
>>>> Note: in this construction, program space is *still* a well-defined
>>>> entity.
>>>>
>>>> --Abram
>>>>
>>>> On Sun, Aug 1, 2010 at 9:05 PM, Jim Bromer <[email protected]> wrote:
>>>>
>>>>> Abram,
>>>>>
>>>>> This is a very interesting function.  I have spent a lot of time
>>>>> thinking about it.  However, I do not believe that it does, in any way,
>>>>> prove or indicate that Solomonoff Induction is convergent.  I want to
>>>>> discuss the function, but I need to take more time to study some stuff
>>>>> and to work various details out.  (Although I have thought a lot about
>>>>> it, I am writing this under a sense of deadline, so it may not be well
>>>>> composed.)
>>>>>
>>>>>
>>>>>
>>>>> My argument was that Solomonoff's conjecture, which was based (as far
>>>>> as I can tell) on 'all possible programs', was fundamentally flawed
>>>>> because the idea of 'all possible programs' is not a programmable
>>>>> definition.  All possible programs is a domain, not a class of programs
>>>>> that can be feasibly defined in the form of an algorithm that could
>>>>> 'reach' all the programs.
>>>>>
>>>>>
>>>>>
>>>>> The domain of all possible programs is trans-infinite just as the
>>>>> domain of irrational numbers is.  Why do I believe this?  Because if
>>>>> we imagine that infinite algorithms are computable, then we could
>>>>> compute irrational numbers.  That is, there are programs that, given
>>>>> infinite resources, could compute irrational numbers.  We can use the
>>>>> binomial theorem, for example, to compute the square root of 2.  And we
>>>>> can use trial and error methods to compute the nth root of any number.
>>>>> So that proves that there are infinitely many irrational numbers that
>>>>> can be computed by algorithms that run for infinity.
>>>>>
>>>>>
>>>>>
>>>>> So what does this have to do with Solomonoff's conjecture of all
>>>>> possible programs?  Well, if I could prove that any individual
>>>>> irrational number could be computed (with programs that ran through
>>>>> infinity) then I might be able to prove that there are trans-infinite
>>>>> programs.  If I could prove that some trans-infinite subset of
>>>>> irrational numbers could be computed then I might be able to prove that 
>>>>> 'all
>>>>> possible programs' is a trans-infinite class.
>>>>>
>>>>>
>>>>> Now Abram said that since his sum, based on runtime and program length,
>>>>> is convergent, it can prove that Solomonoff Induction is convergent.
>>>>> Even assuming that his convergent sum method could be fixed up a
>>>>> little, I suspect that this time-length bound is misleading.  Since a
>>>>> Turing Machine allows for erasures, this means that a program could
>>>>> last longer than his time parameter and still produce an output string
>>>>> that matches the given string.  And if 'all possible programs' is a
>>>>> trans-infinite class, then there are programs that you are going to
>>>>> miss.  Your encoding method will miss trans-infinite programs (unless
>>>>> you have transcended the trans-infinite).
>>>>>
>>>>> However, I want to study the function and some other ideas related to
>>>>> this kind of thing a little more.  It is an interesting function.
>>>>> Unfortunately I also have to get back to other-worldly things.
>>>>>
>>>>> Jim Bromer
>>>>>
>>>>>
>>>>> On Mon, Jul 26, 2010 at 2:54 PM, Abram Demski <[email protected]> wrote:
>>>>>
>>>>>> Jim,
>>>>>>
>>>>>> I'll argue that Solomonoff probabilities are in fact like Pi, that is,
>>>>>> computable in the limit.
>>>>>>
>>>>>> I still do not understand why you think these combinations are
>>>>>> necessary. It is not necessary to make some sort of ordering of the
>>>>>> sum to get it to converge: ordering only matters for infinite sums
>>>>>> which include negative numbers. (Perhaps that's where you're getting
>>>>>> the idea?)
>>>>>>
>>>>>> Here's my proof, rewritten from an earlier post, using the properties
>>>>>> of infinite sums of non-negative numbers.
>>>>>>
>>>>>> (preliminaries)
>>>>>>
>>>>>> Define the computation as follows: we start with a string S which we
>>>>>> want to know the Solomonoff probability of. We are given a time-limit
>>>>>> T. We start with P=0, where P is a real number with precision
>>>>>> 2*log_4(T) or more. We use some binary encoding for programs which
>>>>>> (unlike normal programming languages) does not contain syntactically
>>>>>> invalid programs, but still will (of course) contain infinite loops
>>>>>> and so on. We run program "0" and "1" for T/4 each, "00", "01", "10"
>>>>>> and "11" for T/16 each, and in general run each program of length N
>>>>>> for floor[T/(4^N)] until T/(4^N) is less than 1. Each time we run a
>>>>>> program and the result is S, we add 1/(4^N) to P.
>>>>>>
>>>>>> (assertion)
>>>>>>
>>>>>> P converges to some value as T is increased.
>>>>>>
>>>>>> (proof)
>>>>>>
>>>>>> If every single program were to output S, then P would converge to
>>>>>> 1/4 + 1/4 + 1/16 + 1/16 + 1/16 + 1/16 + ..., that is, 2*(1/(4^1)) +
>>>>>> 4*(1/(4^2)) + 8*(1/(4^3)) + ..., which comes to 1/2 + 1/4 + 1/8 +
>>>>>> 1/16 + ..., i.e. 1/(2^1) + 1/(2^2) + 1/(2^3) + ...; it is well-known
>>>>>> that this series converges to 1. Thus 1 is an upper bound for P: it
>>>>>> could only get that high if every single program were to output S.
>>>>>>
>>>>>> 0 is a lower bound, since we start there and never subtract anything.
>>>>>> In fact, every time we add more to P, we have a new lower bound: P will
>>>>>> never go below a number once it reaches it. The sum can only increase.
>>>>>> Infinite sums with this property must either converge to a finite
>>>>>> number, or go to infinity. However, we already know that 1 is an upper
>>>>>> bound for P; so, it cannot go to infinity. Hence, it must converge.
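>>>>>>
>>>>>> For concreteness, the computation above can be written as a short
>>>>>> Python sketch (run_for_steps stands in for an interpreter of the
>>>>>> binary program encoding, which I am not spelling out here; the rest
>>>>>> follows the definition directly):
>>>>>>
>>>>>>     from itertools import product
>>>>>>
>>>>>>     def approx_prob(S, T, run_for_steps):
>>>>>>         # approximate, unnormalized probability of S with time budget T
>>>>>>         P = 0.0
>>>>>>         N = 1
>>>>>>         while T // 4 ** N >= 1:            # stop once T/(4^N) < 1
>>>>>>             steps = T // 4 ** N
>>>>>>             for bits in product("01", repeat=N):   # programs of length N
>>>>>>                 if run_for_steps("".join(bits), steps) == S:
>>>>>>                     P += 1.0 / 4 ** N      # add 1/(4^N) on each match
>>>>>>             N += 1
>>>>>>         return P
>>>>>>
>>>>>>     # Increasing T only ever adds terms, so successive values of P are
>>>>>>     # nondecreasing and bounded above by 1; hence they converge.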
>>>>>>
>>>>>> --Abram
>>>>>>
>>>>>> On Mon, Jul 26, 2010 at 9:14 AM, Jim Bromer <[email protected]> wrote:
>>>>>>
>>>>>>> As far as I can tell right now, my theories that Solomonoff Induction
>>>>>>> is trans-infinite were wrong.  Now that I realize that the mathematics
>>>>>>> does not support these conjectures, I have to acknowledge that I would
>>>>>>> not be able to prove or even offer a sketch of a proof of my theories.
>>>>>>> Although I did not use rigorous mathematics while trying to make an
>>>>>>> assessment of the Solomonoff method, the first principle of rigorous
>>>>>>> mathematics is to acknowledge that the mathematics does not support
>>>>>>> your supposition when it doesn't.
>>>>>>>
>>>>>>> Solomonoff Induction isn't a mathematical theory because the desired
>>>>>>> results are not computable.  As I mentioned before, there are a great
>>>>>>> many functions that are not computable but which are useful and
>>>>>>> important because they tend toward a limit which can be seen with a
>>>>>>> reasonable number of calculations using the methods available.  Pi is
>>>>>>> one such function.  (I am presuming that pi would require an infinite
>>>>>>> expansion, which seems right.)
>>>>>>>
>>>>>>> I have explained, and I think it is a correct explanation, that there
>>>>>>> is no way that you could make an a priori computation of all possible
>>>>>>> combinations taken from infinite values.  So you could not even come
>>>>>>> up with a theoretical construct that could take account of that level
>>>>>>> of complexity.  It is true that you could come up with a theoretical
>>>>>>> computational method that could take account of any finite number of
>>>>>>> values, and that is what we are talking about when we talk about the
>>>>>>> infinite, but in this context it only points to a theoretical paradox.
>>>>>>> Your theoretical solution could not take the final step of computing a
>>>>>>> probability for a string until it had run through the infinite
>>>>>>> combinations, and this is impossible.  The same problem does not occur
>>>>>>> for pi because the function that produces it tends toward a limit.
>>>>>>>
>>>>>>> The reason I thought Solomonoff Induction was trans-infinite was
>>>>>>> because I thought it used the Bayesian notion to compute the
>>>>>>> probability using all possible programs that produced a particular
>>>>>>> substring following a given prefix.  Now I understand that the desired
>>>>>>> function is the computation of only the probability of a particular
>>>>>>> string (not all possible substrings that are identical to the string)
>>>>>>> following a given prefix.  I want to study the method a little further
>>>>>>> during the next few weeks, but I just wanted to make it clear that, as
>>>>>>> far as I now understand the program, I do not think that it is
>>>>>>> trans-infinite.
>>>>>>>
>>>>>>> Jim Bromer
>>>>>>>
>>>>>>


