Jim,

> Your function may be convergent but it is not a probability.
True! All the possibilities sum to less than 1. There are ways of
addressing this (i.e., multiply by a normalizing constant, which must also
be approximated in a convergent manner), but for the most part adherents of
Solomonoff induction don't worry about this too much. What we care about,
mostly, is comparing different hypotheses to decide which to favor. The
normalizing constant doesn't help us here, so it usually isn't mentioned.

> You said that Solomonoff's original construction involved flipping a coin
> for the next bit. What good does that do?

Your intuition is that running totally random programs to get predictions
will just produce garbage, and that is fine. The idea of Solomonoff
induction, though, is that it will produce systematically less garbage than
just flipping coins to get predictions. Most of the garbage programs will
be knocked out of the running by the data itself. This is supposed to be
the least garbage we can manage without domain-specific knowledge. This is
backed up by the proof of dominance, which we haven't talked about yet, but
which is really the key argument for the optimality of Solomonoff
induction.

> And how does that prove that his original idea was convergent?

The proofs of equivalence between all the different formulations of
Solomonoff induction are something I haven't cared to look into too deeply.

> Since his idea is incomputable, there are no algorithms that can be run to
> demonstrate what he was talking about so the basic idea is papered with all
> sorts of unverifiable approximations.

I gave you a proof of convergence for one such approximation, and if you
wish I can modify it to include a normalizing constant to ensure that it is
a probability measure. It would be helpful to me if your criticisms were
more specific to that proof.

> So, did Solomonoff's original idea involve randomizing whether the next bit
> would be a 1 or a 0 in the program?

Yep.

> Even ignoring the halting problem what kind of result would that give?

Well, the general idea is this. An even distribution intuitively represents
lack of knowledge. An even distribution over possible data fails horribly,
however, predicting white noise. We want to represent the idea that we are
very ignorant of what the data might be, but not *that* ignorant. To
capture the idea of regularity, i.e., similarity between past and future,
we instead take an even distribution over *descriptions* of the data. (The
distribution in the 2nd version of Solomonoff induction that I gave, the
one in which the space of possible programs is represented as a continuum,
is an even distribution.) This appears to provide a good amount of
regularity without too much.
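If it helps to see the shape of the idea, here is a minimal Monte Carlo
sketch in Python. The interpreter run_program(bits, max_steps) is a
hypothetical stand-in for some fixed step-bounded prefix machine (I'm not
committing to any particular one); the division by the vote total at the
end plays the role of the normalizing constant mentioned above.

    import random

    def predict_next_bit(data, run_program, n_samples=100_000,
                         max_len=30, max_steps=1_000):
        # run_program(bits, max_steps) is a hypothetical step-bounded
        # universal machine: it returns the (binary) output string the
        # program produced within the step budget, or None.
        votes = {"0": 0.0, "1": 0.0}
        for _ in range(n_samples):
            # An even distribution over descriptions: each program bit is
            # a coin flip. Assuming the machine reads its program bits
            # only on demand (a prefix machine), a program consuming N of
            # these bits is drawn with probability 2^-N.
            bits = "".join(random.choice("01") for _ in range(max_len))
            out = run_program(bits, max_steps)
            # The data knocks the garbage programs out of the running:
            # keep only programs whose output extends the observed string.
            if out and len(out) > len(data) and out.startswith(data):
                nxt = out[len(data)]
                if nxt in votes:  # binary outputs assumed
                    votes[nxt] += 1
        total = votes["0"] + votes["1"]  # the normalizing constant
        if total == 0:
            return {"0": 0.5, "1": 0.5}  # no survivors: total ignorance
        return {b: v / total for b, v in votes.items()}

Most sampled programs produce garbage; the startswith(data) test is the
data knocking them out of the running, and whatever regularity survives is
what drives the prediction.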
--Abram

On Wed, Aug 4, 2010 at 8:10 PM, Jim Bromer <[email protected]> wrote:

> Abram,
> Thanks for the explanation. I still don't get it. Your function may be
> convergent but it is not a probability. You said that Solomonoff's
> original construction involved flipping a coin for the next bit. What
> good does that do? And how does that prove that his original idea was
> convergent? The thing that is wrong with these explanations is that they
> are not coherent. Since his idea is incomputable, there are no algorithms
> that can be run to demonstrate what he was talking about so the basic
> idea is papered with all sorts of unverifiable approximations.
>
> So, did Solomonoff's original idea involve randomizing whether the next
> bit would be a 1 or a 0 in the program? Even ignoring the halting problem
> what kind of result would that give? Have you ever solved the problem for
> some strings and have you ever tested the solutions using a simulation?
>
> Jim Bromer
>
> On Mon, Aug 2, 2010 at 5:12 PM, Abram Demski <[email protected]> wrote:
>
>> Jim,
>>
>> Interestingly, the formalization of Solomonoff induction I'm most
>> familiar with uses a construction that relates the space of programs
>> with the real numbers just as you say. This formulation may be due to
>> Solomonoff, or perhaps Hutter... not sure. I re-formulated it to "gloss
>> over" that in order to make it simpler; I'm pretty sure the version I
>> gave is equivalent in the relevant aspects. However, some notes on the
>> original construction.
>>
>> Programs are created by flipping coins to come up with the 1s and 0s. We
>> are to think of it like this: whenever the computer reaches the end of
>> the program and tries to continue on, we flip a coin to decide what the
>> next bit of the program will be. We keep doing this for as long as the
>> computer wants more bits of instruction.
>>
>> This framework makes room for infinitely long programs, but makes them
>> infinitely improbable -- formally, they have probability 0. (We could
>> alter the setup to allow them an infinitesimal probability.)
>> Intuitively, this means that if we keep flipping a coin to tell the
>> computer what to do, eventually we will either create an infinite
>> loop-back (so the computer keeps executing the already-written parts of
>> the program and never asks for more) or write out the "HALT" command.
>> Avoiding doing one or the other forever is just too improbable.
>>
>> This also means all real numbers are output by some program! It just may
>> be one which is infinitely long.
>>
>> However, all the programs that "slip past" my time bound as T increases
>> to infinity will have measure 0, meaning they don't add anything to the
>> sum. This means the convergence is unaffected.
>>
>> Note: in this construction, program space is *still* a well-defined
>> entity.
>>
>> --Abram
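(To make the coin-flipping construction above concrete: a minimal sketch
of the bit source in Python, with the machine itself left abstract.)

    import random

    class CoinFlipBits:
        # Whenever the machine asks for a program bit that hasn't been
        # written yet, a coin flip decides it; bits already written are
        # remembered, so the machine can loop back over them.
        def __init__(self):
            self.bits = []

        def get(self, i):
            while i >= len(self.bits):
                self.bits.append(random.randint(0, 1))  # flip a coin
            return self.bits[i]

A run that only loops back over already-written bits never extends
self.bits; only a run that keeps demanding fresh bits forever corresponds
to an infinitely long program, and each fresh flip halves the probability
of that particular continuation, which is how such programs end up with
measure 0.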
>>
>> On Sun, Aug 1, 2010 at 9:05 PM, Jim Bromer <[email protected]> wrote:
>>
>>> Abram,
>>>
>>> This is a very interesting function. I have spent a lot of time
>>> thinking about it. However, I do not believe that it does, in any way,
>>> prove or indicate that Solomonoff Induction is convergent. I want to
>>> discuss the function but I need to take more time to study some stuff
>>> and to work various details out. (Although I have thought a lot about
>>> it, I am writing this under a sense of deadline, so it may not be well
>>> composed.)
>>>
>>> My argument was that Solomonoff's conjecture, which was based (as far
>>> as I can tell) on 'all possible programs', was fundamentally flawed
>>> because the idea of 'all possible programs' is not a programmable
>>> definition. All possible programs is a domain, not a class of programs
>>> that can be feasibly defined in the form of an algorithm that could
>>> 'reach' all the programs.
>>>
>>> The domain of all possible programs is trans-infinite just as the
>>> domain of irrational numbers is. Why do I believe this? Because if we
>>> imagine that infinite algorithms are computable, then we could compute
>>> irrational numbers. That is, there are programs that, given infinite
>>> resources, could compute irrational numbers. We can use the binomial
>>> theorem, for example, to compute the square root of 2. And we can use
>>> trial and error methods to compute the nth root of any number. So that
>>> proves that there are infinitely many irrational numbers that can be
>>> computed by algorithms that run for infinity.
>>>
>>> So what does this have to do with Solomonoff's conjecture of all
>>> possible programs? Well, if I could prove that any individual
>>> irrational number could be computed (with programs that ran through
>>> infinity) then I might be able to prove that there are trans-infinite
>>> programs. If I could prove that some trans-infinite subset of
>>> irrational numbers could be computed then I might be able to prove
>>> that 'all possible programs' is a trans-infinite class.
>>>
>>> Now Abram said that since his sum, based on runtime and program
>>> length, is convergent it can prove that Solomonoff Induction is
>>> convergent. Even assuming that his convergent sum method could be
>>> fixed up a little, I suspect that this time-length bound is
>>> misleading. Since a Turing machine allows for erasures, a program
>>> could run longer than his time parameter and still produce an output
>>> string that matches the given string. And if 'all possible programs'
>>> is a trans-infinite class then there are programs that you are going
>>> to miss. Your encoding method will miss trans-infinite programs
>>> (unless you have transcended the trans-infinite.)
>>>
>>> However, I want to study the function and some other ideas related to
>>> this kind of thing a little more. It is an interesting function.
>>> Unfortunately I also have to get back to other-worldly things.
>>>
>>> Jim Bromer
>>>
>>> On Mon, Jul 26, 2010 at 2:54 PM, Abram Demski <[email protected]> wrote:
>>>
>>>> Jim,
>>>>
>>>> I'll argue that Solomonoff probabilities are in fact like pi, that
>>>> is, computable in the limit.
>>>>
>>>> I still do not understand why you think these combinations are
>>>> necessary. It is not necessary to make some sort of ordering of the
>>>> sum to get it to converge: ordering only matters for infinite sums
>>>> which include negative numbers. (Perhaps that's where you're getting
>>>> the idea?)
>>>>
>>>> Here's my proof, rewritten from an earlier post, using the properties
>>>> of infinite sums of non-negative numbers.
>>>>
>>>> (preliminaries)
>>>>
>>>> Define the computation as follows: we start with a string S which we
>>>> want to know the Solomonoff probability of. We are given a time-limit
>>>> T. We start with P=0, where P is a real number with precision
>>>> 2*log_4(T) or more. We use some binary encoding for programs which
>>>> (unlike normal programming languages) does not contain syntactically
>>>> invalid programs, but still will (of course) contain infinite loops
>>>> and so on. We run programs "0" and "1" for T/4 steps each; "00", "01",
>>>> "10" and "11" for T/16 each; and in general run each program of
>>>> length N for floor[T/(4^N)] steps, until T/(4^N) is less than 1. Each
>>>> time we run a program and the result is S, we add 1/(4^N) to P.
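(The preliminaries above are short enough to spell out as code. A sketch,
again assuming a hypothetical step-bounded interpreter run_program(bits,
steps) that returns the program's output or None; the text keeps P to a
fixed precision of 2*log_4(T) digits, while this uses an ordinary float
for simplicity, and T is an integer.)

    from itertools import product

    def approximate_P(S, run_program, T):
        P = 0.0
        N = 1
        while T // 4**N >= 1:        # stop once T/(4^N) falls below 1
            steps = T // 4**N        # budget for each length-N program
            for bits in product("01", repeat=N):
                if run_program("".join(bits), steps) == S:
                    P += 1.0 / 4**N  # this program's contribution
            N += 1
        return P  # non-decreasing in T and bounded above by 1

As T grows, more programs get enough steps to finish and print S, so P can
only go up; the proof below shows it can never pass 1.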
>>>>
>>>> (assertion)
>>>>
>>>> P converges to some value as T is increased.
>>>>
>>>> (proof)
>>>>
>>>> If every single program were to output S, then P would converge to
>>>> 1/4 + 1/4 + 1/16 + 1/16 + 1/16 + 1/16 + ..., that is, 2*(1/(4^1)) +
>>>> 4*(1/(4^2)) + 8*(1/(4^3)) + ..., which comes to 1/2 + 1/4 + 1/8 +
>>>> 1/16 + ..., i.e. 1/(2^1) + 1/(2^2) + 1/(2^3) + ...; it is well known
>>>> that this series converges to 1. Thus 1 is an upper bound for P: it
>>>> could only get that high if every single program were to output S.
>>>>
>>>> 0 is a lower bound, since we start there and never subtract anything.
>>>> In fact, every time we add more to P, we have a new lower bound: P
>>>> will never go below a number once it reaches it. The sum can only
>>>> increase. Infinite sums with this property must either converge to a
>>>> finite number or go to infinity. However, we already know that 1 is
>>>> an upper bound for P; so it cannot go to infinity. Hence, it must
>>>> converge.
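(The upper-bound step can be condensed into one sum: there are 2^N
programs of length N, and each contributes at most 1/(4^N), so even in the
worst case

    P <= \sum_{N=1}^{\infty} 2^N \cdot 1/(4^N) = \sum_{N=1}^{\infty} 1/(2^N) = 1,

with equality only if every single program outputs S. A non-decreasing sum
bounded above by 1 must converge.)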
>>>>
>>>> --Abram
>>>>
>>>> On Mon, Jul 26, 2010 at 9:14 AM, Jim Bromer <[email protected]> wrote:
>>>>
>>>>> As far as I can tell right now, my theories that Solomonoff
>>>>> Induction is trans-infinite were wrong. Now that I realize that the
>>>>> mathematics do not support these conjectures, I have to acknowledge
>>>>> that I would not be able to prove or even offer a sketch of a proof
>>>>> of my theories. Although I did not use rigorous mathematics while
>>>>> trying to make an assessment of the Solomonoff method, the first
>>>>> principle of rigorous mathematics is to acknowledge that the
>>>>> mathematics does not support your supposition when it doesn't.
>>>>>
>>>>> Solomonoff Induction isn't a mathematical theory because the desired
>>>>> results are not computable. As I mentioned before, there are a great
>>>>> many functions that are not computable but which are useful and
>>>>> important because they tend toward a limit which can be seen with a
>>>>> reasonable number of calculations using the methods available. Pi is
>>>>> one such case. (I am presuming that pi would require an infinite
>>>>> expansion, which seems right.)
>>>>>
>>>>> I have explained, and I think it is a correct explanation, that
>>>>> there is no way that you could make an a priori computation of all
>>>>> possible combinations taken from infinite values. So you could not
>>>>> even come up with a theoretical construct that could take account of
>>>>> that level of complexity. It is true that you could come up with a
>>>>> theoretical computational method that could take account of any
>>>>> finite number of values, and that is what we are talking about when
>>>>> we talk about the infinite, but in this context it only points to a
>>>>> theoretical paradox. Your theoretical solution could not take the
>>>>> final step of computing a probability for a string until it had run
>>>>> through the infinite combinations, and this is impossible. The same
>>>>> problem does not occur for pi because the function that produces it
>>>>> tends toward a limit.
>>>>>
>>>>> The reason I thought Solomonoff Induction was trans-infinite was
>>>>> because I thought it used the Bayesian notion to compute the
>>>>> probability using all possible programs that produced a particular
>>>>> substring following a given prefix. Now I understand that the
>>>>> desired function is the computation of only the probability of a
>>>>> particular string (not all possible substrings that are identical to
>>>>> the string) following a given prefix. I want to study the method a
>>>>> little further during the next few weeks, but I just wanted to make
>>>>> it clear that, as far as I now understand the program, I do not
>>>>> think that it is trans-infinite.
>>>>>
>>>>> Jim Bromer

--
Abram Demski
http://lo-tho.blogspot.com/
http://groups.google.com/group/one-logic
