Re: [agi] Re: Huge Progress on the Core of AGI

David Jones Tue, 29 Jun 2010 12:49:21 -0700

Such an example is no where near sufficient to accept the assertion that
program size is the right way to define simplicity of a hypothesis.


Here is a counter example. It requires a slightly more complex example
because all zeros doesn't leave any room for alternative hypotheses.

Here is the sequence: 10, 21, 32

void hypothesis_1() {
   int ten = 10;
   int counter = 0;
    while (1)
    {
       print(ten+counter)
       ten = ten + 10;
       counter = counter + 1;
    }
  }

void hypothesis_2() {
    while (1)
       print("10 21 32")
   }


Hypothesis 2 is simpler, yet clearly wrong. These examples don't really show
anything.

Dave

On Tue, Jun 29, 2010 at 3:15 PM, Matt Mahoney <matmaho...@yahoo.com> wrote:

> David Jones wrote:
> > I really don't think this is the right way to calculate simplicity.
>
> I will give you an example, because examples are more convincing than
> proofs.
>
> Suppose you perform a sequence of experiments whose outcome can either be 0
> or 1. In the first 10 trials you observe 0000000000. What do you expect to
> observe in the next trial?
>
> Hypothesis 1: the outcome is always 0.
> Hypothesis 2: the outcome is 0 for the first 10 trials and 1 thereafter.
>
> Hypothesis 1 is shorter than 2, so it is more likely to be correct.
>
> If I describe the two hypotheses in French or Chinese, then 1 is still
> shorter than 2.
>
> If I describe the two hypotheses in C, then 1 is shorter than 2.
>
>   void hypothesis_1() {
>     while (1) printf("0");
>   }
>
>   void hypothesis_2() {
>     int i;
>     for (i=0; i<10; ++i) printf("0");
>     while (1) printf("1");
>   }
>
> If I translate these programs into Perl or Lisp or x86 assembler, then 1
> will still be shorter than 2.
>
> I realize there might be smaller equivalent programs. But I think you could
> find a smaller program equivalent to hypothesis_1 than hypothesis_2.
>
> I realize there are other hypotheses than 1 or 2. But I think that the
> smallest one you can find that outputs eleven bits of which the first ten
> are zeros will be a program that outputs another zero.
>
> I realize that you could rewrite 1 so that it is longer than 2. But it is
> the shortest version that counts. More specifically consider all programs in
> which the first 10 outputs are 0. Then weight each program by 2^-length. So
> the shortest programs dominate.
>
> I realize you could make up a language where the shortest encoding of
> hypothesis 2 is shorter than 1. You could do this for any pair of
> hypotheses. However, I think if you stick to "simple" languages (and I
> realize this is a circular definition), then 1 will usually be shorter than
> 2.
>
>
> -- Matt Mahoney, matmaho...@yahoo.com
>
>
> ------------------------------
> *From:* David Jones <davidher...@gmail.com>
> *To:* agi <agi@v2.listbox.com>
> *Sent:* Tue, June 29, 2010 1:31:01 PM
>
> *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI
>
>
>
> On Tue, Jun 29, 2010 at 11:26 AM, Matt Mahoney <matmaho...@yahoo.com>wrote:
>
>> > Right. But Occam's Razor is not complete. It says simpler is better, but
>> 1) this only applies when two hypotheses have the same explanatory power and
>> 2) what defines simpler?
>>
>> A hypothesis is a program that outputs the observed data. It "explains"
>> the data if its output matches what is observed. The "simpler" hypothesis is
>> the shorter program, measured in bits.
>>
>
> I can't be confident that bits is the right way to do it. I suspect bits is
> an approximation of a more accurate method. I also suspect that you can
> write a more complex explanation "program" with the same number of bits. So,
> there are some flaws with this approach. It is an interesting idea to
> consider though.
>
>
>>
>> The language used to describe the data can be any Turing complete
>> programming language (C, Lisp, etc) or any natural language such as
>> English. It does not matter much which language you use, because for any two
>> languages there is a fixed length procedure, described in either of the
>> languages, independent of the data, that translates descriptions in one
>> language to the other.
>>
>
> Hypotheses don't have to be written in actual computer code and probably
> shouldn't be because hypotheses are not really meant to be "run" per say.
> And outputs are not necessarily the right way to put it either. Outputs
> imply prediction. And as mike has often pointed out, things cannot be
> precisely predicted. We can, however, determine whether a particular
> observation fits expectations, rather than equals some prediction. There may
> be multiple possible outcomes that we expect and which would be consistent
> with a hypothesis, which is why actual prediction should not be used.
>
> > For example, the simplest hypothesis for all visual interpretation is
>> that everything in the first image is gone in the second image, and
>> everything in the second image is a new object. Simple. Done. Solved :)
>> right?
>>
>> The hypothesis is not the simplest. The program that outputs the two
>> frames as if independent cannot be smaller than the two frames compressed
>> independently. The program could be made smaller if it only described how
>> the second frame is different than the first. It would be more likely to
>> correctly predict the third frame if it continued to run and described how
>> it would be different than the second frame.
>>
>
> I really don't think this is the right way to calculate simplicity.
>
>
>>
>> > I don't think much progress has been made in this area, but I'd like to
>> know what other people have done and any successes they've had.
>>
>> Kolmogorov proved that the solution is not computable. Given a hypothesis
>> (a description of the observed data, or a program that outputs the observed
>> data), there is no general procedure or test to determine whether a shorter
>> (simpler, better) hypothesis exists. Proof: suppose there were. Then I could
>> describe "the first data set that cannot be described in less than a million
>> bits" even though I just did. (By "first" I mean the first data set encoded
>> by a string from shortest to longest, breaking ties lexicographically).
>>
>
> You are making a different point than the question I asked. It may not be
> computable to determine whether a better hypothesis exists, but that is not
> what I want to know. What I want to know is what is the better of any two
> hypotheses I can come up with. Creating the hypotheses requires a different
> algorithm, which is not as hard to me as the other parts of the problem.
>
>
>>
>> That said, I believe the state of the art in both language and vision are
>> based on hierarchical neural models, i.e. pattern recognition using learned
>> weighted combinations of simpler patterns. I am more familiar with language.
>> The top ranked programs can be found at
>> http://mattmahoney.net/dc/text.html
>>
>
> state of the art in explanatory reasoning is what I'm looking for.
>
>
>>
>>
>> -- Matt Mahoney, matmaho...@yahoo.com
>>
>>
>> ------------------------------
>> *From:* David Jones <davidher...@gmail.com>
>> *To:* agi <agi@v2.listbox.com>
>> *Sent:* Tue, June 29, 2010 10:44:41 AM
>> *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI
>>
>> Thanks Matt,
>>
>> Right. But Occam's Razor is not complete. It says simpler is better, but
>> 1) this only applies when two hypotheses have the same explanatory power and
>> 2) what defines simpler?
>>
>> So, maybe what I want to know from the state of the art in research is:
>>
>> 1) how precisely do other people define "simpler"
>> and
>> 2) More importantly, how do you compare competing explanations/hypotheses
>> that have more or less explanatory power. Simpler does not apply unless you
>> are comparing equally explanatory hypotheses.
>>
>> For example, the simplest hypothesis for all visual interpretation is that
>> everything in the first image is gone in the second image, and everything in
>> the second image is a new object. Simple. Done. Solved :) right? Well,
>> clearly a more complicated explanation is warranted because a more
>> complicated explanation is more *explanatory* and a better explanation. So,
>> why is it better? Can it be defined as better in a precise way so that you
>> can compare arbitrary hypotheses or explanations? That is what I'm trying to
>> learn about. I don't think much progress has been made in this area, but I'd
>> like to know what other people have done and any successes they've had.
>>
>> Dave
>>
>>
>> On Tue, Jun 29, 2010 at 10:29 AM, Matt Mahoney <matmaho...@yahoo.com>wrote:
>>
>>> David Jones wrote:
>>> > If anyone has any knowledge of or references to the state of the art in
>>> explanation-based reasoning, can you send me keywords or links?
>>>
>>> The simplest explanation of the past is the best predictor of the future.
>>> http://en.wikipedia.org/wiki/Occam's_razor<http://en.wikipedia.org/wiki/Occam%27s_razor>
>>>  <http://en.wikipedia.org/wiki/Occam%27s_razor>
>>> http://www.scholarpedia.org/article/Algorithmic_probability
>>>  <http://www.scholarpedia.org/article/Algorithmic_probability>
>>>
>>> -- Matt Mahoney, matmaho...@yahoo.com
>>>
>>>
>>> ------------------------------
>>> *From:* David Jones <davidher...@gmail.com>
>>>
>>> *To:* agi <agi@v2.listbox.com>
>>> *Sent:* Tue, June 29, 2010 9:05:45 AM
>>> *Subject:* [agi] Re: Huge Progress on the Core of AGI
>>>
>>> If anyone has any knowledge of or references to the state of the art in
>>> explanation-based reasoning, can you send me keywords or links? I've read
>>> some through google, but I'm not really satisfied with anything I've found.
>>>
>>> Thanks,
>>>
>>> Dave
>>>
>>> On Sun, Jun 27, 2010 at 1:31 AM, David Jones <davidher...@gmail.com>wrote:
>>>
>>>> A method for comparing hypotheses in explanatory-based reasoning: *
>>>>
>>>> We prefer the hypothesis or explanation that ***expects* more
>>>> observations. If both explanations expect the same observations, then the
>>>> simpler of the two is preferred (because the unnecessary terms of the more
>>>> complicated explanation do not add to the predictive power).*
>>>>
>>>> *Why are expected events so important?* They are a measure of 1)
>>>> explanatory power and 2) predictive power. The more predictive and the more
>>>> explanatory a hypothesis is, the more likely the hypothesis is when 
>>>> compared
>>>> to a competing hypothesis.
>>>>
>>>> Here are two case studies I've been analyzing from sensory perception of
>>>> simplified visual input:
>>>> The goal of the case studies is to answer the following: How do you
>>>> generate the most likely motion hypothesis in a way that is general and
>>>> applicable to AGI?
>>>> *Case Study 1)* Here is a link to an example: animated gif of two black
>>>> squares move from left to 
>>>> right<http://practicalai.org/images/CaseStudy1.gif>.
>>>> *Description: *Two black squares are moving in unison from left to
>>>> right across a white screen. In each frame the black squares shift to the
>>>> right so that square 1 steals square 2's original position and square two
>>>> moves an equal distance to the right.
>>>> *Case Study 2) *Here is a link to an example: the interrupted 
>>>> square<http://practicalai.org/images/CaseStudy2.gif>.
>>>> *Description:* A single square is moving from left to right. Suddenly
>>>> in the third frame, a single black square is added in the middle of the
>>>> expected path of the original black square. This second square just stays
>>>> there. So, what happened? Did the square moving from left to right keep
>>>> moving? Or did it stop and then another square suddenly appeared and moved
>>>> from left to right?
>>>>
>>>> *Here is a simplified version of how we solve case study 1:
>>>> *The important hypotheses to consider are:
>>>> 1) the square from frame 1 of the video that has a very close position
>>>> to the square from frame 2 should be matched (we hypothesize that they are
>>>> the same square and that any difference in position is motion).  So, what
>>>> happens is that in each two frames of the video, we only match one square.
>>>> The other square goes unmatched.
>>>> 2) We do the same thing as in hypothesis #1, but this time we also match
>>>> the remaining squares and hypothesize motion as follows: the first square
>>>> jumps over the second square from left to right. We hypothesize that this
>>>> happens over and over in each frame of the video. Square 2 stops and square
>>>> 1 jumps over it.... over and over again.
>>>> 3) We hypothesize that both squares move to the right in unison. This is
>>>> the correct hypothesis.
>>>>
>>>> So, why should we prefer the correct hypothesis, #3 over the other two?
>>>>
>>>> Well, first of all, #3 is correct because it has the most explanatory
>>>> power of the three and is the simplest of the three. Simpler is better
>>>> because, with the given evidence and information, there is no reason to
>>>> desire a more complicated hypothesis such as #2.
>>>>
>>>> So, the answer to the question is because explanation #3 expects the
>>>> most observations, such as:
>>>> 1) the consistent relative positions of the squares in each frame are
>>>> expected.
>>>> 2) It also expects their new positions in each from based on velocity
>>>> calculations.
>>>> 3) It expects both squares to occur in each frame.
>>>>
>>>> Explanation 1 ignores 1 square from each frame of the video, because it
>>>> can't match it. Hypothesis #1 doesn't have a reason for why the a new 
>>>> square
>>>> appears in each frame and why one disappears. It doesn't expect these
>>>> observations. In fact, explanation 1 doesn't expect anything that happens
>>>> because something new happens in each frame, which doesn't give it a chance
>>>> to confirm its hypotheses in subsequent frames.
>>>>
>>>> The power of this method is immediately clear. It is general and it
>>>> solves the problem very cleanly.
>>>>
>>>> *Here is a simplified version of how we solve case study 2:*
>>>> We expect the original square to move at a similar velocity from left to
>>>> right because we hypothesized that it did move from left to right and we
>>>> calculated its velocity. If this expectation is confirmed, then it is more
>>>> likely than saying that the square suddenly stopped and another started
>>>> moving. Such a change would be unexpected and such a conclusion would be
>>>> unjustifiable.
>>>>
>>>> I also believe that explanations which generate fewer incorrect
>>>> expectations should be preferred over those that more incorrect
>>>> expectations.
>>>>
>>>> The idea I came up with earlier this month regarding high frame rates to
>>>> reduce uncertainty is still applicable. It is important that all generated
>>>> hypotheses have as low uncertainty as possible given our constraints and
>>>> resources available.
>>>>
>>>> I thought I'd share my progress with you all. I'll be testing the ideas
>>>> on test cases such as the ones I mentioned in the coming days and weeks.
>>>>
>>>> Dave
>>>>
>>>
>>>    *agi* | Archives <https://www.listbox.com/member/archive/303/=now>
>>> <https://www.listbox.com/member/archive/rss/303/> | 
>>> Modify<https://www.listbox.com/member/?&;>Your Subscription
>>> <http://www.listbox.com>
>>>    *agi* | Archives <https://www.listbox.com/member/archive/303/=now>
>>> <https://www.listbox.com/member/archive/rss/303/> | 
>>> Modify<https://www.listbox.com/member/?&;>Your Subscription
>>> <http://www.listbox.com>
>>>
>>
>>    *agi* | Archives <https://www.listbox.com/member/archive/303/=now>
>> <https://www.listbox.com/member/archive/rss/303/> | 
>> Modify<https://www.listbox.com/member/?&;>Your Subscription
>> <http://www.listbox.com>
>>    *agi* | Archives <https://www.listbox.com/member/archive/303/=now>
>> <https://www.listbox.com/member/archive/rss/303/> | 
>> Modify<https://www.listbox.com/member/?&;>Your Subscription
>> <http://www.listbox.com>
>>
>
>    *agi* | Archives <https://www.listbox.com/member/archive/303/=now>
> <https://www.listbox.com/member/archive/rss/303/> | 
> Modify<https://www.listbox.com/member/?&;>Your Subscription
> <http://www.listbox.com>
>    *agi* | Archives <https://www.listbox.com/member/archive/303/=now>
> <https://www.listbox.com/member/archive/rss/303/> | 
> Modify<https://www.listbox.com/member/?&;>Your Subscription
> <http://www.listbox.com>
>



-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244&id_secret=8660244-6e7fb59c
Powered by Listbox: http://www.listbox.com

Re: [agi] Re: Huge Progress on the Core of AGI

Reply via email to