David Jones wrote:
> I really don't think this is the right way to calculate simplicity. 

I will give you an example, because examples are more convincing than proofs.

Suppose you perform a sequence of experiments whose outcome can either be 0 or 
1. In the first 10 trials you observe 0000000000. What do you expect to observe 
in the next trial?

Hypothesis 1: the outcome is always 0.
Hypothesis 2: the outcome is 0 for the first 10 trials and 1 thereafter.

Hypothesis 1 is shorter than 2, so it is more likely to be correct.

If I describe the two hypotheses in French or Chinese, then 1 is still shorter 
than 2.

If I describe the two hypotheses in C, then 1 is shorter than 2.

  #include <stdio.h>

  /* Hypothesis 1: the outcome is always 0. */
  void hypothesis_1(void) {
    while (1) printf("0");
  }

  /* Hypothesis 2: 0 for the first 10 trials, 1 thereafter. */
  void hypothesis_2(void) {
    int i;
    for (i = 0; i < 10; ++i) printf("0");
    while (1) printf("1");
  }

If I translate these programs into Perl or Lisp or x86 assembler, then 1 will 
still be shorter than 2.

I realize there might be smaller equivalent programs. But I think the smallest 
program equivalent to hypothesis_1 will still be shorter than the smallest 
program equivalent to hypothesis_2.

I realize there are hypotheses other than 1 and 2. But I think that the smallest 
program you can find that outputs eleven bits, of which the first ten are zeros, 
will be one that outputs another zero.

I realize that you could rewrite 1 so that it is longer than 2. But it is the 
shortest version that counts. More specifically, consider all programs whose 
first 10 outputs are 0, and weight each program by 2^-length, so that the 
shortest programs dominate.
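
To make the weighting concrete, here is a toy sketch in C. The two program 
lengths are made-up placeholders (the true minimal lengths are not computable); 
the point is only that the 2^-length weights let the shorter hypothesis 
dominate the prediction of the next bit:

  #include <math.h>
  #include <stdio.h>

  /* Toy sketch: weight each hypothesis by 2^-length and compute the
     probability that the next output bit is 0.  The lengths are made-up
     placeholders, not true minimal program lengths. */
  int main(void) {
    double len1 = 60;  /* assumed bits for the shortest hypothesis_1 program */
    double len2 = 90;  /* assumed bits for the shortest hypothesis_2 program */
    double w1 = pow(2, -len1);   /* hypothesis 1 predicts 0 next */
    double w2 = pow(2, -len2);   /* hypothesis 2 predicts 1 next */
    printf("P(next bit = 0) = %g\n", w1 / (w1 + w2));
    return 0;
  }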

I realize you could make up a language where the shortest encoding of 
hypothesis 2 is shorter than 1. You could do this for any pair of hypotheses. 
However, I think if you stick to "simple" languages (and I realize this is a 
circular definition), then 1 will usually be shorter than 2.
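
For instance, a contrived sketch (the primitive below is made up): give the 
language a built-in that prints ten 0s and then 1s forever, so that its 
definition costs the hypothesis nothing, and hypothesis 2 collapses to a single 
short call while hypothesis 1 gains nothing. A language rigged like that just 
isn't "simple".

  #include <stdio.h>

  /* Hypothetical primitive of a rigged language: ten 0s, then 1s forever.
     In that language this definition is built in, so it costs nothing. */
  void z01(void) {
    int i;
    for (i = 0; i < 10; ++i) printf("0");
    while (1) printf("1");
  }

  /* Hypothesis 2 in the rigged language: one short call. */
  void hypothesis_2_rigged(void) { z01(); }

  /* Hypothesis 1 gets no such help and must still be spelled out. */
  void hypothesis_1_rigged(void) { while (1) printf("0"); }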

 -- Matt Mahoney, matmaho...@yahoo.com




________________________________
From: David Jones <davidher...@gmail.com>
To: agi <agi@v2.listbox.com>
Sent: Tue, June 29, 2010 1:31:01 PM
Subject: Re: [agi] Re: Huge Progress on the Core of AGI




On Tue, Jun 29, 2010 at 11:26 AM, Matt Mahoney <matmaho...@yahoo.com> wrote:

> Right. But Occam's Razor is not complete. It says simpler is better, but 1) 
> this only applies when two hypotheses have the same explanatory power and 2) 
> what defines simpler? 
>
>
>A hypothesis is a program that outputs the observed data. It "explains" the 
>data if its output matches what is observed. The "simpler" hypothesis is the 
>shorter program, measured in bits.

I can't be confident that counting bits is the right way to do it. I suspect it 
is an approximation of a more accurate method. I also suspect that you can write 
a more complex explanation "program" with the same number of bits. So, there are 
some flaws with this approach. It is an interesting idea to consider, though. 
 


>
>The language used to describe the data can be any Turing complete programming 
>language (C, Lisp, etc) or any natural language such as English. It does not 
>matter much which language you use, because for any two languages there is a 
>fixed length procedure, described in either of the languages, independent of 
>the data, that translates descriptions in
> one language to the other.

Hypotheses don't have to be written in actual computer code, and probably 
shouldn't be, because hypotheses are not really meant to be "run" per se. And 
outputs are not necessarily the right way to put it either. Outputs imply 
prediction. And as Mike has often pointed out, things cannot be precisely 
predicted. We can, however, determine whether a particular observation fits 
expectations, rather than equals some prediction. There may be multiple 
possible outcomes that we expect and that would be consistent with a 
hypothesis, which is why actual prediction should not be used.
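
To sketch what I mean (an illustration only, with arbitrary numbers): an 
observation fits an expectation if it falls within some tolerance of it, 
rather than equaling a point prediction.

  #include <math.h>
  #include <stdio.h>

  /* Illustration only: an observation "fits" an expectation if it lies
     within a tolerance, rather than matching a point prediction exactly.
     The numbers are arbitrary. */
  int fits_expectation(double observed, double expected, double tolerance) {
    return fabs(observed - expected) <= tolerance;
  }

  int main(void) {
    double expected_x = 12.0;   /* e.g. where we expect a square to be */
    double observed_x = 12.3;   /* where it actually shows up */
    printf("%s\n", fits_expectation(observed_x, expected_x, 0.5)
                       ? "fits expectations" : "unexpected");
    return 0;
  }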


> For example, the simplest hypothesis for all visual interpretation is that 
> everything in the first image is gone in the second image, and everything in 
> the second image is a new object. Simple. Done. Solved :) right? 
>
>
>The hypothesis is not the simplest. The program that outputs the two frames as 
>if independent cannot be smaller than the two frames compressed independently. 
>The program could be made smaller if it only described how the second frame is 
>different than the first. It would be more likely to correctly predict the 
>third frame if it continued to run and described how it would be different 
>than the second frame.

I really don't think this is the right way to calculate simplicity. 
 


>
>> I don't think much progress has been made in this area, but I'd like to know 
>> what other people have done and any successes they've had.
>
>
>Kolmogorov proved that the solution is not
> computable. Given a hypothesis (a description of the observed data, or a 
> program that outputs the observed data), there is no general procedure or 
> test to determine whether a shorter (simpler, better) hypothesis exists. 
> Proof: suppose there were. Then I could describe "the first data set that 
> cannot be described in less than a million bits" even though I just did. (By 
> "first" I mean the first data set encoded by a string from shortest to 
> longest, breaking ties lexicographically).

You are making a different point than the question I asked. It may not be 
computable to determine whether a better hypothesis exists, but that is not 
what I want to know. What I want to know is which is the better of any two 
hypotheses I can come up with. Creating the hypotheses requires a different 
algorithm, which to me is not as hard as the other parts of the problem. 
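
(As a rough sketch of what that comparison could look like under Matt's 
bit-counting proposal, using a general-purpose compressor as a crude, 
computable stand-in for minimal description length; the hypothesis strings 
below are just the English sentences from earlier in the thread.)

  #include <stdio.h>
  #include <string.h>
  #include <zlib.h>    /* link with -lz */

  /* Crude, computable stand-in for "shorter in bits": compress each
     hypothesis's description and compare sizes.  This only gives an
     upper bound on the true minimal description length. */
  static unsigned long compressed_size(const char *s) {
    uLong n = (uLong)strlen(s);
    uLongf out_len = compressBound(n);
    Bytef out[4096];                      /* big enough for these strings */
    compress(out, &out_len, (const Bytef *)s, n);
    return (unsigned long)out_len;
  }

  int main(void) {
    const char *h1 = "the outcome is always 0";
    const char *h2 = "the outcome is 0 for the first 10 trials and 1 thereafter";
    printf("h1: %lu bytes, h2: %lu bytes\n",
           compressed_size(h1), compressed_size(h2));
    return 0;
  }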
 


>
>That said, I believe the state of the art in both language and vision are 
>based on hierarchical neural models, i.e. pattern recognition using learned 
>weighted combinations of simpler patterns. I am more familiar with language. 
>The top ranked programs can be found at http://mattmahoney.net/dc/text.html

The state of the art in explanatory reasoning is what I'm looking for.
 


> -- Matt Mahoney, matmaho...@yahoo.com
>
>
>
>
>
________________________________
From: David Jones <davidher...@gmail.com>
>To: agi <agi@v2.listbox.com>
>Sent: Tue, June 29, 2010 10:44:41 AM
>Subject: Re: [agi] Re: Huge Progress on the Core of AGI
>
>
>Thanks Matt,
>
>Right. But Occam's Razor is not complete. It says simpler is better, but 1) 
>this only applies when two hypotheses have the same explanatory power and 2) 
>what defines simpler? 
>
>So, maybe what I want to know from the state of the art in research is: 
>
>1) how precisely do other people define "simpler"
>and
>2) More importantly, how do you compare competing explanations/hypotheses that 
>have more or less explanatory power. Simpler does not apply unless you are 
>comparing equally explanatory hypotheses. 
>
>For example, the simplest hypothesis for all visual interpretation is that 
>everything in the first image is gone in the second image, and everything in 
>the second image is a new object. Simple. Done. Solved :) right? Well, clearly 
>a more complicated explanation is warranted because a more complicated 
>explanation is more *explanatory* and a better explanation. So, why is it 
>better? Can it be defined as better in a precise way so that you can compare 
>arbitrary hypotheses or explanations? That is what I'm trying to learn about. 
>I don't think much progress has been made in this area, but I'd like to know 
>what other people have done and any successes they've had.
>
>Dave
>
>
>
>On Tue, Jun 29, 2010 at 10:29 AM, Matt Mahoney <matmaho...@yahoo.com> wrote:
>
>David Jones wrote:
>>> If anyone has any knowledge of or references to the state of the art in 
>>> explanation-based reasoning, can you send me keywords or links? 
>>
>>
>>The simplest explanation of the past is the best predictor of the future.
>>http://en.wikipedia.org/wiki/Occam's_razor
>>http://www.scholarpedia.org/article/Algorithmic_probability
>>
>> -- Matt Mahoney, matmaho...@yahoo.com
>>
>>
>>
>>
>>
________________________________
From: David Jones <davidher...@gmail.com>
>>To: agi <agi@v2.listbox.com>
>>Sent: Tue, June 29, 2010 9:05:45 AM
>>Subject: [agi] Re: Huge Progress on the Core of AGI
>>
>>
>>If anyone has any knowledge of or references to the state of the art in 
>>explanation-based reasoning, can you send me keywords or links? I've read 
>>some through Google, but I'm not really satisfied with anything I've found. 
>>
>>Thanks,
>>
>>Dave
>>
>>
>>On Sun, Jun 27, 2010 at 1:31 AM, David Jones <davidher...@gmail.com> wrote:
>>
>>>A method for comparing hypotheses in explanatory-based reasoning: 
>>>
>>>We prefer the hypothesis or explanation that *expects* more observations. If 
>>>both explanations expect the same observations, then the simpler of the two 
>>>is preferred (because the unnecessary terms of the more complicated 
>>>explanation do not add to the predictive power). 
>>>
>>>Why are expected events so important? They are a measure of 1) explanatory 
>>>power and 2) predictive power. The more predictive and 
>>>the more explanatory a hypothesis is, the more likely the hypothesis is when 
>>>compared to a competing hypothesis.
>>>
>>>Here are two case studies I've been analyzing from sensory perception of 
>>>simplified visual input:
>>>
>>>
>>>
>>>The goal of the case studies is to answer the following: How do you generate 
>>>the most likely motion hypothesis in a way that is 
>>>general and applicable to AGI?
>>>Case Study 1) Here is a link to an example: an animated gif of two black 
>>>squares moving from left to right. Description: Two black squares are moving 
>>>in unison from left to right across a white screen. In each frame the black 
>>>squares shift to the right so that square 1 steals square 2's original 
>>>position and square 2 moves an equal distance to the right.
>>>Case Study 2) Here is a link to an example: the interrupted square. 
>>>Description: A single square is moving from left to right. Suddenly in the 
>>>third frame, a single black square is added in the middle of the expected 
>>>path of the original black square. This second square just stays there. So, 
>>>what happened? Did the square moving from left to right keep moving? Or did 
>>>it stop and then another square suddenly appeared and moved from left to 
>>>right?
>>>
>>>Here is a simplified version of how we solve case study 1:
>>>The important hypotheses to consider are: 
>>>1) the square from frame 1 of the video that has a very close position to 
>>>the square from frame 2 should be matched (we hypothesize that they are the 
>>>same square and that any difference in position is motion).  So, what 
>>>happens is that in each two frames of the video, we only match one square. 
>>>The other square goes unmatched.   
>>>
>>>
>>>
>>>2) We do the same thing as in hypothesis #1, but this time we also match the 
>>>remaining squares and hypothesize motion as follows: the first square jumps 
>>>over the second square from left to right. We hypothesize that this happens 
>>>over and over in each frame of the video. Square 2 stops and square 1 jumps 
>>>over it.... over and over again. 
>>>
>>>
>>>
>>>3) We hypothesize that both squares move to the right in unison. This is the 
>>>correct hypothesis.
>>>
>>>So, why should we prefer the correct hypothesis, #3 over the other two?
>>>
>>>Well, first of all, #3 is correct because it has the most explanatory power 
>>>of the three and is the simplest of the three. Simpler is better because, 
>>>with the given evidence and information, there is no reason to desire a more 
>>>complicated hypothesis such as #2. 
>>>
>>>So, the answer to the question is because explanation #3 expects the most 
>>>observations, such as: 
>>>1) the consistent relative positions of the squares in each frame are 
>>>expected. 
>>>2) It also expects their new positions in each frame based on velocity 
>>>calculations. 
>>>
>>>
>>>
>>>3) It expects both squares to occur in each frame. 
>>>
>>>Explanation 1 ignores 1 square from each frame of the video, because it 
>>>can't match it. Hypothesis #1 doesn't have a reason for why a new square 
>>>appears in each frame and why one disappears. It doesn't expect these 
>>>observations. In fact, explanation 1 doesn't expect anything that happens 
>>>because something new happens in each frame, which doesn't give it a chance 
>>>to confirm its hypotheses in subsequent frames.
>>>
>>>The power of this method is immediately clear. It is general and it solves 
>>>the problem very cleanly.
>>>
>>>Here is a simplified version of how we solve case study 2:
>>>We expect the original square to move at a similar velocity from left to 
>>>right because we hypothesized that it did move from left to right and we 
>>>calculated its velocity. If this expectation is confirmed, then it is more 
>>>likely than saying that the square suddenly stopped and another started 
>>>moving. Such a change would be unexpected and such a conclusion would be 
>>>unjustifiable. 
>>>
>>>I also believe that explanations which generate fewer incorrect expectations 
>>>should be preferred over those that generate more incorrect expectations.
>>>
>>>The idea I came up with earlier this month regarding high frame rates to 
>>>reduce uncertainty is still applicable. It is important that all generated 
>>>hypotheses have as low uncertainty as possible given our constraints and 
>>>resources available.
>>>
>>>I thought I'd share my progress with you all. I'll be testing the ideas on 
>>>test cases such as the ones I mentioned in the coming days and weeks. 
>>>
>>>Dave
>>>

