You can always find languages that favor either hypothesis. Suppose that you
want to predict the sequence 10, 21, 32, ?, and we write each hypothesis as a
function that takes the trial number (0, 1, 2, 3...) and returns the outcome.
The sequence 10, 21, 32, 43, 54... would be coded:
int hypothesis_1(int trial) {
    return trial*11 + 10;
}
The sequence 10, 21, 32, 10, 21, 32... would be coded:
int hypothesis_2(int trial) {
    return (trial%3)*11 + 10;
}
which is longer and therefore less likely.
Here is another example: predict the sequence 0, 1, 4, 9, 16, 25, 36, 49, ?
Can you find a program shorter than this that doesn't predict 64?
int hypothesis_1(int trial) {
    return trial*trial;
}
-- Matt Mahoney, [email protected]
________________________________
From: David Jones <[email protected]>
To: agi <[email protected]>
Sent: Tue, June 29, 2010 3:48:01 PM
Subject: Re: [agi] Re: Huge Progress on the Core of AGI
Such an example is nowhere near sufficient to accept the assertion that
program size is the right way to define the simplicity of a hypothesis.
Here is a counterexample. It requires a slightly more complex sequence, because
a string of all zeros leaves no room for alternative hypotheses.
Here is the sequence: 10, 21, 32
void hypothesis_1() {
    int ten = 10;
    int counter = 0;
    while (1) {
        printf("%d ", ten + counter);
        ten = ten + 10;
        counter = counter + 1;
    }
}
void hypothesis_2() {
    while (1)
        printf("10 21 32 ");
}
Hypothesis 2 is simpler, yet clearly wrong. These examples don't really show
anything.
Dave
On Tue, Jun 29, 2010 at 3:15 PM, Matt Mahoney <[email protected]> wrote:
David Jones wrote:
>> I really don't think this is the right way to calculate simplicity.
>
>
>I will give you an example, because examples are more convincing than proofs.
>
>
>Suppose you perform a sequence of experiments whose outcome can either be 0 or
>1. In the first 10 trials you observe 0000000000. What do you expect to
>observe in the next trial?
>
>
>Hypothesis 1: the outcome is always 0.
>Hypothesis 2: the outcome is 0 for the first 10 trials and 1 thereafter.
>
>
>Hypothesis 1 is shorter than 2, so it is more likely to be correct.
>
>
>If I describe the two hypotheses in French or Chinese, then 1 is still shorter than 2.
>
>
>If I describe the two hypotheses in C, then 1 is shorter than 2.
>
>
> void hypothesis_1() {
>     while (1) printf("0");
> }
>
>
> void hypothesis_2() {
>     int i;
>     for (i = 0; i < 10; ++i) printf("0");
>     while (1) printf("1");
> }
>
>
>If I translate these programs into Perl or Lisp or x86 assembler, then 1 will
>still be shorter than 2.
>
>
>I realize there might be smaller equivalent programs. But I think the smallest
>program equivalent to hypothesis_1 will be shorter than the smallest program
>equivalent to hypothesis_2.
>
>
>I realize there are other hypotheses than 1 or 2. But I think that the
>smallest one you can find that outputs eleven bits, of which the first ten are
>zeros, will be a program that outputs another zero.
>
>
>I realize that you could rewrite 1 so that it is longer than 2. But it is the
>shortest version that counts. More specifically, consider all programs whose
>first 10 outputs are 0. Then weight each program by 2^-length, so that the
>shortest programs dominate.
>
>
>I realize you could make up a language where the shortest encoding of
>hypothesis 2 is shorter than 1. You could do this for any pair of hypotheses.
>However, I think if you stick to "simple" languages (and I realize this is a
>circular definition), then 1 will usually be shorter than 2.
>
> -- Matt Mahoney, [email protected]
>
________________________________
From: David Jones <[email protected]>
>To: agi <[email protected]>
>Sent: Tue, June 29, 2010 1:31:01 PM
>
>Subject: Re: [agi] Re: Huge Progress on the Core of AGI
>
>>On Tue, Jun 29, 2010 at 11:26 AM, Matt Mahoney <[email protected]> wrote:
>
>>>
>>> Right. But Occam's Razor is not complete. It says simpler is better, but 1)
>>> this only applies when two hypotheses have the same explanatory power and
>>> 2) what defines simpler?
>>
>>
>>A hypothesis is a program that outputs the observed data. It "explains" the
>>data if its output matches what is observed. The "simpler" hypothesis is the
>>shorter program, measured in bits.
>
>I can't be confident that bits is the right way to do it. I suspect bits is an
>approximation of a more accurate method. I also suspect that you can write a
>more complex explanation "program" with the same number of bits. So, there are
>some flaws with this approach. It is an interesting idea to consider though.
>
>>The language used to describe the data can be any Turing complete programming
>>language (C, Lisp, etc) or any natural language such as English. It does not
>>matter much which language you use, because for any two languages there is a
>>fixed length procedure, described in either of the languages, independent of
>>the data, that translates descriptions in one language to the other.
>
>Hypotheses don't have to be written in actual computer code, and probably
>shouldn't be, because hypotheses are not really meant to be "run" per se. And
>outputs are not necessarily the right way to put it either. Outputs imply
>prediction. And as Mike has often pointed out, things cannot be precisely
>predicted. We can, however, determine whether a particular observation fits
>expectations, rather than equals some prediction. There may be multiple
>possible outcomes that we expect and which would be consistent with a
>hypothesis, which is why actual prediction should not be used.
>
>
>> For example, the simplest hypothesis for all visual interpretation is that
>> everything in the first image is gone in the second image, and everything in
>> the second image is a new object. Simple. Done. Solved :) right?
>>
>>
>>The hypothesis is not the simplest. The program that outputs the two frames
>>as if independent cannot be smaller than the two frames compressed
>>independently. The program could be made smaller if it only described how the
>>second frame is different than the first. It would be more likely to
>>correctly predict the third frame if it continued to run and described how it
>>would be different than the second frame.
>
>I really don't think this is the right way to calculate simplicity.
>
>
>>>
>>
>>
>>> I don't think much progress has been made in this area, but I'd like to
>>> know what other people have done and any successes they've had.
>>
>>
>>Kolmogorov proved that the solution is not computable. Given a hypothesis (a
>>description of the observed data, or a program that outputs the observed
>>data), there is no general procedure or test to determine whether a shorter
>>(simpler, better) hypothesis exists. Proof: suppose there were. Then I could
>>describe "the first data set that cannot be described in less than a million
>>bits" even though I just did. (By "first" I mean the first data set encoded
>>by a string from shortest to longest, breaking ties lexicographically.)
>
>You are making a different point than the question I asked. It may not be
>computable to determine whether a better hypothesis exists, but that is not
>what I want to know. What I want to know is what is the better of any two
>hypotheses I can come up with. Creating the hypotheses requires a different
>algorithm, which to me is not as hard as the other parts of the problem.
>
>>That said, I believe the state of the art in both language and vision are
>>based on hierarchical neural models, i.e. pattern recognition using learned
>>weighted combinations of simpler patterns. I am more familiar with language.
>>The top ranked programs can be found at http://mattmahoney.net/dc/text.html
>
>The state of the art in explanatory reasoning is what I'm looking for.
>
>
>>>
>>
>> -- Matt Mahoney, [email protected]
>>
________________________________
From: David Jones <[email protected]>
>>To: agi <[email protected]>
>>Sent: Tue, June 29, 2010 10:44:41 AM
>>Subject: Re: [agi] Re: Huge Progress on the Core of AGI
>>
>>
>>Thanks Matt,
>>
>>Right. But Occam's Razor is not complete. It says simpler is better, but 1)
>>this only applies when two hypotheses have the same explanatory power and 2)
>>what defines simpler?
>>
>>So, maybe what I want to know from the state of the art in research is:
>>
>>1) how precisely do other people define "simpler"
>>and
>>2) More importantly, how do you compare competing explanations/hypotheses
>>that have more or less explanatory power. Simpler does not apply unless you
>>are comparing equally explanatory hypotheses.
>>
>>For example, the simplest hypothesis for all visual interpretation is that
>>everything in the first image is gone in the second image, and everything in
>>the second image is a new object. Simple. Done. Solved :) right? Well,
>>clearly a more complicated explanation is warranted because a more
>>complicated explanation is more *explanatory* and a better explanation. So,
>>why is it better? Can it be defined as better in a precise way so that you
>>can compare arbitrary hypotheses or explanations? That is what I'm trying to
>>learn about. I don't think much progress has been made in this area, but I'd
>>like to know what other people have done and any successes they've had.
>>
>>Dave
>>
>>
>>
>>On Tue, Jun 29, 2010 at 10:29 AM, Matt Mahoney <[email protected]> wrote:
>>
>>David Jones wrote:
>>>> If anyone has any knowledge of or references to the state of the art in
>>>> explanation-based reasoning, can you send me keywords or links?
>>>
>>>
>>>The simplest explanation of the past is the best predictor of the future.
>>>http://en.wikipedia.org/wiki/Occam's_razor
>>>http://www.scholarpedia.org/article/Algorithmic_probability
>>>
>>> -- Matt Mahoney, [email protected]
>>>
________________________________
From: David Jones <[email protected]>
>>>To: agi <[email protected]>
>>>Sent: Tue, June 29, 2010 9:05:45 AM
>>>Subject: [agi] Re: Huge Progress on the Core of AGI
>>>
>>>
>>>If anyone has any knowledge of or references to the state of the art in
>>>explanation-based reasoning, can you send me keywords or links? I've read
>>>some through google, but I'm not really satisfied with anything I've found.
>>>
>>>Thanks,
>>>
>>>Dave
>>>
>>>
>>>On Sun, Jun 27, 2010 at 1:31 AM, David Jones <[email protected]> wrote:
>>>
>>>>A method for comparing hypotheses in explanatory-based reasoning:
>>>>
>>>>We prefer the hypothesis or explanation that *expects* more observations.
>>>>If both explanations expect the same observations, then the simpler of the
>>>>two is preferred (because the unnecessary terms of the more complicated
>>>>explanation do not add to the predictive power).
>>>>
>>>>Why are expected events so important? They are a measure of 1) explanatory
>>>>power and 2) predictive power. The more predictive and
>>>>the more explanatory a hypothesis is, the more likely the hypothesis is
>>>>when compared to a competing hypothesis.
>>>>
>>>>Here are two case studies I've been analyzing from sensory perception of
>>>>simplified visual input:
>>>>
>>>>The goal of the case studies is to answer the following: How do you
>>>>generate the most likely motion hypothesis in a way that is
>>>>general and applicable to AGI?
>>>>Case Study 1) Here is a link to an example: animated gif of two black
>>>>squares move from left to right. Description: Two black squares are moving
>>>>in unison from left to right across a white screen. In each frame the black
>>>>squares shift to the right so that square 1 steals square 2's original
>>>>position and square two moves an equal distance to the right.
>>>>Case Study 2) Here is a link to an example: the interrupted square.
>>>>Description: A single square is moving from left to right. Suddenly in the
>>>>third frame, a single black square is added in the middle of the expected
>>>>path of the original black square. This second square just stays there. So,
>>>>what happened? Did the square moving from left to right keep moving? Or did
>>>>it stop and then another square suddenly appeared and moved from left to
>>>>right?
>>>>
>>>>Here is a simplified version of how we solve case study 1:
>>>>The important hypotheses to consider are:
>>>>1) the square from frame 1 of the video that has a very close position to
>>>>the square from frame 2 should be matched (we hypothesize that they are the
>>>>same square and that any difference in position is motion). So, what
>>>>happens is that in each two frames of the video, we only match one square.
>>>>The other square goes unmatched.
>>>>
>>>>2) We do the same thing as in hypothesis #1, but this time we also match
>>>>the remaining squares and hypothesize motion as follows: the first square
>>>>jumps over the second square from left to right. We hypothesize that this
>>>>happens over and over in each frame of the video. Square 2 stops and square
>>>>1 jumps over it.... over and over again.
>>>>
>>>>3) We hypothesize that both squares move to the right in unison. This is
>>>>the correct hypothesis.
>>>>
>>>>So, why should we prefer the correct hypothesis, #3 over the other two?
>>>>
>>>>Well, first of all, #3 is correct because it has the most explanatory power
>>>>of the three and is the simplest of the three. Simpler is better because,
>>>>with the given evidence and information, there is no reason to desire a
>>>>more complicated hypothesis such as #2.
>>>>
>>>>So, the answer to the question is because explanation #3 expects the most
>>>>observations, such as:
>>>>1) the consistent relative positions of the squares in each frame are
>>>>expected.
>>>>2) It also expects their new positions in each frame based on velocity
>>>>calculations.
>>>>
>>>>3) It expects both squares to occur in each frame.
>>>>
>>>>Explanation 1 ignores one square in each frame of the video, because it
>>>>can't match it. Hypothesis #1 has no reason why a new square appears in
>>>>each frame and why one disappears. It doesn't expect these observations. In
>>>>fact, explanation 1 doesn't expect anything that happens, because something
>>>>new happens in each frame, which doesn't give it a chance to confirm its
>>>>hypotheses in subsequent frames.
>>>>
>>>>The power of this method is immediately clear. It is general and it solves
>>>>the problem very cleanly.
>>>>
>>>>Here is a simplified version of how we solve case study 2:
>>>>We expect the original square to move at a similar velocity from left to
>>>>right because we hypothesized that it did move from left to right and we
>>>>calculated its velocity. If this expectation is confirmed, then it is more
>>>>likely than saying that the square suddenly stopped and another started
>>>>moving. Such a change would be unexpected and such a conclusion would be
>>>>unjustifiable.
>>>>
>>>>I also believe that explanations which generate fewer incorrect
>>>>expectations should be preferred over those that generate more incorrect
>>>>expectations.
>>>>
>>>>The idea I came up with earlier this month regarding high frame rates to
>>>>reduce uncertainty is still applicable. It is important that all generated
>>>>hypotheses have as low uncertainty as possible given our constraints and
>>>>resources available.
>>>>
>>>>I thought I'd share my progress with you all. I'll be testing the ideas on
>>>>test cases such as the ones I mentioned in the coming days and weeks.
>>>>
>>>>Dave
>>>>
>>>
-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription:
https://www.listbox.com/member/?member_id=8660244&id_secret=8660244-6e7fb59c
Powered by Listbox: http://www.listbox.com