Occam's Razor is not a provable theory, and I have come across philosophers of science who question its value even as a scientific heuristic. I can look for some more thorough presentations, and I am willing to give you some of my opinions on the question if you want. The "evidence" would be drawn from cases where a simpler theory was later superseded because a more complicated theory better explained the results of scientifically gathered evidence. Einstein's relativity theories added complexity to Newton's theories, but they better explained the observed details and results than Newton's theories did.
I cannot recall the details of why I believe that the central premise of algorithmic information theory is incomputable, but I thought a diagonalization argument in Cantor's style underlay the reasoning of Chaitin's incompleteness theorem (http://en.wikipedia.org/wiki/Kolmogorov_complexity), and that this is why the shortest program that outputs a given string cannot, in general, be computed.

Jim Bromer

On Wed, Jun 30, 2010 at 5:13 PM, Matt Mahoney <[email protected]> wrote:

> Jim, what evidence do you have that Occam's Razor or algorithmic
> information theory is wrong, besides your own opinions? It is well
> established that elegant (short) theories are preferred in all branches of
> science because they have greater predictive power.
>
> Also, what does this have to do with Cantor's diagonalization argument? AIT
> considers only the countably infinite set of hypotheses.
>
> -- Matt Mahoney, [email protected]
>
> ------------------------------
> *From:* Jim Bromer <[email protected]>
> *To:* agi <[email protected]>
> *Sent:* Wed, June 30, 2010 9:13:44 AM
> *Subject:* Re: [agi] Re: Huge Progress on the Core of AGI
>
> On Tue, Jun 29, 2010 at 11:46 PM, Abram Demski <[email protected]> wrote:
> In brief, the answer to your question is: we formalize the description
> length heuristic by assigning lower probabilities to longer hypotheses, and
> we apply Bayes' law to update these probabilities given the data we observe.
> This updating captures the idea that we should reward theories which
> explain/expect more of the observations; it also provides a natural way to
> balance simplicity vs. explanatory power, so that we can compare any two
> theories with a single scoring mechanism. Bayes' law automatically places the
> right amount of pressure to avoid overly elegant explanations which don't
> get much right, and to avoid overly complex explanations which fit the
> observations perfectly but which probably won't generalize to new data.
> ...
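[As a concrete aside on the incomputability point above: the exact Kolmogorov complexity of a string is uncomputable, but any compressor gives a computable *upper bound* on it. A minimal Python sketch, with zlib standing in for "shortest description found so far" — the seeded generator and the specific byte strings are only illustrative choices, not anything from the thread:]

```python
import zlib
import random

def complexity_upper_bound(s: bytes) -> int:
    # A compressor gives a *computable upper bound* on the length of the
    # shortest description of s. The exact minimum (Kolmogorov complexity)
    # is uncomputable: searching programs in length order would require
    # deciding whether each shorter candidate halts -- the halting problem.
    return len(zlib.compress(s, 9))

regular = b"ab" * 500                                   # 1000 bytes, one short pattern
rng = random.Random(0)                                  # seeded, so the example is reproducible
noisy = bytes(rng.getrandbits(8) for _ in range(1000))  # 1000 patternless bytes

print(complexity_upper_bound(regular))   # small: far below 1000
print(complexity_upper_bound(noisy))     # slightly above 1000: no pattern found
```

[The gap between the two bounds is what "short theory" means operationally; Chaitin's theorem adds that a fixed formal system can certify lower bounds "K(s) > n" for at most finitely many n.]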
> If you go down this path, you will eventually come to understand (and,
> probably, accept) algorithmic information theory. Matt may be trying to force
> it on you too soon. :)
>
> --Abram
>
> David was asking about theories of explanation, and here you are suggesting
> that following a certain path of reasoning will lead to accepting AIT. What
> nonsense. Even assuming that Bayes' law can be used to update probabilities
> of idealized utility, the connection between description length and
> explanatory power in general AI is tenuous. And when you realize that AIT
> is an unattainable idealism that lacks mathematical power (I do not believe
> that it is a valid mathematical method, because it is incomputable and
> therefore innumerable and cannot be used to derive probability distributions
> even as ideals), you have to accept that the connection between explanatory
> theories and AIT is not established, except as a special case based on the
> imagination that a similarity among a subclass of practical examples is
> the same as a powerful generalization of those examples.
>
> The problem is that while compression seems to be related to intelligence,
> it is not equivalent to intelligence. A much stronger but similarly false
> argument is that memory is intelligence. Of course memory is a major part
> of intelligence, but it is not everything. The argument that AIT is a
> reasonable substitute for developing more sophisticated theories about
> conceptual explanation is not well founded; it lacks any experimental
> evidence other than a smattering of results on simplistic cases, and it is
> just wrong to suggest that there is no reason to consider other theories of
> explanation.
>
> Yes, compression has something to do with intelligence and, in some special
> cases, it can be shown to act as an idealism for numerical rationality. And
> yes, unattainable theories that examine the boundaries of productive
> mathematical systems are a legitimate subject for mathematics.
> But there is
> so much more to theories of explanatory reasoning that I genuinely feel
> sorry that those of you who were originally motivated to develop better AGI
> programs would get caught in the obvious traps of AIT and AIXI.
>
> Jim Bromer
>
> On Tue, Jun 29, 2010 at 11:46 PM, Abram Demski <[email protected]> wrote:
>
>> David,
>>
>> What Matt is trying to explain is all right, but I think a better way of
>> answering your question would be to invoke the mighty, mysterious Bayes' law.
>>
>> I had an epiphany similar to yours (the one that started this thread)
>> about 5 years ago now. At the time I did not know that it had all been done
>> before. I think many people feel this way about MDL. Looking into the MDL
>> (minimum description length) literature would be a good starting point.
>>
>> In brief, the answer to your question is: we formalize the description
>> length heuristic by assigning lower probabilities to longer hypotheses, and
>> we apply Bayes' law to update these probabilities given the data we observe.
>> This updating captures the idea that we should reward theories which
>> explain/expect more of the observations; it also provides a natural way to
>> balance simplicity vs. explanatory power, so that we can compare any two
>> theories with a single scoring mechanism. Bayes' law automatically places the
>> right amount of pressure to avoid overly elegant explanations which don't
>> get much right, and to avoid overly complex explanations which fit the
>> observations perfectly but which probably won't generalize to new data.
>>
>> Bayes' law and MDL have strong connections, though sometimes they part
>> ways. There are deep theorems here. For me it's good enough to note that if
>> we're using a maximally efficient code for our knowledge representation,
>> they are equivalent. (This in itself involves some deep math; I can explain
>> if you're interested, though I believe I've already posted a writeup to this
>> list in the past.)
>> Bayesian updating is essentially equivalent to scoring
>> hypotheses as: hypothesis size + size of the data's description using the
>> hypothesis. Lower scores are better (as the score is approximately
>> -log(probability)).
>>
>> If you go down this path, you will eventually come to understand (and,
>> probably, accept) algorithmic information theory. Matt may be trying to force
>> it on you too soon. :)
>>
>> --Abram

*agi* | Archives <https://www.listbox.com/member/archive/303/=now> <https://www.listbox.com/member/archive/rss/303/> | Modify <https://www.listbox.com/member/?&> Your Subscription <http://www.listbox.com/>
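[Abram's two-part score can be made concrete with a toy example. Below, 100 coin flips with 70 heads are scored under a "fair coin" hypothesis and a "biased coin, p=0.7" hypothesis. The hypothesis sizes in bits (1 for the fair coin, 12 for the biased model, which must also encode its parameter) are illustrative assumptions, not figures from the thread:]

```python
import math

def data_cost_bits(p_heads: float, heads: int, tails: int) -> float:
    # -log2 probability of the observed flips under a Bernoulli model:
    # the "size of the data's description using the hypothesis".
    return -(heads * math.log2(p_heads) + tails * math.log2(1 - p_heads))

heads, tails = 70, 30  # observed data

# Total score = hypothesis size + data description size (lower is better).
# Hypothesis sizes are assumed for illustration: the fair coin is the
# shorter, more "elegant" theory; the biased coin pays for its parameter.
fair   = 1  + data_cost_bits(0.5, heads, tails)   # 1 + 100.0 bits
biased = 12 + data_cost_bits(0.7, heads, tails)   # 12 + ~88.1 bits

print(round(fair, 1), round(biased, 1))  # prints: 101.0 100.1
```

[The biased model wins by under a bit: its extra hypothesis length is just barely repaid by how much better it fits the data, which is exactly the simplicity-vs-explanatory-power trade-off Abram describes. With 60 heads instead of 70, the fair coin would win.]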
