Schmidhuber's Speed Prior-Based Inductive Inference <http://people.idsia.ch/~juergen/toesv2/node32.html> may be the most principled way of introducing time, but it seems to lack a principled way of making bits and time commensurable, so he just punts and says "Kt", which has the dimension of information*time. That is not good enough for model selection: without a principled exchange rate between bits and time, you are left with a Pareto frontier rather than a total order over models.
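
To illustrate the problem (a toy sketch in Python with made-up candidate programs, not anything from Schmidhuber's paper): score each program that reproduces the data by its length in bits and its runtime in steps. Domination alone leaves a Pareto frontier, and any scalarization -- including Levin's Kt = |p| + log2(t), which is the w = 0.5 case below -- amounts to picking an arbitrary exchange rate between the two axes.

from math import log2

# Hypothetical candidate programs, each assumed to reproduce the data:
# name: (program length in bits, runtime in steps)
candidates = {
    "tiny-but-slow": (100, 2 ** 40),
    "balanced":      (118, 2 ** 20),
    "big-but-fast":  (200, 2 ** 8),
}

def dominates(a, b):
    # a Pareto-dominates b if a is no worse on both axes and differs on one
    return a[0] <= b[0] and a[1] <= b[1] and a != b

frontier = [name for name, v in candidates.items()
            if not any(dominates(other, v) for other in candidates.values())]
print("Pareto frontier:", frontier)   # all three survive: no total order

# Scalarize with an arbitrary bit/time exchange rate w; each w crowns a
# different frontier point (w = 0.5 orders models exactly as Levin's Kt).
for w in (0.05, 0.5, 0.95):
    score = {name: w * bits + (1 - w) * log2(t)
             for name, (bits, t) in candidates.items()}
    print(f"w={w}: best =", min(score, key=score.get))

Running it, each of the three weights selects a different "best" model, which is the commensurability problem in miniature.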
Another way of stating the theoretic reason for Occam's Razor is that additional complexity beyond K introduces counterfactuals -- facts not in evidence, or observations not yet made. Can someone come up with a reason to introduce facts not in evidence while predicting the future, except as part of the deductive process (i.e., letting the executable archive continue "decompressing" beyond the last observation/datum)?

On Thu, Nov 21, 2019 at 7:53 PM Matt Mahoney <[email protected]> wrote:

> I believe the deeper question is why Occam's Razor works so well across
> every branch of science. What makes Solomonoff induction the core principle
> of machine learning? Why is it so successful at prediction in every domain?
> Why are theories that require fewer words or symbols to describe more
> likely to be correct? We only know empirically that it works.
>
> I believe the theoretical reason is that all possible probability
> distributions over any infinite set of strings with p > 0 must favor
> shorter strings over longer ones. For any possible theory or model encoded
> as a string, there are an infinite number of longer and less likely strings
> but only a finite number of longer and more likely strings.
>
> On Thu, Nov 21, 2019, 7:40 PM James Bowery <[email protected]> wrote:
>
>> I will agree, however, that if there is a principled way to include both
>> time and space as constraints in a model selection criterion, it makes
>> pragmatic sense at the very least, because what one is trying to do is
>> predict, which by definition is in time.
>>
>> On Thursday, November 21, 2019, James Bowery <[email protected]> wrote:
>>
>>> If I can spawn a finite but unlimited number of parallel processes in
>>> "space", I can compute AIXItl, for example. So let's say the generating
>>> space is projected down into 3D space + time -- it is approximated by
>>> time, correct? In other words, once you admit "space" as a computation
>>> dimension, don't you beg the question?
>>>
>>> On Thu, Nov 21, 2019 at 6:06 PM TimTyler <[email protected]> wrote:
>>>
>>>> On 2019-11-21 11:46 AM, James Bowery wrote:
>>>> > The point of my conjecture is that there is a very good reason to
>>>> > select "the smallest executable archive of the data" as your
>>>> > information criterion over the other information criteria -- and it
>>>> > has to do with the weakness of "lossy compression" as model selection.
>>>>
>>>> That, along with a number of other entries in the list, is a
>>>> "space-only" criterion.
>>>>
>>>> It seems reasonable that runtime duration, as well as program
>>>> complexity, is a factor for most real-world data. As well as being
>>>> generated by a small system, observed data was probably generated in a
>>>> limited time. Space-time metrics are clearly needed. I think we can
>>>> reject any alleged superiority of any space-only metric.
>>>>
>>>> --
>>>> __________
>>>> |im |yler http://timtyler.org/
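
To make the "continue decompressing" idea above concrete, here is a minimal sketch (the data and the single candidate model are invented for illustration): treat a model as an executable archive, i.e. a program whose output begins with the observed data, and predict by letting it run past the last datum -- no facts not in evidence are introduced, only deduction from the archive already in hand.

from itertools import islice

observed = [0, 1, 1, 2, 3, 5, 8]   # the data to be "archived"

def fib():
    # a candidate executable archive: a small program generating the data
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

def reproduces(gen_fn, data):
    # does the program's output begin with the observed data?
    return list(islice(gen_fn(), len(data))) == data

if reproduces(fib, observed):
    # "decompress" beyond the last observation: the continuation is the forecast
    forecast = list(islice(fib(), len(observed), len(observed) + 3))
    print("next:", forecast)       # next: [13, 21, 34]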
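
A quick numeric check of the counting argument in Matt Mahoney's message above (the particular prior below is assumed for illustration; the bound itself holds for any distribution): because probabilities sum to at most 1, at most 1/p(x) strings can be at least as likely as any given string x, so all but finitely many of the infinitely many longer strings must be less likely.

from itertools import product

def p(x):
    # assumed prior over binary strings: 2^-(2|x|+1), uniform within each
    # length; summed over all lengths this is a proper distribution
    return 2.0 ** (-2 * len(x) - 1)

# sanity check: sum over length n is 2^n * 2^-(2n+1) = 2^-(n+1), totaling ~1
print(sum(2 ** n * p("0" * n) for n in range(60)))

x = "0110"
# under this prior only shorter strings can be more likely than x, so
# scanning lengths 0..len(x) finds every string that beats p(x)
more_likely = ["".join(s)
               for n in range(len(x) + 1)
               for s in product("01", repeat=n)
               if p("".join(s)) > p(x)]
print(len(more_likely), "strings beat x; bound is", 1 / p(x))

This prints 15 strings against a bound of 512 -- finite, as the argument requires, while the longer-and-less-likely strings go on without end.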
