Schmidhuber's Speed Prior-Based Inductive Inference
<http://people.idsia.ch/~juergen/toesv2/node32.html> may be the most
principled way of introducing time, but it seems to lack a principled way
of making bits and time commensurable, so he just punts and says "Kt",
which has the dimension of information*time.  That is not good enough for
model selection, because without a common unit the candidate models form a
Pareto frontier rather than a total order.
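
To make the incommensurability point concrete, here is a minimal Python
sketch; the candidate models and their (bits, steps) numbers are made up
for illustration:

import math

# Hypothetical models of the same data: name -> (description_bits, runtime_steps).
candidates = {
    "A": (1000, 2**10),  # smaller program, slower to run
    "B": (1200, 2**4),   # bigger program, faster to run
    "C": (1500, 2**30),  # dominated: worse than A on both axes
}

def pareto_frontier(models):
    # A model is on the frontier if no other model is at least as good
    # on both axes (and strictly different).
    return {
        name for name, (b, t) in models.items()
        if not any(b2 <= b and t2 <= t and (b2, t2) != (b, t)
                   for b2, t2 in models.values())
    }

def levin_kt(bits, steps):
    # Levin's Kt fixes one exchange rate: one extra bit of program
    # "costs" one doubling of runtime.
    return bits + math.log2(steps)

print(sorted(pareto_frontier(candidates)))  # ['A', 'B'] -- no unique winner
for name, (b, t) in sorted(candidates.items()):
    print(name, levin_kt(b, t))             # Kt picks A; the rate
                                            # bits + steps would pick B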

Another way of stating the theoretical reason for Occam's Razor is that
additional complexity beyond K introduces counterfactuals -- facts not in
evidence, or observations not yet made.  Can someone come up with a reason
to introduce facts not in evidence while predicting the future, except as
part of the deductive process (i.e., letting the executable archive
continue "decompressing" beyond the last observation/datum)?

On Thu, Nov 21, 2019 at 7:53 PM Matt Mahoney <[email protected]>
wrote:

> I believe the deeper question is why Occam's Razor works so well across
> every branch of science. What makes Solomonoff induction the core principle
> of machine learning? Why is it so successful at prediction in every domain?
> Why are theories that require fewer words or symbols to describe more
> likely to be correct? We only know empirically that it works.
>
> I believe the theoretical reason is that any probability distribution
> over an infinite set of strings that assigns every string p > 0 must
> favor shorter strings over longer ones. For any possible theory or model
> encoded as a string, there are an infinite number of longer and less
> likely strings but only a finite number of longer and more likely
> strings.
>
> On Thu, Nov 21, 2019, 7:40 PM James Bowery <[email protected]> wrote:
>
>> I will agree, however, that if there is a principled way to include
>> both time and space as constraints in a model selection criterion, it
>> makes pragmatic sense at the very least, because what one is trying to
>> do is predict, and prediction is by definition in time.
>>
>> On Thursday, November 21, 2019, James Bowery <[email protected]> wrote:
>>
>>> If I can spawn a finite but unbounded number of parallel processes in
>>> "space", I can compute AIXItl, for example.  So let's say the
>>> generating space is projected down into 3D space + time -- then it is
>>> approximated by time, correct?  In other words, once you admit "space"
>>> as a computation dimension, don't you beg the question?
>>>
>>> On Thu, Nov 21, 2019 at 6:06 PM TimTyler <[email protected]> wrote:
>>>
>>>> On 2019-11-21 11:46 AM, James Bowery wrote:
>>>> > The point of my conjecture is that there is a very good reason to
>>>> > select "the smallest executable archive of the data" as your
>>>> > information criterion over the other information criteria -- and it
>>>> > has to do with the weakness of "lossy compression" as model selection.
>>>> 
>>>> That, along with a number of other entries in the list, is a
>>>> "space-only" criterion.
>>>>
>>>> It seems reasonable that runtime duration, as well as program
>>>> complexity, is a factor for most real-world data. As well as being
>>>> generated by a small system, observed data was probably generated in
>>>> a limited time. Space-time metrics are clearly needed. I think we can
>>>> reject any alleged superiority of any space-only metric.
>>>> 
>>>> --
>>>> __________
>>>> |im |yler http://timtyler.org/
>>>> 
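
P.S. To make Matt's counting argument concrete, here is a minimal Python
sketch; the 4^-length prior is just an example distribution I chose, and
the argument holds for any distribution over infinitely many strings:

from itertools import count, product, takewhile

def strings():
    # All binary strings, enumerated in order of increasing length.
    for n in count(0):
        yield from (''.join(bits) for bits in product('01', repeat=n))

def p(s):
    # Example prior: p(s) = (1/2) * 4^-len(s).  It sums to 1 over all
    # binary strings: length n contributes 2^n * (1/2) * 4^-n = 2^-(n+1).
    return 0.5 * 4.0 ** -len(s)

x = '010'  # any fixed theory, encoded as a string
# Since probabilities sum to 1, at most 1/p(x) strings can have p >= p(x).
# Here p() shrinks with length, so scanning lengths <= len(x) finds them all.
more_likely = [s for s in takewhile(lambda s: len(s) <= len(x), strings())
               if p(s) >= p(x)]
print(len(more_likely))  # 15 -- finite; the infinitely many longer strings
                         # are all less likely than x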

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T0fc0d7591fcf61c5-M3ad015b1bf872495a0709576
Delivery options: https://agi.topicbox.com/groups/agi/subscription
