Re: Wolfram 2,3 Turing Machine

2007-10-26 Thread Juergen Schmidhuber

Impressive result by Alex Smith!
Funny though how Wolfram's web sites on this
print Wolfram's name in larger font and more
frequently than Smith's, even trying to sell this
as a "New Kind Of Science" although it's just a
continuation of a decades-old search for
small universal Turing machines :-)

BTW, check out Marcus Hutter's older posting to
the Kolmogorov Complexity mailing list on whether
such machines should really count as UTMs or not:
http://mailman.ti-edu.ch/pipermail/kolmogorov/2007/000245.html

JS
http://www.idsia.ch/~juergen/computeruniverse.html
http://www.idsia.ch/~juergen/wolfram.html

On Oct 24, 2007, at 8:32 PM, Tom Caylor wrote:

We're excited to announce that the $25,000 Wolfram 2,3 Turing Machine
Research Prize has been won.
Alex Smith, a 20-year-old undergraduate in Birmingham, UK, has given a
40-page proof that Wolfram's 2,3 Turing machine is indeed universal.
This result ends a half-century quest to find the simplest possible
universal Turing machine. It also provides strong further evidence for
Wolfram's Principle of Computational Equivalence. The official prize
ceremony is planned for November at Bletchley Park, UK, site of Alan
Turing's wartime work.
For more information about the prize and the solution, see:
http://www.wolframprize.org
Stephen Wolfram has posted his personal reaction to the prize at:
http://blog.wolfram.com/2007/10/the_prize_is_won_the_simplest.html






Re: measure problem

2007-04-26 Thread Juergen Schmidhuber
Hi Max,

in this particular universe it's going well, thank you!

As promised, I had a look at your paper. I think
it is well written and fun to read. I've got a few comments
though, mostly on the nature of math vs computation,
and why Goedel is sexy but not an issue
when it comes to identifying possible mathematical
structures / universes / formally describable things.
I think some of the comments are serious enough to affect
the conclusions. Some come with quotes from papers in
http://www.idsia.ch/~juergen/computeruniverse.html
where several of your main issues are addressed.
Some are marked by "Serious."

I am making a cc to the everythingers, although it seems
they are mostly interested in other things now - probably
nobody is really going to read this tedious response which
became much longer than I anticipated.

1. An abstract baggage-free mathematical structure
does not exist any more than a baggage-free
computer - the particular axiomatic system you choose
is like the set of primitive instructions of the computer
you choose. Not very serious, since for general computers
and general axiomatic systems there are invariance theorems:
changing the baggage often does not change a lot, so to speak.
But it should be mentioned.

2. p 11: you say that data sampled from Gaussian random variables
is incompressible - NOT true - give short codes to probable events
(close to the mean), long codes to rare events (Huffman
coding).
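
To make this concrete, here is a small Python sketch (my own illustration,
not part of the paper): quantize standard Gaussian samples into 16 bins and
build a Huffman code for the bin frequencies. Probable bins near the mean get
short codewords, so the average code length drops well below the 4-bit
fixed-length baseline - i.e., the data is compressible.

import heapq, math, random
from collections import Counter

def huffman_lengths(freqs):
    # Return symbol -> codeword length via the standard Huffman merge procedure.
    heap = [(f, i, (sym,)) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    length = {sym: 0 for sym in freqs}
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, group1 = heapq.heappop(heap)
        f2, _, group2 = heapq.heappop(heap)
        for s in group1 + group2:
            length[s] += 1          # every symbol in a merged group gains one bit of depth
        heapq.heappush(heap, (f1 + f2, tiebreak, group1 + group2))
        tiebreak += 1
    return length

random.seed(0)
n, bins = 100_000, 16
samples = [random.gauss(0.0, 1.0) for _ in range(n)]
# quantize into 16 bins of width 0.5 covering [-4, 4); bins near the mean are frequent
quantized = [min(bins - 1, max(0, int((x + 4.0) / 0.5))) for x in samples]
freqs = Counter(quantized)
lengths = huffman_lengths(freqs)
avg_bits = sum(freqs[s] * lengths[s] for s in freqs) / n
print("fixed-length code: %.2f bits/sample" % math.log2(bins))
print("Huffman code:      %.2f bits/sample" % avg_bits)   # comes out well below 4 bits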

3. same sentence: how to test what inflation predicts?
How to test whether the big bang seed was really random,
not pseudo-random? The second million bits of pi look
random but are not. We should search for short programs
compressing the apparent randomness:
http://www.idsia.ch/~juergen/randomness.html
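
As a concrete illustration of the pi example (my own sketch, using the
standard Gibbons unbounded spigot algorithm): a handful of lines generate as
many decimal digits of pi as you like, so the apparently random digit stream
has a very short description.

def pi_digits():
    # Gibbons' unbounded spigot algorithm: streams the decimal digits of pi.
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n                                   # next confirmed digit
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

gen = pi_digits()
print([next(gen) for _ in range(20)])   # [3, 1, 4, 1, 5, 9, 2, 6, ...] - looks random, but isn't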

4. p 15: "Mathematical structure (MS)
just exists." Is that so? Others will look at
your symbols and say they are just heaps of chalk
on a blackboard, and you need a complex,
wet pattern recognition system to interpret them.
Here's where beliefs enter...

5. p 18: "mathematical structures, formal systems and
computations are aspects of one underlying
transcendent structure whose nature we don't fully understand."
But we do! I'd say there are NO serious open problems with
your figure 5 - formal systems vs math vs computation
is a well-explored field. More about this below.
The 2000 paper (your nr 17) exploits
this understanding; it turns out the most convenient way to deal
with the measure problem is the computer science way (right
hand corner of your figure 5).
As I wrote in the 2000 paper:
http://arxiv.org/abs/quant-ph/0011122
The algorithmic approach, however, offers several conceptual 
advantages: (1) It provides the appropriate framework for issues of 
information-theoretic complexity traditionally ignored in pure 
mathematics, and imposes natural complexity-based orderings on the 
possible universes and subsets thereof. (2) It taps into a rich source 
of theoretical insights on computable probability distributions 
relevant for establishing priors on possible universes. Such priors are 
needed for making probabilistic predictions concerning our own 
particular universe. Although Tegmark suggests that ``... all 
mathematical structures are a priori given equal statistical weight'' 
[Tegmark 1998] (p. 27), there is no way of assigning equal 
nonvanishing probability to all (infinitely many) mathematical 
structures. Hence we really need something like the complexity-based 
weightings discussed in [Schmidhuber 1997] and especially the 
paper at hand. (3) The algorithmic approach is the obvious framework 
for questions of temporal complexity such as those discussed in this 
paper, e.g., ``what is the most efficient way of simulating all 
universes?''

6. Serious: run the sim, or just describe its program?
Are you sure you know what you want to say here?
What's the precise difference between program bitstrings
and output bitstrings? The bitstrings generated by the programs (the
descriptions) are just alternative descriptions of
the universes, possibly less compact ones. You as
an external observer may need yet another program
that translates the output bits (typically a less compressed
description) into video or something, to obtain the
description your eyes want.
Note that the 2000 paper and the 2002
journal variant don't really care for time
evolution, just for descriptions - within
the bitstrings maybe there is an observer
who thinks he knows what's time, but
to the outsider his concept of
time may be irrelevant. (Unlike the
1997 paper, the 2000/2002 papers do not focus on a
one to one mapping between physical
and computational time steps, otherwise we'd
miss all the universes where the concept
of time is irrelevant.) Here's what I wrote at  the end:
After all, algorithmic theories of the describable do
encompass everything we will ever be able to talk
and write about. Other things are simply beyond description.

7. Serious:  p 18 CUH: what's your def
of computable? You mean by 

Zuse Symposium: Is the universe a computer? Berlin Nov 6-7

2006-11-02 Thread Juergen Schmidhuber

Dear colleagues,
many interesting talks
at the Zuse Symposium:
Is the universe a computer?
Berlin Nov 6-7, 2006
http://www.dtmb.de/Aktuelles/Aktionen/Informatikjahr-Zuse/

Best regards,
-JS
http://www.idsia.ch/~juergen/computeruniverse.html





probabilities measures computable universes

2004-01-23 Thread Juergen Schmidhuber
I browsed through recent postings and hope
this delayed but self-contained message can clarify
a few things about probabilities and measures
and predictability etc.
What is the probability of an integer being, say,
a square? This question does not make sense without
a prior probability distribution on the integers.
This prior cannot be uniform. Try to find one!
Under _any_ distribution some integers must be
more likely than others.
Which prior is good?  Is there a `best' or
`universal' prior? Yes, there is. It assigns to
each integer n as much probability as any other
computable prior, save for a constant factor that
does not depend on n.  (A computable prior can be
encoded as a program that takes n as input and
outputs n's probability, e.g., a program that
implements Bernoulli's formula, etc.)
Given a set of priors, a universal prior is
essentially a weighted sum of all priors in the
set. For example, Solomonoff's famous weighted sum
of all enumerable priors will assign at least as
much probability to any square integer as any
other computable prior, save for a constant
machine-dependent factor that becomes less and
less relevant as the integers get larger and
larger.
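
Here is a toy Python sketch of the weighted-sum construction (my own
illustration; the family of truncated power-law priors is an arbitrary
stand-in for a real enumeration of computable priors): the mixture assigns
every integer at least w times the probability that any member assigns,
so it dominates each member up to that constant factor.

# Toy family of computable priors on {1,...,N} (an arbitrary choice for illustration):
# q_s(n) proportional to n^(-s), for a few exponents s.
N = 10_000
def power_law_prior(s):
    Z = sum(n ** -s for n in range(1, N + 1))
    return lambda n: n ** -s / Z

priors = [power_law_prior(s) for s in (1.5, 2.0, 3.0)]
w = 1.0 / len(priors)                      # equal weights summing to 1

def mixture(n):
    # Weighted sum of the component priors: dominates each component up to the factor w.
    return sum(w * q(n) for q in priors)

# Dominance check: mixture(n) >= w * q(n) for every component q and every n tested.
for q in priors:
    assert all(mixture(n) >= w * q(n) for n in (1, 4, 9, 100, 2500, 9801))
print(mixture(4), mixture(9801))           # square integers keep nonvanishing probability
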
Now let us talk about all computable universe
histories. Some are finite, some infinite. Each
has at least one program that computes it. Again
there is _no_ way of assigning equal probability
to all of them! Many are tempted to assume a
uniform distribution without thinking much about
it, but there is _no_ such thing as a uniform
distribution on all computable universes, or on
all axiomatic mathematical structures, or on
all logically possible worlds, etc!
(Side note: There only is a uniform _measure_ on
the finitely many possible history _beginnings_
of a given size, each standing for an uncountable
_set_ of possible futures. Probabilities
refer to single objects, measures to sets.)
It turns out that we can easily build universal
priors using Levin's important concept of self-
delimiting programs. Such programs may
occasionally execute the instruction "request new
input bit"; the bit is chosen randomly and will
remain fixed thereafter. Then the probability of
some universe history is the probability of guessing
a program for it. This probability is `universal'
as it does not depend much on the computer (whose
negligible influence can be buried in a constant
universe-independent factor). Some programs halt or
go in an infinite loop without ever requesting
additional input bits. Universes with at least one
such short self-delimiting program are more probable
than others.
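
A toy Python sketch of this idea (my own illustration; the machine below is
a trivial prefix machine, not a universal one): feed fair coin flips to a
self-delimiting machine, and the probability of a given output history is
exactly the probability of guessing one of its programs bit by bit.

import random
from collections import Counter

def toy_prefix_machine(bit_source):
    # A toy self-delimiting machine (illustration only): it requests input bits
    # until it sees a 1; k leading zeros make it output the 'universe' of k ones.
    k = 0
    while True:
        if next(bit_source):          # request new input bit
            return "1" * k
        k += 1

def random_bits():
    while True:
        yield random.getrandbits(1)

random.seed(0)
trials = 200_000
counts = Counter(toy_prefix_machine(random_bits()) for _ in range(trials))

# Universe '1'*k has the single program 0^k 1, hence probability 2^-(k+1):
# universes with short programs are more probable.
for k in range(5):
    u = "1" * k
    print(f"universe {u!r:8} empirical {counts[u] / trials:.4f}   theoretical {2 ** -(k + 1):.4f}")
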
To make predictions about some universe, say,
ours, we need a prior as well. For instance,
most people would predict that next Tuesday it
won't rain on the moon, although there are
computable universes where it does. The anthropic
principle is an _insufficient_ prior that does not
explain the absence of rain on the moon - it does
assign cumulative probability 1.0 to the set of all
universes where we exist, and 0.0 to all the other
universes, but humans could still exist if it did
rain on the moon occasionally. Still, many tend to
consider the probability of such universes as small,
which actually says something about their prior.
We do not know yet the true prior from which
our universe is sampled - Schroedinger's wave
function may be an approximation thereof. But it
turns out that if the true prior is computable
at all, then we can in principle already predict
near-optimally, using the universal prior instead:
http://www.idsia.ch/~juergen/unilearn.html
Many really smart physicists do not know this
yet. Technical issues and limits of computable
universes are discussed in papers available at:
http://www.idsia.ch/~juergen/computeruniverse.html
Even stronger predictions using a prior based
on the fastest programs (not the shortest):
http://www.idsia.ch/~juergen/speedprior.html
-Juergen Schmidhuber



everything talk - new AI

2003-02-27 Thread Juergen Schmidhuber
I put everything talk slides on the web:
http://www.idsia.ch/~juergen/everythingtalk/
http://www.idsia.ch/~juergen/computeruniverse.html
And also a recent overview article:

The New AI: General & Sound & Relevant for Physics
J. Schmidhuber. TR IDSIA-04-03, arxiv:cs.AI/0302012
To appear in B. Goertzel and C. Pennachin, eds.:
Artificial General Intelligence
http://www.idsia.ch/~juergen/newai/newai.html
Abstract. Most traditional artificial intelligence (AI)
systems of the past 50 years are either very limited, or
based on heuristics, or both. The new millennium, however,
has brought substantial progress in the field of theoretically
optimal and practically feasible algorithms for prediction,
search, inductive inference based on Occam's razor, problem
solving, decision making, and reinforcement learning in
environments of a very general type. Since inductive inference
is at the heart of all inductive sciences, some of the results
are relevant not only for AI and computer science but also for
physics, provoking nontraditional predictions based on Zuse's
thesis of the computer-generated universe.
Comments welcome!

Juergen Schmidhuber   http://www.idsia.ch/~juergen/



Re: Zuse's thesis web site

2002-11-06 Thread Juergen Schmidhuber
Thanks for your comments, Hal!

Your point on the dovetailer is well taken. I modified the text:
"If Zuse's thesis is correct, then which is our universe's program? It
may come as a surprise to some that we already know at least one
algorithm that does compute our universe (and several others)..."

Ed Clark has a nice review page on Wolfram's book:
http://www.math.usf.edu/~eclark/ANKOS_reviews.html
It includes Scott Aaronson's interesting review which also
addresses the issue of Bell's inequality.

Best,
Juergen   http://www.idsia.ch/~juergen/digitalphysics.html


Juergen Schmidhuber writes:


I welcome feedback on a little web page on Zuse's 1967 thesis
(which states that the universe is being computed on a cellular automaton):

http://www.idsia.ch/~juergen/digitalphysics.html



That's very interesting; I was not aware of Zuse.  Unfortunately I
don't know German so I can't read his paper.

Regarding the question of the compatibility of CA models with relativity
and QM, Wolfram looks into this in some detail.  He essentially abandons
a simple CA model in favor of a more complex network of interacting
nodes, which has some features similar to the Lorentz transformation of
relativity.  Then to address the EPR style long-distance correlations of
QM, he proposes that while the network is mostly local, it has occasional
nodes which get stretched apart and are connected to distant nodes.
These are rare but allow for the type of information flow necessary to
reproduce long-distance QM correlations.  All in all it is a pretty ad
hoc and unconvincing model.

I tried to read the 't Hooft paper referenced here but it was over my
head.  It also struck me though as not really addressing the discrepancy
between long-distance correlations and local CA models.  It seems very
much an open and difficult question to me to show how a local CA model
can reproduce relativity and QM.

One issue which CA models tend to ignore is the MWI.  Most CA models
are built as hidden variable theories which define a single universe.
Some multiverse models have that structure as well.  But it seems to me
that this is an entirely unnecessary restriction.  If a CA can model
a universe, it can model a multiverse, and likewise with any other
computing model like TMs.

The MWI is fully deterministic, which may make it a more attractive
target for modelling with a deterministic computational theory than
attempting to reproduce the statistical phenomena of QM, essentially
via hidden variables.  Any hidden variable theory, CA based or not,
has two strikes against it from the beginning due to the many well
known difficulties of Bell inequalities and EPR correlations.

Regarding entropy, it is pointed out that entropy does not grow in a
CA model.  Wolfram discusses this as well.  While entropy technically
does not grow, you can get phenomena that look very much like entropy
growth in a CA model.  Eventually you will get a Poincare recurrence
if the universe is finite.  But if you start in a sufficiently simple
state, there are many CA models which will mimic entropy growth into a
more complex state.  And this may be close enough to explain our universe.

Alternatively, of course the MWI as a deterministic theory also does
not have entropy growth.  As mentioned above, computational models of
our universe might well do better to aim towards an MWI world.

As far as the claim that we already know the algorithm that runs our
universe, and it is the UD: I think this is amusing but ultimately
misleading.  It's true that a dovetailer which runs all programs will
indeed run our own universe's program (assuming it has one), but I think
it is a misuse of terminology to say that the UD is the algorithm that
is running our universe.  I would reserve that phrase to refer to the
specific program that generates our universe and no others.  It will be a
tremendous accomplishment of physics and philosophy when that program is
discovered, but it is misleading to give the impression that we already
know what it is.

I think a better terminology here would be something like, we don't
need to know the specific program that describes our universe in order
to imagine how to program a computer that would in fact generate our
experiences, at least in theory.  And then go on and explain about
running all programs at once, etc.

Hal Finney







Zuse's thesis web site

2002-11-05 Thread Juergen Schmidhuber
I welcome feedback on a little web page on Zuse's 1967 thesis
(which states that the universe is being computed on a cellular automaton):

http://www.idsia.ch/~juergen/digitalphysics.html

Juergen Schmidhuber




Re: wheeler walked away; Zuse ...

2002-09-27 Thread Juergen Schmidhuber

[EMAIL PROTECTED] wrote:
 as for your point in your post about wheeler attaching
 his name to the theory, I think it's ok for proponents
 and not originators of a theory to be named along with it.
 for example lately I've been referring to the
 fredkin-wolfram thesis. fredkin is far more the 
 originator; wolfram is far more the proponent. seems
 to me the everett-wheeler theory can be fairly seen in the
 same way.

just to set the record straight: the originator was Zuse.
Here is a link to a PDF scan of his 1967 paper (on the
universe running on a cellular automaton):
http://www.idsia.ch/~juergen/zuse.html




colt

2002-08-02 Thread Juergen Schmidhuber

Physics is an inductive science. You try to find a law that matches
the data, and hope that it generalizes to unseen data.
 
When asked about the nature of their inductive business, most
physicists refer to Popper's ideas from the 1930s (falsifiability
etc), and sometimes to Occam's razor: simple explanations are better
explanations.
 
Most physicists, however, are not even aware of the fact that there
is a formal basis for their inductive science, provided by the field
of computational learning theory (COLT), in particular, the theory of
universal induction.
 
The contents of the following COLT paper may be old news to some
on this list.
 
--
 
 
The Speed Prior: a new simplicity measure yielding near-optimal
computable predictions (Juergen Schmidhuber, IDSIA)

In J. Kivinen and R. H. Sloan, eds, Proc. 15th Annual Conf. on
Computational Learning Theory (COLT), 216-228, Springer, 2002;
based on section 6 of http://arXiv.org/abs/quant-ph/0011122 (2000)
http://www.idsia.ch/~juergen/speedprior.html
ftp://ftp.idsia.ch/pub/juergen/colt.ps

 
Solomonoff's optimal but noncomputable method for inductive
inference assumes that observation sequences x are drawn from a
recursive prior distribution mu(x). Instead of using the unknown
mu(x) he predicts using the celebrated universal enumerable prior
M(x) which for all x exceeds any recursive mu(x), save for a
constant factor independent of x. The simplicity measure M(x)
naturally implements Occam's razor and is closely related to the
Kolmogorov complexity of x. However, M assigns high probability to
certain data x that are extremely hard to compute. This does not match
our intuitive notion of simplicity. Here we suggest a more plausible
measure derived from the fastest way of computing data.  In absence
of contrarian evidence, we assume that the physical world is generated
by a computational process, and that any possibly infinite sequence of
observations is therefore computable in the limit (this assumption is
more radical and stronger than Solomonoff's). Then we replace M by the
novel Speed Prior S, under which the cumulative a priori probability
of all data whose computation through an optimal algorithm requires
more than O(n) resources is 1/n.  We show that the Speed Prior allows
for deriving a computable strategy for optimal prediction of future
y, given past x.  Then we consider the case that the data actually
stem from a nonoptimal, unknown computational process, and use
Hutter's recent results to derive excellent expected loss bounds for
S-based inductive inference.  Assuming our own universe is sampled
from S, we predict: it won't get many times older than it is now;
large scale quantum computation won't work well; beta decay is not
random but due to some fast pseudo-random generator which we should
try to discover.
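 
For intuition, here is a hedged Python sketch of the phase-wise resource
allocation behind this kind of prior (a FAST-style schedule in the spirit of
the paper, written from memory; run_for is a hypothetical stub, not a real
interpreter): in phase i, every program of length at most i gets at most
2^(i - length) instructions, so data that some short program computes quickly
accumulates probability early.

from itertools import product

def programs_up_to(max_len):
    # All bitstrings of length 1..max_len, in lexicographic order (toy program space).
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            yield "".join(bits)

def run_for(program, steps):
    # Placeholder (hypothetical): run `program` for at most `steps` instructions
    # on some fixed universal machine. Here it is a stub so that the schedule
    # itself can be inspected.
    return None

def fast_schedule(phases):
    # Phase i: every program p with len(p) <= i gets at most 2**(i - len(p)) steps.
    # Short programs receive exponentially more time, which is what lets data
    # that is quickly computable by short programs dominate the resulting prior.
    for i in range(1, phases + 1):
        for p in programs_up_to(i):
            budget = 2 ** (i - len(p))
            run_for(p, budget)
            yield i, p, budget

for i, p, budget in fast_schedule(4):
    print(f"phase {i}: program {p:>4} gets {budget:2d} steps")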
 
Juergen Schmidhuber   http://www.idsia.ch/~juergen




the short program

2002-07-10 Thread Juergen Schmidhuber

Tim May wrote:
 One thing that Tegmark got right, I think, is the notion that a lot of
 branches of mathematics and a lot of mathematical structures probably go
 into making up the nature of reality.

 This is at first glance dramatically apposite the ideas of Fredkin,
 Toffoli, Wheeler (1970), and Wolfram on the generation of reality from
 simple, local rules. Wolfram has made a claim in interviews, and perhaps
 somewhere in his new book, that he thinks the Universe may be generated
 by a 6-line Mathematica program!
 
I'd like to point out that it was Konrad Zuse himself (inventor of the
computer: http://www.idsia.ch/~juergen/zuse.html ) who was the first to
propose that the physical universe is running on a grid of simple
computers, each communicating with its neighbors: a cellular automaton.
He called this Rechnender Raum, which means Computing Space. As
always, Zuse was way ahead of his time. Perhaps the best reference is:
 
  Zuse, Konrad: Rechnender Raum, Schriften zur Datenverarbeitung,
   Band 1.  Friedrich Vieweg & Sohn, Braunschweig (1969).
 
And here is the simple method for computing all universes, including
ours, if it is computable indeed (adapted from page 1 of the 97 paper
http://www.idsia.ch/~juergen/everything/html.html ):
 
  Order all programs lexicographically. The first
  is run for one instruction every second step, the next for one
  instruction every second of the remaining steps, and so on.
 
As a by-product, this program also outputs descriptions of all formally
describable mathematical systems, and all proofs of all their theorems.
 
A bit of thought shows that the method even has the optimal order of
complexity. For example, it outputs our universe history as quickly as
the history's fastest program, save for a (possibly huge) constant that
does not depend on output size.
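 
Here is a minimal Python sketch of that interleaving schedule (my own
illustration; the 'programs' are stand-in generators, since simulating a real
universal machine is beside the point here): step t is given to program i,
where i-1 is the number of trailing zero bits of t, so program 1 gets every
second step, program 2 every second of the remaining steps, and program i a
2^-i share of all steps.

import itertools

def all_programs():
    # Enumerate stand-in 'programs' (generators that just count their own executed
    # instructions). In the real construction these are the lexicographically
    # ordered programs of a universal machine.
    for index in itertools.count(1):
        def program(index=index):
            executed = 0
            while True:
                executed += 1          # 'one instruction' of program #index
                yield index, executed
        yield program()

def dovetail(total_steps):
    # Step t goes to program i, where i-1 is the number of trailing zero bits of t.
    source = all_programs()
    running = []
    for t in range(1, total_steps + 1):
        i = (t & -t).bit_length()      # 1 + number of trailing zero bits of t
        while len(running) < i:
            running.append(next(source))
        yield t, next(running[i - 1])

for t, (prog, executed) in dovetail(16):
    print("step %2d: program %d executes its instruction #%d" % (t, prog, executed))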
 
Juergen Schmidhuber  http://www.idsia.ch/~juergen/




Re: Optimal Prediction

2002-04-15 Thread Juergen Schmidhuber

Wei Dai wrote:
 BTW, isn't the justification for universal prediction taken in this paper
 kind of opposite to the one you took? The abstract says The problem,
 however, is that in many cases one does not even have a reasonable guess
 of the true distribution. In order to overcome this problem ... Your
 papers on the other hand assume that the true distribution can be known
 and proposed that it must be the Speed Prior. (Later you said you believe
 it is the Speed Prior with probability 1.) Is this still your position?

Well, I prefer to recall my more cautious fallback position - let us just
write down the assumptions, and derive the consequences.

One does not really have to know the true prior; one just needs an upper
bound on its power.  Choose some prior P, and assume the universe is
sampled from a prior less dominant than P. Then predictions according to
P will be rather accurate.

For example, suppose the process computing the universe is not optimally
efficient for some reason.  As long as the resource postulate holds the
true prior cannot dominate the Speed Prior, and S-based predictions
will be fine.

Juergen




Re: Optimal Prediction

2002-04-04 Thread Juergen Schmidhuber

Bill Jefferys wrote:

 At 10:59 AM +0200 4/3/02, Juergen Schmidhuber wrote:
 The theory of inductive inference is Bayesian, of course.
 But Bayes' rule by itself does not yield Occam's razor.

 By itself? No one said it did. Of course assumptions must be made.
 At minimum one always has to choose priors in Bayesian inference.

 Our paper shows that there is a Bayesian interpretation that yields
 something very suggestive of Ockham's razor. It is appealing in that
 if one has a simple versus a complex hypothesis, simple meaning
 that the prior probability is concentrated and complex meaning that
 it is vague and spread out, simple meaning that you don't have many
 knobs to tweak, complex meaning the opposite, then the simple
 hypothesis will be favored over the complex one unless the data lie
 well away from where the simple hypothesis has placed its bets.
 Bayesians distinguish this from Ockham's formulation by calling it
 the Bayesian Ockham's razor, recognizing that it is not what
 William of Ockham wrote, Entia non sunt multiplicanda sine
 necessitate (or one of his other genuine formulations).

 Please don't read more into our article than is there.

 By itself. First you said that the AP by itself has no predictive
 power. I missed the by itself so misunderstood you, but when I
 understood what you were saying I agreed. Now you say that Bayes'
 rule by itself does not yield Ockham's razor. Jim and I never said
 that it did. I am hard pressed to see how anything nontrivial
 relating to the real world can be gotten from any principle by
 itself, so I don't regard these comments as very profound, or very
 useful.

 [Remainder of article snipped]

 Bill
 
 
One has to choose priors. Exactly. To repeat the nontrivial point:
The only choice you need to obtain Occam's razor is to restrict
yourself to priors computable in the limit (this is not much of a
restriction in the sense that other priors are beyond constructive
description anyway). Then you'll automatically prefer few knobs to
tweak, or, more generally, short descriptions of the
observations:  http://www.idsia.ch/~juergen/toesv2/
 
Juergen  (will be out of town until April 16)




Re: Optimal Prediction

2002-03-28 Thread Juergen Schmidhuber

Bill Jefferys wrote:

 At 9:19 AM +0100 3/27/02, Juergen Schmidhuber wrote:
 You are claiming the AP necessarily implies a specific fact about
 nuclear energy levels? I greatly doubt that - can you give a proof?
 
 Yes, I can.
 
 
http://adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=1954ApJS1..121Hdb_key=ASThigh=06070
 
 This is the classic paper where Hoyle shows this. It has been cited
 repeatedly in the literature on the AP as an example of a genuine
 prediction of the AP. For example, Barrow and Tipler cite it in their
 book.

Being sceptical about proof by citation, I failed to find such a
mathematical proof in this paper. Its predictions are based on
approximations and assumptions (the authors do acknowledge this)
besides the AP.  Where exactly do you think the proof is?

Predictions that later turn out to be correct demonstrate predictive 
power of the sum of all assumptions, not of AP by itself.

 I don't care about flying rabbits. Stick to the issue at hand. 

This was the issue in the original message.

At the risk of beating a dead horse: plain AP does not exclude 
continuations of your world with different physical laws, as long as 
your self survives. Who knows - maybe you could live on in a crude 
simulation that completely ignores the concept of energy levels.

To make nontrivial predictions you clearly need more than the AP.
The theory of inductive inference provides what is missing:  
http://www.idsia.ch/~juergen/unilearn.html

Juergen




Re: Optimal Prediction

2002-03-28 Thread Juergen Schmidhuber


Bill Jefferys wrote:

 It's pointless wasting my time on this. As both Russell and I pointed
 out, this is a standard example that is cited by people who are
 knowledgeable about the AP. Either you have a different definition of
 predictive power than the rest of us do, or you don't understand
 Hoyle's very clearly written paper. In either case, it would be
 foolish of me to continue the thread.

 Goodbye. plonk

 Bill

Don't try to teach a pig to sing. It wastes your time and annoys the pig.


Predictive power is measurable by standard concepts of probability theory
and complexity theory.  You may choose to ignore this, but don't include
all those who don't among "the rest of us".

Write down all assumptions, derive the consequences, and observe that the
AP _by itself_ cannot predict anything nontrivial.

Fortunately Hoyle was more careful than some who cite him - he just
wrote: "the results...were obtained subject to certain assumptions
...have not been demonstrated in a manner free from all doubt.
Nevertheless, the number of assumptions made was much less
than the number of results obtained."

Thus he informally invoked Occam's razor: find short descriptions that
explain a lot. Occam's razor is not the AP.  It is formally treated by  
the theory of inductive inference. Although this theory is at the heart  
of what physicists are doing, some of them are not yet fully aware of it.

Juergen(will be out of town for a while)




Optimal Prediction

2002-03-26 Thread Juergen Schmidhuber

Many on this list have discussed the anthropic principle. It
essentially says that the conditional probability of finding
yourself in a universe compatible with your existence equals 1.

But unfortunately the anthropic principle does not have any
predictive power. It does NOT predict there won't be any flying
rabbits tomorrow.

Clearly we need more than the anthropic principle to explain
the predictability of our universe. In particular, we need an 
optimal way of predicting the future, given past observations. 

And there is one!

Normally we do not know the true conditional probability 
distribution p(next event | past). But assume we do know that 
p is in some set P of distributions. Choose a fixed weight w_q 
for each q in P such that the w_q add up to 1 (for simplicity, 
let P be countable). Then construct the Bayesmix 
M(x) = Sum_q w_q q(x), and predict using M instead of the
optimal but unknown p.

How wrong is it to do that? The recent exciting work of my postdoc
Marcus Hutter (IDSIA) provides general and sharp (!) loss bounds:

Let LM(n) and Lp(n) be the total expected losses of the M-predictor 
and the p-predictor, respectively, for the first n events. Then 
LM(n)-Lp(n) is at most of the order of sqrt[Lp(n)]. That is, M is 
not much worse than p. And in general, no other predictor can do 
better than that!

In particular, if p is deterministic, then the M-predictor soon 
won't make any errors any more.
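
A minimal Python sketch of the mixture predictor in the deterministic case
(my own toy example; the hypothesis class P of "bit t is 1 iff t is a
multiple of k" rules is invented for illustration, it is not from the papers):
predict the bit that gets the larger posterior-weighted probability, zero out
refuted hypotheses, and the number of errors stays bounded by roughly log2 of
the inverse prior weight of the true p.

# Toy class P of deterministic environments; the true p is one of them.
hypotheses = [lambda t, k=k: int(t % k == 0) for k in range(1, 21)]
weights = [1.0 / len(hypotheses)] * len(hypotheses)        # prior weights w_q
true_p = hypotheses[6]                                     # k = 7, unknown to the predictor

errors = 0
for t in range(1, 201):
    m1 = sum(w for w, q in zip(weights, hypotheses) if q(t) == 1)   # mixture prob. of bit 1
    total = sum(weights)
    prediction = 1 if m1 >= total / 2 else 0
    observed = true_p(t)
    errors += int(prediction != observed)
    # Bayes update: deterministic hypotheses are either consistent (keep weight) or refuted (drop).
    weights = [w if q(t) == observed else 0.0 for w, q in zip(weights, hypotheses)]

print("total errors over 200 steps:", errors)              # small, bounded by ~log2(|P|)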

If P contains ALL recursively computable distributions, then M 
becomes the celebrated enumerable universal prior. That is, after 
decades of somewhat stagnating research we now have sharp loss 
bounds for Solomonoff's universal (but incomputable) induction 
scheme. And if we also include the distributions computable in 
the limit, we get sharp loss bounds for the even more general 
priors introduced in Algorithmic Theories of Everything: 
http://www.idsia.ch/~juergen/toesv2/

Similarly, if we replace M by the Speed Prior S - where S(x) is 
small if x is hard to compute by any method - we obtain appropriate 
loss bounds for computable S-based induction:
http://www.idsia.ch/~juergen/toesv2/node32.html

Alternatively, reduce M to what you get if you just add up 
weighted estimated future finance data probabilities generated by 
1000 commercial stock-market prediction software packages. If only 
one of them happens to work fine (but you do not know which) you 
still should get rich.

To learn more, please read

Optimality of Universal Bayesian Sequence Prediction for General Loss  
and Alphabet: ftp://ftp.idsia.ch/pub/techrep/IDSIA-02-02.ps.gz

and also check out Hutter's other recent papers at ICML, ECML, NIPS,
Int. J. of Foundations of CS: www.idsia.ch/~marcus



Juergen Schmidhuber
http://www.idsia.ch/~juergen/




Re: Does provability matter?

2001-12-19 Thread Juergen Schmidhuber


Wei Dai wrote:
 I don't understand how you can believe that the probability of more
 dominant priors is zero. That implies if I offered you a bet of $1
 versus your entire net worth that large scale quantum computation will
 in fact work, you'd take that bet. Would you really?

Your dollar against my 2 cents worth? I don't want to rip you off!

Ok, let me adopt a more careful Bayesian's fallback position: I just
assume certain things, then derive consequences, given the assumptions,
without precisely quantifying the plausibility of the assumptions 
themselves, letting the reader decide whether they are plausible or not.

Juergen




Re: Does provability matter?

2001-12-11 Thread Juergen Schmidhuber

Wei Dai wrote:
 I'm not sure I understand this. Can you give an example of how our
 universe might make use of an entire continuum of real numbers? How might
 someone show this if it were true?

I have no idea. In fact, I guess it is impossible.

 But if there is a formally describable prior that dominates the speed
 prior, and you agree that the more dominant prior doesn't have a prior
 probability of zero, then isn't the speed prior redundant? Wouldn't you
 get equal posterior probabilities (up to a constant multiple) by
 dropping the speed prior from your prior on priors, no matter what it
 assigns to priors that are not formally describable?

In the Bayesian framework we derive consequences of assumptions
represented as priors. The stronger the assumptions, the more specific
the predictions. The Speed Prior assumption is stronger than the assumption
of a formally describable prior. It is not redundant because it yields
stronger predictions such as: The computer computing our universe won't
compute much more of it; large scale quantum computation won't work; etc.

In fact, I do believe the Speed Prior dominates the true prior from
which our universe is sampled (which is all I need to make good
computable predictions), and that the probability of even more
dominant priors is zero indeed. But as a Bayesian I sometimes ignore
my beliefs and also derive consequences of more dominant priors.
I do find them quite interesting, and others who do not share my
belief in the Speed Prior might do so too.

Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/




Re: Does provability matter?

2001-11-28 Thread Juergen Schmidhuber

Wei Dai wrote:

 On Thu, Nov 15, 2001 at 10:35:58AM +0100, Juergen Schmidhuber wrote:
   Why do you prefer the Speed Prior? Under the Speed Prior, oracle universes
   are not just very unlikely, they have probability 0, right? Suppose one
   day we actually find an oracle for the halting problem, or even just find
   out that there is more computing power in our universe than is needed to
   explain our intelligence. Would you then (1) give up the Speed Prior and
   adopt a more dominant prior, or (2) would you say that you've encountered
   an extremely unlikely event (i.e. more likely you're hallucinating)?
  
   If you answer (1) then why not adopt the more dominant prior now?
 
  Why not adopt a more dominant prior now? I just go for the simplest
  explanation consistent with available data, where my simplicity measure
  reflects what computer scientists find simple: things that are easy
  to compute.

 You didn't explicitly answer my question about what you would do if you
 saw something that is very unlikely under the Speed Prior but likely under
 more dominant priors, but it seems that your implicit answer is (1). In
 that case you must have some kind of prior (in the Baysian sense) for the
 Speed Prior vs. more dominant priors. So where does that prior come from?

 Or let me put it this way. What do you think is the probability that the
 Great Programmer has no computational resource constraints? If you don't
 say the probability is zero, then what you should adopt is a weighted
 average of the Speed Prior and a more dominant prior, but that average
 itself is a more dominant prior. Do you agree?

If someone can use large scale quantum computing to solve a problem whose
solution requires much more computation time than necessary to compute
the history of our own particular universe, then one could take this
as evidence against the Speed Prior.  You are right: to quantify this
evidence in a Bayesian framework we need a prior on priors.

Which one? Hm. Let me extend your question and ask: what's the probability
that the Great Programmer is more than a mere programmer in the sense
that he is not bound by the limits of computability?  For instance,
if someone were able to show that our universe somehow makes use of an
entire continuum of real numbers we'd be forced to accept some even more
dominant prior that is not even computable in the limit.  We could not
even formally specify it.

So what's my prior on all priors? Since the attempt to answer such a
question might lead outside what's formally describable, I'll remain 
silent for now.

Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/




Re: Does provability matter?

2001-11-15 Thread Juergen Schmidhuber

Wei Dai wrote:

 Thanks for clarifying the provability issue. I think I understand and
 agree with you.

 On Tue, Nov 13, 2001 at 12:05:22PM +0100, Juergen Schmidhuber wrote:
  What about exploitation? Once you suspect you found the PRG you can use
  it
  to predict the future. Unfortunately the prediction will take enormous
  time to stabilize, and you never can be sure it's finished.
  So it's not very practical.

 By exploiting the fact that we're in an oracle universe I didn't mean
 using TMs to predict the oracle outputs. That is certainly impractical.

 There are a couple of things you could do though. One is to use some
 oracle outputs to predict other oracle outputs when the relationship
 between them is computable. The other, much more important, is to quickly
 solve arbitrarily hard computational problem using the oracles.

  I prefer the additional resource assumptions reflected
  by the Speed Prior.  They make the oracle universes very unlikely, and
  yield computable predictions.

 Why do you prefer the Speed Prior? Under the Speed Prior, oracle universes
 are not just very unlikely, they have probability 0, right? Suppose one
 day we actually find an oracle for the halting problem, or even just find
 out that there is more computing power in our universe than is needed to
 explain our intelligence. Would you then (1) give up the Speed Prior and
 adopt a more dominant prior, or (2) would you say that you've encountered
 an extremely unlikely event (i.e. more likely you're hallucinating)?

 If you answer (1) then why not adopt the more dominant prior now?

You are right in the sense that under the Speed Prior any complete
_infinite_ oracle universe has probability 0. On the other hand,
any _finite_ beginning of an oracle universe has nonvanishing
probability. Why? The fastest algorithm for computing all universes
computes _all_ finite beginnings of all universes. Now oracle universes
occasionally require effectively random bits.  History beginnings that
require n such bits come out very slowly: the computation of the n-th
oracle bit requires more than O(2^n) steps, even by the fastest
algorithm.
Therefore, under the Speed Prior the probabilities of oracle-based
prefixes quickly converge towards zero with growing prefix size. But in
the finite case there always will remain some tiny epsilon.

Why not adopt a more dominant prior now? I just go for the simplest
explanation consistent with available data, where my simplicity measure
reflects what computer scientists find simple: things that are easy
to compute.


Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/




Re: Does provability matter?

2001-11-13 Thread Juergen Schmidhuber


Wei Dai wrote:

 On Wed, Oct 31, 2001 at 10:49:41AM +0100, Juergen Schmidhuber wrote:
  Which are the logically possible universes?  Tegmark mentioned a
  somewhat
  vaguely defined set of ``self-consistent mathematical structures,''
  implying provability of some sort. The postings of Marchal also focus
  on what's provable and what's not.

 Do you understand Marchal's postings? I really have trouble making sense
 of them. If you do, would you mind rephrasing his main ideas for me?

No, unfortunately I do not understand him. But provability does seem
important to him.  Maybe someone else could post a brief and concise
summary?

  Is provability really relevant?  Philosophers and physicists find it
  sexy for its Goedelian limits. But what does this have to do with the
  set of possible universes?

 What does it mean for a universe to be provable? One interpretation is
 that there exists an algorithm to compute any finite time-space region of
 the universe that always halts. Is that what you mean?

Well, any finite object x has a halting algorithm that computes it.
So which are the unprovable things? They are formal statements such as
"algorithm x will eventually compute object y". If it does then you can
prove it; otherwise in general you can never be sure. But why identify
universes with true or false statements? When I'm asking "Is provability
really relevant?" I am not really referring to "provable universes" -
I don't even know what that is supposed to be. I just mean: why should
it be important to show that a particular algorithm computes a particular
universe? Or that a particular universe has a particular property (there
are many generally unprovable properties)?  Let's compute them all -
why worry about provability?

  The provability discussion seems to distract from the real issue. If we
  limit ourselves to universes corresponding to traditionally provable
  theorems then we will miss out on many formally and constructively
  describable universes that are computable in the limit yet in a certain
  sense soaked with unprovability.
 
  Example: a never ending universe history h is computed by a finite
  nonhalting program p. To simulate randomness and noise etc, p invokes
  a short pseudorandom generator subroutine q which also never halts. The
  n-th pseudorandom event of history h is based on q's  n-th output bit
  q(n)
  which is initialized by 0 and set to 1 as soon as the n-th statement in
  an ordered list of all possible statements of a formal axiomatic system
  is proven by a theorem prover that systematically computes all provable
  theorems.  Whenever q modifies some q(n) that was already used in the
  previous computation of h, p appropriately recomputes h since the n-th
  pseudorandom event.

 Living in this universe is like living in a universe with a
 halting-problem oracle, right? If you want to know whether program x
 halts, you just measure the n-th pseudorandom event, where n is such that
 the n-th theorem is "x halts". Yes?

Yes. Once the history has stabilized this will give you the correct answer.

 The problem with universes with halting-problem oracles (which I'll just
 call oracle universes) is that either the oracle is not exploitable
 (for example if no one in the universe figures out that the pseudorandom
 events actually correspond to theorems) in which case the universe just
 looks like an ordinary provable universe, or the oracle is
 exploitable, in which case the outcome becomes completely unimaginable for
 us. (Or is that not true? Can we imagine what would happen if an
 intelligent species came across an exploitable halting-problem oracle?)

  But why should this lack of provability matter?  Ignoring this universe
  just implies loss of generality.  Provability is not the issue.

 If oracle universes could exist, how significant are they? What's
 the probability that we live in such a universe, and how could we try to
 determine and/or exploit this fact?

If the computer on which the history of our universe is calculated is a
general Turing machine (as opposed to a monotone machine which cannot edit
previous outputs), and if we make no resource assumptions such as those
embodied by the Speed Prior, then the oracle universes indeed represent
a large fraction of the probability mass of all universes. Why? Because
there are many oracle universes with very short algorithms.

How to determine whether we are in an oracle universe? In principle it
is possible, at least to the extent inductive inference is possible at
all! Just search through all oracle pseudo-random generators (PRGs) and
try to match their outputs with the apparently noisy observations. If
the observations are not really random, eventually you'll stumble across
the right PRG. It's just that you cannot prove that it is the right one,
or that its outputs have already stabilized.

What about exploitation? Once you suspect you found the PRG you can use it
to predict the future. Unfortunately the prediction will take enormous
time to stabilize, and you never can be sure it's finished.
So it's not very practical.

Does provability matter?

2001-10-31 Thread Juergen Schmidhuber

Cannibalizing previous thread Provable vs Computable:

Which are the logically possible universes?  Tegmark mentioned a somewhat
vaguely defined set of ``self-consistent mathematical structures,''
implying provability of some sort. The postings of Marchal also focus
on what's provable and what's not.

Is provability really relevant?  Philosophers and physicists find it
sexy for its Goedelian limits. But what does this have to do with the
set of possible universes?

The provability discussion seems to distract from the real issue. If we
limit ourselves to universes corresponding to traditionally provable
theorems then we will miss out on many formally and constructively
describable universes that are computable in the limit yet in a certain
sense soaked with unprovability.

Example: a never ending universe history h is computed by a finite
nonhalting program p. To simulate randomness and noise etc, p invokes
a short pseudorandom generator subroutine q which also never halts. The
n-th pseudorandom event of history h is based on q's n-th output bit q(n)
which is initialized by 0 and set to 1 as soon as the n-th statement in
an ordered list of all possible statements of a formal axiomatic system
is proven by a theorem prover that systematically computes all provable
theorems.  Whenever q modifies some q(n) that was already used in the
previous computation of h, p appropriately recomputes h since the n-th
pseudorandom event.

Such a virtual reality or universe is perfectly well-defined.  We can
program it. At some point each history prefix will remain stable forever.
Even if we know p and q, however, in general we will never know for sure
whether some q(n) that is still zero won't flip to 1 at some point,
because of Goedel etc.  So this universe features lots of unprovable
aspects. 
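
For concreteness, here is a rough Python sketch of the control flow of this
construction (my own illustration; provable_statements and compute_history
are hypothetical stubs standing in for a real theorem prover that enumerates
all provable theorems, and for the universe-computing program p):

def provable_statements():
    # Placeholder: should yield the index n of each statement (in a fixed ordered
    # list of all statements of some formal axiomatic system) as soon as it is
    # proven by a systematic theorem prover. Here it is an empty stub.
    return iter(())

def compute_history(q, length):
    # Placeholder for p: compute the universe history from the current
    # pseudorandom bits q(0), q(1), ...
    return ["event(%d,%d)" % (n, q.get(n, 0)) for n in range(length)]

HORIZON = 100
q = {}                                    # q(n); missing entries count as 0
history = compute_history(q, HORIZON)     # provisional history prefix

for n in provable_statements():           # runs forever in the real construction
    if q.get(n, 0) == 0:
        q[n] = 1                          # the n-th statement just got proven
        # p recomputes h from the n-th pseudorandom event onward
        history = history[:n] + compute_history(q, HORIZON)[n:]

# Each finite history prefix eventually stabilizes, but in general one can never
# prove that a particular q(n) will stay 0 forever - hence the unprovable aspects.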

But why should this lack of provability matter?  Ignoring this universe
just implies loss of generality.  Provability is not the issue.

Juergen Schmidhuber   

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/




Re: Predictions duplications

2001-10-29 Thread Juergen Schmidhuber

   From: Russell Standish [EMAIL PROTECTED]
   The only reason for not accepting the simplest thing is if it can be
   shown to be logically inconsistent. This far, you have shown no such
   thing, but rather demonstrated an enormous confusion between measure
   and probability distribution.
 
  Russell, this is really too much; please read any intro to measure
  theory,
  e.g., the one by Halmos.  Measures in general are _not_ probability
  distributions.  They are more than that; you can view them as _sets_ of
  probability distributions on sets that are related to each other in a
  specific way. Probability density functions define measures and
  therefore
  in general aren't probability distributions either. The uniform PDF on
  [0..1] yields the perfect trivial example; that's exactly why I used it
  before.  In the computability context, compare LI/Vitanyi 1997, chapters
  4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5
  (continuous sample space with (semi)measure M(x)).

 This is an excellent reference, thank you. Example 4.2.1 gives the
 construction of the uniform measure over the set of strings, precisely
 what you claim didn't exist when we started this discussion.

You just want to annoy me, don't you?  For the last time: There is a
uniform *measure* on the strings, but no uniform probability distribution.

   1) Halting theorem implies the set of halting programs is of measure
   zero within the set of all programs
 
  You mean, given a uniform measure? But why should you need the halting
  theorem for this trivial statement?  In fact, what exactly do you mean
  by
  halting theorem?  (Schoenfield's book provides a good intro to
  mathematical
  logic and computability theory.)
 

 The halting theorem is that there is no algorithm for predicting
 whether any program will halt. Of course, the result I was referring
 to is that the set of compressible strings is of measure zero. The set
 of halting programs actually has finite measure, since \Omega > 0.
 However, there is still finite probability of an input string
 being nonhalting, and indeed the probability seems to be greater than
 0.7 (ie \Omega < 0.217643 - see pg 218 Li and Vitanyi), so this still
 presents a problem for computationalism.

This is mixing up everything, in particular, different measures.
1) Infinite strings with finite programs have *nonzero* measure, given
the *universal* measure (Omega is a number derived from the universal
measure).
2) They have measure zero, given the *uniform* measure (Omega has nothing
to do with it).
3) To see 2) we do not need the halting theorem - we just compare the
finite programs (countably many) to all strings (uncountably many) and
we are done.
4) The halting theorem is no problem whatsoever when it comes to computable
universes.  Why care whether some computable universe stops or not?  If it
does, it does, otherwise it does not. In both cases we can compute it in
the limit.

   The GP program (aka Universal Dovetailer) is not the simplest
   thing. The simplest describable thing is the set of all descriptions,
   that simply exist without being generated. That has precisely zero
   information or complexity, whereas the UD has some complexity of the
   order of a few 100 bits. The simplest prior is the uniform one, ie
   there is no bias whatsoever in the selection.
 
  Exist without being generated?  Precisely zero information or
  complexity? But with respect to what?

 With respect to the observer. With the respect to anything in
 fact. The set of all possible descriptions has precisely zero
 information, by definition, regardless of what observer or context you
 employ.

This is a vague informal statement, not a definition. Which are the
possible descriptions? What's the formal difference between a description
and a non-description?  What is your "anything" if there is no device
for describing it formally? If "anything" does not convey any bit of
info then what exactly is it that conveys one bit? Or two?

  Any traditional definition of
  information/simplicity requires the specification of a formal
  description
  method, such as a Turing machine. You don't need one? Then how do you
  define information or complexity? How exactly do you define simple ?

 Actually, all it needs is an equivalencing procedure. See my On
 Complexity and Emergence paper. A Turing machine will do the job for
 you, but so will a neural network or an expert system following
 heuristic rules.

So you have to write down formally this equivalencing procedure, right?
Why should this be different in principle from writing down the formal
rules for a Turing machine? Why is a simplicity measure that depends on
your equivalencing procedure superior to a simplicity measure depending
on a Turing machine?  (The TM procedure includes yours - because it
includes all procedures - but not the other way round.)




Re: Predictions duplications

2001-10-26 Thread Juergen Schmidhuber

 From: Juho Pennanen [EMAIL PROTECTED]
 So there may be no 'uniform probability distribution' on the set of all
 strings, but there is the natural probability measure, that is in many
 cases exactly as useful.

Sure, I agree, measures are useful; I'm using them all the time. But in
general they are _not_ the same thing as probability distributions.

 From: Russell Standish [EMAIL PROTECTED]
 The only reason for not accepting the simplest thing is if it can be
 shown to be logically inconsistent. This far, you have shown no such
 thing, but rather demonstrated an enormous confusion between measure
 and probability distribution.

Russell, this is really too much; please read any intro to measure theory,
e.g., the one by Halmos.  Measures in general are _not_ probability
distributions.  They are more than that; you can view them as _sets_ of
probability distributions on sets that are related to each other in a
specific way. Probability density functions define measures and therefore
in general aren't probability distributions either. The uniform PDF on
[0..1] yields the perfect trivial example; that's exactly why I used it
before.  In the computability context, compare Li/Vitanyi 1997, chapter
4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5
(continuous sample space with (semi)measure M(x)).

 1) Halting theorem implies the set of halting programs is of measure
 zero within the set of all programs

You mean, given a uniform measure? But why should you need the halting
theorem for this trivial statement?  In fact, what exactly do you mean by
"halting theorem"?  (Schoenfield's book provides a good intro to
mathematical logic and computability theory.)

 The GP program (aka Universal Dovetailer) is not the simplest
 thing. The simplest describable thing is the set of all descriptions,
 that simply exist without being generated. That has precisely zero
 information or complexity, whereas the UD has some complexity of the
 order of a few 100 bits. The simplest prior is the uniform one, ie
 there is no bias whatsoever in the selection.

Exist without being generated?  Precisely zero information or
complexity? But with respect to what?  Any traditional definition of
information/simplicity requires the specification of a formal description
method, such as a Turing machine. You don't need one? Then how do you
define information or complexity? How exactly do you define "simple"?




Re: Predictions duplications

2001-10-26 Thread Juergen Schmidhuber

Schmidhuber:

It's the simplest thing, given this use of mathematical
language we have agreed upon. But here the power of the
formal approach ends - unspeakable things remain unspoken.

Marchal:

I disagree. I would even say that it is here that the serious formal
approach begins. Take unprovable for unspeakable, we can
meta-reason (informally or formally) and study the logical structure
of the set of unprovable sentences by sound universal machines.

Schmidhuber:

Unprovable does not mean unspeakable! 
You can spell out many unprovable things.

Why care for the subset of provable sentences? 
Aren't we interested in the full set of all describable sentences? 
We can generate it, without caring for proofs at all.

Unspeakable means beyond formal definition. In particular, the
way we humans communicate and agree on a common formal language is not
formally defined. You write "x" and I say, ok, it's an "x", but that's
all based on some sort of informal agreement that involves complex
pattern recognition and learning processes. Usually we do not question
the validity of outcomes of such cognitive processes, and just go ahead
with formal derivations.  But there is this major informal step *before*
formal reasoning can start, and good textbooks on logic acknowledge
this.




Re: everything priors

1999-11-11 Thread Juergen Schmidhuber

Hi Max,

2) If so, should we really limit ourself to this particular kind
of mathematical structures? My concern is that we may be a bit too
narrow-minded if we do.

 But this sort of narrow-mindedness seems necessary to remain within the
 formally describable realm.  I'd go beyond computable structures only
 if forced by evidence, e.g., if someone shows our universe somehow won't
 run without all the real numbers. But for now there isn't any evidence
 in this vein.
Wait a sec: there's also no evidence that our particular
universe has seven spatial dimensions or a proton/electron
mass ratio different from 1836. But we're considering the whole
ensemble here.

Right. Our motivation is that the ensemble is compatible with existing
data yet simpler than the special case. But Occam's razor also motivates
us to consider just the computable universes as opposed to those based on,
say, non-computable real numbers. The ensemble of computable universes is
both compatible with known data AND simple. The ensemble of non-computable
universes is not!

Cheers,
Juergen




Re: Turing vs math

1999-11-04 Thread Juergen Schmidhuber

 Step n owns 2^(n-1) initial segments.

Bruno, why are we discussing this? Sure, in finite time you can compute
all initial segments of size n.  In countable time you can compute one
real, or a countable number of reals.  But each of your steps needs more
than twice the time required by the previous step.  Therefore you need 
more than countable time to compute all reals.

 Now, could you give me a bitstring which is not generated
 by this countably infinite process ?

Sure. The output of the program "While TRUE print 1" won't
be computed in countable time by your procedure.

Juergen




Re: Turing vs math

1999-10-27 Thread Juergen Schmidhuber

 Why assume non-computable stuff without compelling reason?
 Shaved by Occam's razor.

Jacques:

On the contrary.  Why assume the lack of *any* given type of
mathematical stucture?  A true everything-hypothesis surely would not.
Occam's razor says: don't add extra distinctions such as a restriction
like that.

Note also that, as I said, computability isn't the real issue.  A
Turing machine can not be a continuous (but computable) structure.  Of
course the non-computable stuctures should exist too in an everything -
hypothesis.

The non-computable structures are just an ill-defined figment of our
imagination. They do not exist in the sense that we cannot formally
describe them with a finite number of bits.  Textbooks and theorems about
real numbers are computable (finite symbol strings, finite proofs), but most
real numbers are not.

Occam's razor really says: do not add any bits beyond those necessary
to explain the data. Observed data does not require more than a finite
number of bits, and never will. 

Non-computability is not a restriction. It is an unnecessary extension
that greatly complicates things, so much that we cannot even talk about 
it in a formal way.

Juergen Schmidhuber  www.idsia.ch




Re: Turing vs math

1999-10-26 Thread Juergen Schmidhuber

Jacques Mallah wrote:

  I agree with
 Alistair Malcolm that Turing machines are not enough
  A continuous structure is a perfectly good
 mathematical structure, but no Turing based scheme can include it.

Why assume non-computable stuff without compelling reason?

Shaved by Occam's razor.

Juergen



Juergen Schmidhuber www.idsia.ch




Re: Turing vs math

1999-10-22 Thread Juergen Schmidhuber

Bruno:

 Honestly it is still not clear. How could ever S(U)=TRUE be computable ?
 As a computer scientist I guess you know that even the apparently simple
 question "does that piece of code compute the factorial function" is not
 computable.

Sure, it's not even decidable in general whether a given program will
produce any output at all. Once the output is there, though, you can
easily decide whether it consists of, say, prime numbers. In the absence of
evidence to the contrary we assume that the presence of your consciousness
(whatever that may be) is detectable by a computable process, namely, the
one employed by you, the observer, who decides that he exists, via some
sort of computation taking place as part of the evolution of his universe.

George:

 Juergen seems to be talking as if the measure of information is absolute. It
 is not. Information is always conditional information (Shannon), where the
 condition is the observer's a-priori database of information.

Sure, but in the Kolmogorov framework the role of prior knowledge vanishes
in the limit. Each prior corresponds to a particular Turing machine
TM with particular inbuilt prior knowledge. But TM2 can simulate TM1
using a compiler whose size is independent of the information content
of the universes they compute. For all U, K1(U) = K2(U) + O(1), where K1(U)
and K2(U) are U's information contents relative to TM1 and TM2. Similarly
P1(U) = O(P2(U)), where P1 and P2 are the respective probabilities given
Solomonoff-Levin distributions for TM1 and TM2. There is a good reason why
Solomonoff-Levin distributions are called universal distributions!

Hal:

 We then invoke the principle that large-program universes are inherently
 less likely than small-program universes, and presto! we have it more
 likely that we live in a universe without flying rabbits, without
 magic, etc.  That's the general argument we are striving to achieve.

I agree. That's precisely one of the major points of the 1997 paper.

 I do think that this argument has some problems, but it is appealing and
 if the holes can be filled it seems to offer an answer to the question.
 What do you think?

Where exactly are the holes?

Juergen




Re: Turing vs math

1999-10-21 Thread Juergen Schmidhuber

Bruno wrote:

 I don't take the notion of observer for granted.

Neither do I, of course. The observer O is something computable that
evolves in some universe U.

 The problem is that to be in a universe has no clear meaning

But it does.  There is a computable predicate S such that S(U)=TRUE  if
O inhabits U.  Fortunately, there is no need for us to further specify
what it means to ``inhabit,'' to be ``conscious'' or ``self-aware,'' or 
whether there is some other observer who applies S to U and uses S(U)
to identify O, etc.

Now we have a formally defined, conditional probability distribution
on all universes satisfying S.  I thought this to be clear, but maybe
I should have written it down explicitly.

Juergen




UTM vs math

1999-10-21 Thread Juergen Schmidhuber