From: Osher Doctorow [EMAIL PROTECTED], Fri. Sept. 6, 2002 11:45AM

I have read about half of J. Schmidhuber's *A Computer Scientist's View of Life, the Universe, and Everything* (1997). He has interesting ideas and clarity of presentation, but I have to disagree with him in a number of places where he uses conditional probability, including his section Generalization and Learning. I hasten to add that I do not view alternative theories as *wrong* but as competing, and that almost all of them should survive, for competition, for motivation, and also because many of them turn out to have useful contributions long after they have been regarded as *discredited*.

Schmidhuber (S for short) concludes that generalization is impossible in general by using a proof based on conditional probability, and similarly he concludes, also by a conditional probability proof, that the learner's life in general is limited. Most readers will undoubtedly stare at this statement in bewilderment, since as far as they know nothing is wrong with conditional probability. They are partly correct and partly wrong. Nothing is wrong with conditional probability, which is the main tool of the Bayesian school (or as I abbreviate it, the BCP or Bayesian Conditional Probability-Statistics school), for Fairly Frequent Events. For Rare Events, something very strange happens. This was how my wife Marleen and I began our exploration of Rare Events in 1980.

Conditional probability divides two probabilities and regards the quotient as an indication of the probability of one event *given* another event, where *given* is used in the sense of *freezing the other event in place*. Some real analysis experts will argue that this is all justified by the Radon-Nikodym derivative of the Lebesgue-Radon-Nikodym theorem(s), not quite realizing that those theorems hold only up to equivalence classes, that is, outside sets of measure ZERO. But events of probability zero are the Rarest Events. Moreover, division of probabilities blows up even in small (one-sided) neighborhoods of probability 0, since division by 0 is impossible. Thus, not only can conditional probability not model events of probability 0, but it cannot even model events of probability close to 0 (Rare Events).

Is there a simple solution? Yes! Product/Goguen fuzzy multivalued logical implication x-->y is defined as y/x for x not 0. So it corresponds to conditional probability, where x and y are carefully chosen probabilities in the probability-statistics analog.
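
A minimal numerical sketch of the point about division near zero, set beside the difference form 1 + y - x that is discussed next. The function names and sample values are hypothetical and chosen only for illustration; they are not taken from S's paper or from the LBP literature. Note that for y <= x the quotient itself stays between 0 and 1, so what the sketch shows is that the quotient is undefined at x = 0 and becomes extremely sensitive to small changes in x near 0, while the difference form is neither.

```python
# Illustrative sketch; function names and sample values are hypothetical.

def quotient(x, y):
    """Quotient form y / x (the Product/Goguen or conditional-probability analog)."""
    return y / x  # raises ZeroDivisionError when x == 0


def difference(x, y):
    """Difference form 1 + y - x (the Lukasiewicz analog), intended for y <= x."""
    return 1.0 + y - x


def shift(form, x, y, eps):
    """Absolute change in form(x, y) when x is nudged by eps."""
    return abs(form(x + eps, y) - form(x, y))


if __name__ == "__main__":
    eps = 1e-9
    # One Fairly Frequent pair and one Rare pair, both with y <= x.
    for x, y in [(0.5, 0.25), (1e-9, 5e-10)]:
        print(x, y, shift(quotient, x, y, eps), shift(difference, x, y, eps))
    # The difference form is defined at x = 0; quotient(0.0, 0.0) would raise.
    print(difference(0.0, 0.0))
```

With the same tiny nudge to x, the quotient for the Rare pair moves by a large fraction of its whole range, while the difference form moves by roughly the size of the nudge in both cases.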

Lukasiewicz and Rational Pavelka fuzzy multivalued logical implications (Rational Pavelka is the predicate logic generalization of Lukasiewicz propositional logic) are x-->y = 1 + y - x for y <= x, the non-trivial case. This form does not involve division by 0 and does not blow up in any (one-sided) neighborhood of zero. Logic-Based Probability (LBP) uses precisely the same definition, 1 + y - x in place of y/x, for exactly the same probabilities x, y which BCP uses. My wife and I introduced LBP in 1980. It may be remarked here that the Godel fuzzy multivalued logic, which we showed applies to Very Frequent (Very Common) Events, uses x-->y = y and refers in the probability-statistics analog to INDEPENDENT events. Since in general events are not independent unless that can be established in special cases, LBP is the correct result to use.

So when S claims that generalization is impossible in general and that the learner's life is limited in general, he has to be referring to Fairly Frequent Events, not Rare Events or even Very Frequent Events (which use the Godel analog). But surely that leaves much room for S to maneuver in? In a way, yes, and in a way, no.

S is very interested in the Great Programmer, or even a decreasing sequence of Great Programmers, each delegating authority to the next in different universes, and so on. The Great Programmer thinks on the level of the Universe, or All Universes, or the particular Universe in the sequence. So we have to ask: which type of fuzzy multivalued logic or its probability-statistics analog (or proximity function - geometry - topology analog, which we developed as exact analogs of the above) most influences the Universe(s)? The answer turns out to be very simple, namely Lukasiewicz/Rational Pavelka (Rare Event) or its probability-statistics analog LBP.
This is because in our universe it is generally agreed that a Rare Event called the Big Bang occurred (I have proven that even if it did not, as in the Steinhardt-Turok and Gott-Li cyclic or backward time loop cosmological theories, LBP is the key influence probability), and that very rare events such as inflation, the transition from the radiation-dominated to the matter-dominated era, and the fairly recent transition from a non-accelerating to an accelerating universe all played critical roles in the development of the Universe.

I should also mention that Shannon Information-Entropy and its Kolmogorov generalizations blow up near zero because the logarithm does, and that the only *influence* type of Shannon Information-Entropy is based on conditional probability, which of course also blows up at zero. Rare Event Information-Entropy does not use logarithms but (positive or negative) exponentials, and of course does not divide probabilities, so it does not blow up at or near a zero denominator.

Quantum-field-theory-oriented physicists may be slightly disturbed at this point, since QFT totally eliminates probabilities except in the *formal* location of Schrodinger's equation, which is regarded as a *deterministic* equation (another anomaly that I will be glad to argue about at another time or place). Happily or unhappily, they have no choice in the matter of the above results, since they hold across about 10 different branches of mathematics and almost an equal number of branches of physics. Curiously enough, Quantum Mechanics theorists manage to get probability back into the picture, including their much-used CONDITIONAL probability, while simultaneously disavowing the stochastic (probability) school and claiming allegiance to the Statistics School (apparently unaware that there is no statistics without probability), which plays only a *formal* role in supporting the *deterministic Schrodinger and Heisenberg* equations.

Osher Doctorow