OK... life lesson #567: When a mathematical explanation confuses non-math people, another mathematical explanation is not likely to help.

The basic situation can be thought of as follows.
Suppose you have a large set of people, say, all the people on Earth. Then you have a bunch of categories you're interested in, say:
Chinese
Arab
fat
skinny
smelly
female
...
Then you have some absolute probabilities, e.g.

P(Chinese) = .2
P(fat) = .15

etc., which tell you how likely a randomly chosen person is to fall into each of the categories.
Then you have some conditional probabilities, e.g.

P(fat|skinny) = 0
P(smelly|male) = .62
P(fat|American) = .4
P(slow|fat) = .7
The third one, for instance, tells you that if you know someone is American, then there's a .4 chance the person is fat (i.e., 40% of Americans are fat).
The problem at hand is: you're given some absolute and some conditional probabilities regarding the concepts at hand, and you want to infer a bunch of others.
In localized cases this is easy. For instance, using probability theory one can get evidence for P(slow|American) from the combination of P(slow|fat) and P(fat|American).
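To make that combination concrete: by the law of total probability, P(slow|American) = P(slow|fat) P(fat|American) + P(slow|not-fat) P(not-fat|American). Since we haven't been given P(slow|not-fat), the two known numbers only pin down a range. Here's a minimal sketch of that bound (the function name and the idea of sweeping the unknown term over [0, 1] are my own illustration, not anything from the original):

```python
# Sketch: evidence for P(slow|American) from P(slow|fat) and
# P(fat|American). The unknown term P(slow|not-fat) is swept over
# its extreme values 0 and 1, giving lower and upper bounds.

def deduction_bounds(p_slow_given_fat, p_fat_given_american):
    """Bounds on P(slow|American) =
       P(slow|fat)*P(fat|Am) + P(slow|~fat)*P(~fat|Am)."""
    base = p_slow_given_fat * p_fat_given_american
    remainder = 1 - p_fat_given_american
    return base, base + remainder  # P(slow|~fat) = 0 vs. 1

lo, hi = deduction_bounds(0.7, 0.4)
print(lo, hi)  # roughly 0.28 to 0.88
```

With more knowledge (or a heuristic default for the unknown term), the range tightens toward a point estimate.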
Given n concepts there are roughly n^2 conditional probabilities to look at. The most interesting ones to find are the ones for which P(A|B) is very different from P(A), just as, for instance, P(fat|American) is very different from P(fat).
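That "interestingness" criterion is easy to sketch in code. All numbers and the 0.2 threshold below are made up for illustration:

```python
# Toy sketch: flag conditional probabilities P(A|B) that differ a lot
# from the base rate P(A). Numbers and threshold are illustrative.

base = {"fat": 0.15, "slow": 0.3}
cond = {("fat", "American"): 0.4,   # P(fat|American)
        ("slow", "fat"): 0.7,       # P(slow|fat)
        ("fat", "skinny"): 0.0}     # P(fat|skinny)

def interesting(cond, base, threshold=0.2):
    """Return (A, B) pairs where |P(A|B) - P(A)| exceeds threshold."""
    return [(a, b) for (a, b), p in cond.items()
            if abs(p - base[a]) > threshold]

print(interesting(cond, base))  # [('fat', 'American'), ('slow', 'fat')]
```

P(fat|skinny) = 0 differs from P(fat) = .15 by only .15, so it doesn't make the cut at this threshold; the scan is over all n^2 pairs, which is why doing it fast at scale is the real problem.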
This problem is covered by elementary probability theory; solving it in principle is no issue. The tricky problem is solving it approximately, for a large number of concepts and probabilities, in a very rapid computational way.
Bayesian networks try to solve the problem by seeking a set of concepts that are arranged in an "independence hierarchy" (a directed acyclic graph with a concept at each node, so that each concept is independent of its non-descendants conditional on its parents -- and no, I don't feel like explaining that in nontechnical terms at the moment ;). But this can leave out a lot of information because real conceptual networks may be grossly interdependent. Of course, then one can try to learn a whole bunch of different Bayes nets and merge the probability estimates obtained from each one....
One thing that complicates the problem is that, in some cases, as well as inferring probabilities one hasn't been given, one may want to make corrections to probabilities one HAS been given. For instance, sometimes one may be given inconsistent information, and one has to choose which information to accept.
For example, if you're told

P(male) = .5
P(young|male) = .4
P(young) = .1

then something's gotta give, because the first two probabilities imply P(young) >= .5*.4 = .2.
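The check behind that example follows from P(young and male) = P(male) P(young|male), which can't exceed P(young). A minimal sketch of that necessary consistency condition (function name is my own):

```python
# Sketch of the consistency check in the example above:
# P(A and B) = P(B) * P(A|B) is a lower bound on P(A),
# so any stated P(A) below that bound is inconsistent.

def consistent(p_b, p_a_given_b, p_a):
    """True iff P(A) >= P(B) * P(A|B), a necessary condition."""
    return p_a >= p_b * p_a_given_b

print(consistent(0.5, 0.4, 0.1))  # the given numbers fail: False
```

This is only a necessary condition, of course; a full consistency check over many interlocking probabilities is much harder, which is exactly where the premise-revision trouble below comes in.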
Novamente's probabilistic reasoning system handles this problem pretty well, but one thing we're struggling with now is keeping this "correction of errors in the premises" under control. If you let the system revise its premises to correct errors (a necessity in an AGI context), then it can easily get carried away in cycles of revising premises based on conclusions, then revising conclusions based on the new premises, and so on in a chaotic trajectory leading to meaningless inferred probabilities.
As I said before, this is a very simple incarnation of a problem that takes a lot of other forms, more complex but posing the same essential challenge.
-- Ben G
