Richard,
The idea of the PLN semantics underlying Novamente's probabilistic
truth values is that we can have **both**
-- simple probabilistic truth values without highly specific interpretation
-- more complex, logically refined truth values, when this level of
precision is necessary
To make the discussion more concrete, I'll use a specific example
to do with virtual animals in Second Life. Our first version of the
virtual pets won't use PLN in this sort of way; it'll be focused on MOSES
evolutionary learning; but, this is planned for the second version and
is within the scope of what Novamente can feasibly be expected to
do with modest effort.
Consider an avatar identified as Bob_Yifu.
And consider the concept of "friend", which is a ConceptNode
-- associated to the WordNode "friend" via a learned ReferenceLink
-- defined operationally via a number of links such as
ImplicationLink
    AND
        InheritanceLink X friend
        EvaluationLink near (I, X)
    Pleasure
(this one just says that being near a friend confers pleasure. Other
links about friendship may contain knowledge such as that friends
often give one food, friends help one find things, etc.)
The concept of "friend" may be learned, via mining of the animal's
experience-base --
basically, this is a matter of learning that there are certain predicates
whose SatisfyingSets (the set of Atoms that fulfill the predicate)
have significant intersection, and creating a ConceptNode to denote
that intersection.
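As a rough sketch of that mining step (the predicate names and the toy experience base below are invented for illustration, not actual Novamente structures):

```python
# Hypothetical sketch of concept formation by SatisfyingSet mining.
# Predicates and atoms are illustrative, not real Novamente API calls.

def satisfying_set(predicate, atoms):
    """Return the set of atoms that fulfill the predicate."""
    return {a for a in atoms if predicate(a)}

# Toy experience base: avatars the virtual pet has observed.
atoms = {"Bob_Yifu", "Sue_Avatar", "Hostile_Av", "Stranger_Av"}
gives_food = lambda a: a in {"Bob_Yifu", "Sue_Avatar"}
near_is_pleasant = lambda a: a in {"Bob_Yifu", "Sue_Avatar", "Stranger_Av"}

# The two SatisfyingSets intersect significantly, so a new ConceptNode
# ("friend") is created to denote the intersection.
friend_extension = satisfying_set(gives_food, atoms) & \
                   satisfying_set(near_is_pleasant, atoms)
print(sorted(friend_extension))  # ['Bob_Yifu', 'Sue_Avatar']
```

In the real system the "significant intersection" test would of course be statistical rather than exact set intersection.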
Then, once the concept of "friend" has been formed, more links pertaining
to it may be learned via mining the experience base and via inference rules.
Then, we may find that
InheritanceLink Bob_Yifu friend <.9,1>
(where the <.9,1> is an interval probability, interpreted according to
the indefinite probabilities framework). This link mixes intensional
and extensional inheritance, and thus is only useful for heuristic
reasoning (which is, however, a very important kind).
What this link means is basically that Bob_Yifu's node in the memory
has a lot of the same links as the "friend" node -- or rather, that it
**would**, if all its links were allowed to exist rather than being
pruned to save memory. So, note that the semantics are actually
tied to the mind itself.
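To make that "shares a lot of the same links" reading concrete, here is a toy sketch; the link names and the overlap-fraction measure are my illustrative inventions, not PLN's actual strength formula:

```python
# Illustrative sketch only: estimating the strength of
# "InheritanceLink Bob_Yifu friend" as the overlap between the links
# attached to the two nodes in the system's memory.

friend_links = {"gives_food", "helps_find_things", "pleasant_to_be_near",
                "responds_to_name", "shares_toys"}
bob_links    = {"gives_food", "helps_find_things", "pleasant_to_be_near",
                "responds_to_name", "plays_fetch"}

# Fraction of the friend-node's links that Bob_Yifu's node also has.
strength = len(friend_links & bob_links) / len(friend_links)
print(strength)  # 0.8
```

The point of the sketch is just that the number is defined relative to what links happen to exist in this particular memory, which is why the semantics are tied to the mind itself.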
Or we can make more specialized logical constructs if we really
want to, denoting stuff like
-- at certain times Bob_Yifu is a friend
-- Bob displays some characteristics of friendship very strongly,
and others not at all
-- etc.
We can also do crude, heuristic contextualization like
ContextLink <.7,.8>
    home
    InheritanceLink Bob_Yifu friend
which suggests that Bob is less friendly at home than
in general.
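The heuristic behind such a ContextLink can be sketched as re-estimating the strength from only the observations made in the given context (the event counts below are invented for illustration):

```python
# Rough sketch of contextualization: the same strength estimate,
# restricted to observations from one context. Counts are invented.

observations = [
    # (context, acted_friendly)
    ("home", True), ("home", False), ("home", True), ("home", False),
    ("park", True), ("park", True), ("park", True), ("park", False),
]

def strength(obs, context=None):
    """Fraction of (optionally context-restricted) friendly observations."""
    relevant = [friendly for c, friendly in obs
                if context is None or c == context]
    return sum(relevant) / len(relevant)

print(strength(observations))          # 0.625  (in general)
print(strength(observations, "home"))  # 0.5    (lower at home)
```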
Again this doesn't capture all the subtleties of Bob's friendship in
relation to being at home -- and one could do so if one wanted to, but
it would require introducing a larger complex of nodes and links,
which is not always the most appropriate thing to do.
The PLN inference rules are designed to give heuristically
correct conclusions based on heuristically interpreted links;
or more precise conclusions based on more precisely interpreted
links.
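As one concrete illustration of such a rule, here is a single-strength-value sketch of probabilistic deduction of the kind PLN uses (the real rules operate on indefinite probabilities and include consistency checks; the particular numbers fed in below are invented):

```python
# Sketch of a probabilistic deduction rule in the PLN style, on plain
# strength values rather than indefinite probabilities.

def deduction(sAB, sBC, sB, sC):
    """Strength of A->C from A->B and B->C, under an independence
    assumption for C conditional on B and on not-B."""
    if sB >= 1.0:
        return sBC
    # P(C|~B) inferred from the term probabilities of B and C.
    s_C_given_notB = (sC - sB * sBC) / (1.0 - sB)
    return sAB * sBC + (1.0 - sAB) * s_C_given_notB

# E.g. "Bob is a friend" (.9) and "friends give food" (.8), with
# invented term probabilities sB=.5, sC=.6:
print(round(deduction(0.9, 0.8, sB=0.5, sC=0.6), 3))  # 0.76
```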
Finally, the semantics of PLN relationships is explicitly an
**experiential** semantics. (One of the early chapters in the PLN
book, to appear via Springer next year, is titled "Experiential
Semantics.") So, all node and link truth values in PLN are
intended to be settable and adjustable via experience, rather than
via programming or importation from databases or something like
that.
Now, the above example is of course a quite simple one.
Discussing a more complex example would go beyond the scope
of what I'm willing to do in an email conversation, but the mechanisms
I've described are not limited to such simple examples.
I am aware that identifying Bob_Yifu as a coherent, distinct entity is
a problem faced by humans and robots, and eliminated via the
simplicity of the SL environment. However, there is detailed
discussion in the (proprietary) NM book of how these same mechanisms
may be used to do object recognition and classification, as well.
You may of course argue that these mechanisms won't scale up
to large knowledge bases and rich experience streams. I believe that
they will, and have arguments but not rigorous proofs that they will.
-- Ben G
On Nov 13, 2007 12:34 PM, Richard Loosemore <[EMAIL PROTECTED]> wrote:
Mark Waser wrote:
> I'm going to try to put some words into Richard's mouth here since
> I'm curious to see how close I am . . . . (while radically changing
> the words).
>
> I think that Richard is not arguing about the possibility of
> Novamente-type solutions as much as he is arguing about the
> predictability of *very* flexible Novamente-type solutions as they
> grow larger and more complex (and the difficulty in getting it to not
> instantaneously "crash-and-burn"). Indeed, I have heard a very faint
> shadow of Richard's concerns in your statements about the "tuning"
> problems that you had with BioMind.
This is true, but not precise enough to capture the true nature of
my worry.
Let me focus on one aspect of the problem. My goal here is to describe
in a little detail how the Complex Systems Problem actually bites in a
particular case.
Suppose that in some significant part of Novamente there is a
representation system that uses "probability" or "likelihood" numbers to
encode the strength of facts, as in [I like cats](p=0.75). The (p=0.75)
is supposed to express the idea that the statement [I like cats] is in
some sense "75% true".
[Quick qualifier: I know that this oversimplifies the real situation in
Novamente, but I need to do this simplification in order to get my point
across, and I am pretty sure this will not affect my argument, so bear
with me].
We all know that this p value is not quite a "probability" or
"likelihood" or "confidence factor". It plays a very ambiguous role in
the system, because on the one hand we want it to be very much like a
probability in the sense that we want to do calculations with it: we
NEED a calculus of such values in order to combine facts in the system
to make inferences. But we also do not want to lock ourselves into a
particular interpretation of what it means, because we know full well
that we do not really have a clear semantics for these numbers.
Either way, we have a problem: a fact like [I like cats](p=0.75) is
ungrounded because we have to interpret it. Does it mean that I like
cats 75% of the time? That I like 75% of all cats? 75% of each cat?
Are the cats that I like always the same ones, or is the chance of an
individual cat being liked by me something that changes? Does it mean
that I like all cats, but only 75% as much as I like my human family,
which I like(p=1.0)? And so on and so on.
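The force of this objection is that the same experience stream can support several incompatible readings of the same number. A toy sketch (the experience log and both reading rules are invented for illustration):

```python
# Two readings of "[I like cats](p=0.75)" computed from one toy log.

# (cat, encounter_was_liked) pairs from an invented experience log.
log = [("tabby", True), ("tabby", True), ("siamese", False),
       ("siamese", True), ("persian", True), ("persian", False),
       ("manx", True), ("manx", True)]

# Reading 1: "I like cats 75% of the time" (fraction of encounters).
p_time = sum(liked for _, liked in log) / len(log)

# Reading 2: "I like 75% of all cats" (fraction of individual cats
# liked on a majority of their encounters).
cats = {c for c, _ in log}
def majority_liked(cat):
    enc = [liked for c, liked in log if c == cat]
    return sum(enc) * 2 > len(enc)
p_cats = sum(majority_liked(c) for c in cats) / len(cats)

print(p_time, p_cats)  # 0.75 0.5 -- the two readings disagree
```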
Digging down to the root of this problem (and this is the point where I
am skipping from baby stuff to hard core AI) we want these numbers
to be
semantically compositional and interpretable, but in order to make sure
they are grounded, the system itself is going to have to build them and
interpret them without our help ... and it is not clear that this
grounding can be completely implemented. Why is it not clear? Because
when you try to build the entire grounding mechanism(s) you are forced
to become explicit about what these numbers mean, during the process of
building a grounding system that you can trust to be doing its job:
you
cannot create a mechanism that you *know* is constructing sensible p
numbers and facts during all of its development *unless* you finally
bite the bullet and say what the p numbers really mean, in fully cashed
out terms.
[Suppose you did not do this. Suppose you built the grounding mechanism
but remained ambiguous about the meaning of the p numbers. What would
the resulting system be computing? From end to end it would be
building
facts with p numbers, but you the human observer would still be imposing
an interpretation on the facts. And if you are still doing anything to
interpret, it cannot be grounded].
Now, as far as I understand it, the standard approach to this
conundrum
is that researchers (in Novamente and elsewhere) do indeed make an
attempt to disambiguate the p numbers, but they do it by developing more
sophisticated logical systems. First, perhaps, error-value bands of p
values instead of sharp values. And temporal logic mechanisms to deal
with time. Perhaps clusters of p and q and r and s values, each with
some slightly different zones of applicability. More generally, people
try to give structure to the qualifiers that are appended to the facts:
[I like cats](qualifier=value) instead of [I like cats](p=0.75).
The question is, does this process of refinement have an end? Does it
really lead to a situation where the qualifier is disambiguated and the
semantics is clear enough to build a trustworthy grounding system? Is
there a closed-form solution to the problem of building a logic that
disambiguates the qualifiers?
Here is what I think will happen if this process is continued. In
order
to make the semantics unambiguous enough to let the system ground its
own knowledge without the interpretation of p values, researchers will
develop more and more sophisticated logics (with more and more
structured replacements for that simple p value), until they are forced
to introduce ideas that are so complicated that they do not allow you to
do the full job of compositionality any more: you cannot combine some
facts and have the combination of the complicated p-structures still be
interpretable. For example, if the system is encoded with such stuff as
[I like cats](general-likelihood=0.75 +- 0.05,
mood-variability=0.10 +-0.01,
time-stability=0.99 +0.005- 0.03,
overall-unsureness=0.07,
special-circumstances-count=5 )
Then can we be *absolutely* sure that a combination of facts of this
sort is going to preserve its accuracy across long ranges of inference?
Can we combine this fact with an [I am allergic to cats](....) fact to
come to a clear conclusion about the proposition [I want to sit down and
let Sleti jump onto my lap](....)?
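One way to make this worry concrete: even with a simple interval qualifier and a well-defined combination rule, the intervals widen along a chain of inferences until the conclusion carries little usable information. The combination rule below is standard interval arithmetic under unknown dependence (Frechet bounds), not PLN's actual rule:

```python
# Propagating an interval strength through a chain of conjunctions.

def and_interval(a, b):
    """Interval for P(A and B) when the dependence between A and B
    is unknown (Frechet bounds)."""
    lo = max(0.0, a[0] + b[0] - 1.0)
    hi = min(a[1], b[1])
    return (lo, hi)

fact = (0.70, 0.80)        # e.g. "[I like cats]" with interval strength
conclusion = fact
for _ in range(3):         # conjoin three more such facts
    conclusion = and_interval(conclusion, fact)

print(conclusion)  # (0.0, 0.8) -- the lower bound has become vacuous
```

PLN's independence assumptions are precisely what keep the intervals from degenerating this fast; the question at issue is whether those assumptions stay trustworthy over long inference chains.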
If we built a calculus to handle such structured facts, would we be
kidding ourselves about whether the semantics was *really*
compositional...? Or would we just be sweeping the ambiguity of the
interpretation of these facts under the carpet? Hiding the ambiguity
inside an impossibly dense thicket of qualfiers?
Here, then, are the two conclusions from this phase of my comment:
1) I do not believe anyone seriously knows if there is any end to the
research process of trying to get a logic that does this disambiguation.
I think it is an endeavor driven by pure hope.
2) I believe that, in the end, this search for a good enough logic will
result in the construction of a grounding system (i.e. a mechanism that is
able to pick up and autonomously interpret all its own facts about the
world) that actually has NOT been disambiguated, and that for this
reason it will start to fall apart when used in large scale situations -
with large numbers of facts and/or over large stretches of autonomous
functioning. I think people will sweep the disambiguation problem under
the carpet, and then only notice that they are getting bitten by it when
the large-scale system does not seem to generate coherent, sensible
knowledge when left to its own devices.
This second point is where I finally meet up with your comment about
problems on the larger scale, and the system crashing and burning. I
think it will be a slowish crash. Incidentally, I presume I do not
need
to labor the point about how this will probably appear on the larger
scale but might not be so obvious for small scale or toy demonstrations
of the mechanisms.
I need to finish by making a point about what I see as the underlying
cause of this problem.
The whole thing started because we wanted our p numbers to be
interpretable. What I believe will happen as a result of imposing this
design constraint is that we severely restrict the space of possible
grounding mechanisms that we allow ourselves to consider. By doing so,
we box ourselves into an increasingly tight corner, searching for a
solution that preserves compositional semantics, THEN quietly giving up
on the idea when we get into the depths of some horrendous
temporal/pragmatic/affective/case-based logic 8-) that we cannot, after
all, interpret ...... and then, having boxed ourselves into that
neighborhood of the space of all possible representational systems, we
find that there simply is no solution, given all those constraints.
(But, being stubborn, we carry on hacking away at it forever anyway).
So what is the solution? Well, easy: do not even try to make those p
numbers interpretable. Build systems that build their own
representations, give 'em p numbers to play with (and q and r and s
numbers if they want them), but let the mechanisms themselves use those
numbers without ever trying to exactly interpret them. Frankly, why
should we expect those numbers to be interpretable at all? Why should
we expect there to be a *calculus* that allows us to prove that a system
is truth-preserving?
In such a system the "truth value" of a fact would not be represented
inside the object(s) that encoded the fact, it would be the result of a
cluster of objects constraining one another. So, if the system has in
it the fact [I like cats], this would be connected to a host of other
facts, in such a way that if the system were asked "Do you like cats?"
it would build a large representation of the question and the
implications that were relevant in the present context, and the result
of all those objects interacting would be the thing that generated the
answer. If the person were responding to a questionnaire that forced
them to give an answer on a continuous scale between 0 and 100, they
might well put their mark at the 75% level, but this would not be the
result of retrieving a p value, it would be a nebulous, fleeting result
of the interaction of all the structures involved (and next time they
were asked, the value would probably be different).
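A toy sketch of that alternative picture (the fact store, the keyword-activation rule, and all the weights are my inventions, meant only to show an answer assembled per-query rather than retrieved):

```python
# No stored p value for "I like cats"; the answer is recomputed each
# time from whichever related facts the current context activates.

facts = {
    "cats are soft":           +0.6,
    "cats purr":               +0.5,
    "I am allergic to cats":   -0.8,
    "a cat scratched me once": -0.3,
}

def answer(context):
    """Aggregate only the facts that the context happens to activate
    (crude keyword matching stands in for real relevance spreading)."""
    active = [w for f, w in facts.items() if any(k in f for k in context)]
    return sum(active) / len(active) if active else 0.0

# The "same" question yields different values in different contexts:
print(answer({"soft", "purr"}))           # positive on a good day
print(answer({"allergic", "scratched"}))  # negative while sneezing
```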
Similarly, if the system were trying to decide whether or not to allow a
particular cat to jump up on its lap, given that it generally liked
cats, but was somewhat allergic, the decision would not be the
result of
a combination of p numbers (be they ever so complicated), but the result
of a collision of some huge, extended structures involving many facts.
The collision would certainly involve some weighing of p, q, r and s
(etc.) numbers stored in these objects, but these numbers would not be
interpretable, and the combination process would not be consistent with
a logical calculus.
There is much more that could be said about the methodology needed to
find mechanisms that could do this, but leaving that aside for the
moment, there is just the big philosophical question of whether to give
up our obsession with interpretable semantics, or whether to be so
scared of complex systems (because of course, such a system would very
likely introduce the Complex Systems Bogeyman) that we do not dare
try it.
That is a huge difference in philosophy. It is not just a small matter
of technique, it is a huge perspective change.
So, to conclude, when I say that intelligence involves an irreducible
amount of complexity, I mean only that there are some situations in the
design of AGI systems, like the case I have just described, where I see
people going through a bizarre process:
Step 1) We decide that we must make our AGI as non-complex as
possible, so we can prove *something* about how knowledge-bits combine
to make reliable new knowledge-bits (in the above case, try to make it
as much like a probability calculus or logical calculus as possible,
because we know
that in the purest examples of such things, we can preserve truth as
knowledge is added).
Step 2) We are eventually forced to compromise our principles and
introduce hacks that flush the truth-preservation guarantees down the
toilet: in the above case, we complicate the qualifiers in our logic
until we can no longer really be sure what the semantics is when we
combine them (and in the related case of inference control engines, we
allow such engines to do funky, truncated explorations of the space of
possible inferences, with unpredictable consequences).
Step 3) We then refuse to acknowledge that what we have got, now, is
a compromise that *is* a complex system: its overall behavior is subtly
dependent on interactions down at the low level. One reason that we get
away with this blindness for so long is that it does not necessarily
show itself in small systems or in relatively small scale runs, or in
systems where the developmental mechanisms (the worst culprits for
bringing out the complexity) have not yet been implemented.
Step 4) Having let some complexity in through the back door, we then
keep hacking away at the design, hoping that somewhere in the design
neighborhood there is a solution that is both ALMOST compositional (i.e.
interpretable semantics, truth-preserving, etc.) and slightly complex.
In reality, we have most likely boxed ourselves in because of our
initial (quixotic) emphasis on making the semantics interpretable.
Hmmm... if my luck runs the way it usually does, all this will be as
clear as mud. Oh well. :-(
This commentary is not, of course, specific to Novamente, but is really
about an entire class of AGI systems that belong in the same family as
Novamente. My problem with Novamente is really that I do not see it
being flexible enough to throw out the meaningful, interpretable
parameters.
Richard Loosemore