Re: [agi] NARS probability
I got it from an internal source.

Pei

On Sun, Sep 28, 2008 at 8:24 PM, Brad Paulsen [EMAIL PROTECTED] wrote: Pei, Would you mind sharing the link (that is, if you found it on the Internet)? Thanks, Brad

Pei Wang wrote: I found the paper. As I guessed, their update operator is defined on the whole probability distribution function, rather than on a single probability value of an event. [...]
Re: [agi] NARS probability
Hmm... I didn't mean infinite evidence, only infinite time and space with which to compute the consequences of evidence. But that is interesting too. The higher-order probabilities I'm talking about introducing do not reflect inaccuracy at all. :) This may seem odd, but it seems to me to follow from your development of NARS... so the difficulty for me is to account for why you can exclude it in your system. Of course, this need arises only from interpreting your definitions probabilistically. I think I have come up with a more specific proposal. I will try to write it up properly and see if it works.

--Abram

On Sat, Sep 20, 2008 at 11:28 PM, Pei Wang [EMAIL PROTECTED] wrote: That is an interesting question. When the weight of evidence w goes to infinity, confidence goes to its maximum, and frequency converges to the limit of positive evidence among all evidence [...]
Re: [agi] NARS probability
When working on your new proposal, remember that in NARS all measurements must be based on what the system has --- limited evidence and resources. I don't allow any objective probability that only exists in a Platonic world or the infinite future.

Pei

On Sun, Sep 21, 2008 at 1:53 PM, Abram Demski [EMAIL PROTECTED] wrote: Hmm... I didn't mean infinite evidence, only infinite time and space with which to compute the consequences of evidence. [...]
Re: [agi] NARS probability
--- On Sat, 9/20/08, Pei Wang [EMAIL PROTECTED] wrote: Think about a concrete example: if from one source the system gets P(A --> B) = 0.9, and P(P(A --> B) = 0.9) = 0.5, while from another source P(A --> B) = 0.2, and P(P(A --> B) = 0.2) = 0.7, then what will be the conclusion when the two sources are considered together?

This is a common problem in text prediction. In general, there is no right answer. You have to determine experimentally what works best: you compute the probability using some method, run it on some test data, and measure the accuracy of your predictions.

To give a more concrete example, suppose that A is some context (the last n bytes of text), and B is the event that the next bit is a 1. We get different predictions for different orders (different values of n), which we need to combine. In PAQ1-PAQ3, I count zeros and ones in context A. Call these counts c0 and c1. Then I let P(A --> B) = c1/(c0+c1) and let the confidence (what you call P(P(A --> B))) be c0+c1. To combine predictions, I add up the c0's and c1's and compute SUM c1 / (SUM c0 + SUM c1). I also discovered experimentally that the prediction is more accurate if the counts are weighted by n^2. For example, the order-19 context "the cat caught a mo_" is a better predictor of the next symbol than the order-2 context "mo_", even though the latter has probably collected more statistics, and therefore has a higher confidence.

In PAQ4-PAQ6 I adjust the weights dynamically, using gradient descent of coding cost in weight space to reduce prediction error. This can be improved further by using multiple weight tables indexed by a low-order context.

In PAQ7-PAQ8 I dynamically map each bit history (truncated c0, c1 plus the last bit) to a probability p_i using a table that is adjusted to reduce prediction error when the bit is observed. Then the predictions p_i are combined using a neural network: p = squash(SUM w_i stretch(p_i)), where squash(x) = 1/(1 + exp(-x)) bounds the output to (0, 1), and stretch(x) = ln(x / (1 - x)) is the inverse of squash. (This implicitly gives greater confidence to probabilities near 0 or 1.) When the actual bit b is observed, the weights are adjusted to reduce the prediction error b - p by gradient descent of coding cost in weight space, as follows: w_i := w_i + L stretch(p_i) (b - p), where L is the learning rate, typically 0.001 to 0.005. Again, we can improve this by using multiple weight tables indexed by a low-order context. Or you can use multiple neural networks indexed by different-order contexts and combine them by linear averaging or another neural network.

In PAQ9 I use chains of 2-input neural networks, where one input is the previous prediction from the next lower context order and the other input is fixed. The weight table is selected by the bit history in the next higher context. This method is still experimental. It works well for simple n-gram models, but worse than PAQ8 when there are large numbers of approximately equally good predictions to combine, such as when semantic (cat ~ mouse) and other contexts are added.

-- Matt Mahoney, [EMAIL PROTECTED]
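A minimal sketch of the PAQ7-style logistic mixing step just described, in Python (the function names, the fixed learning rate, and the example numbers are illustrative, not taken from the PAQ sources):

    import math

    def stretch(p):
        # ln(p / (1 - p)), the inverse of squash
        return math.log(p / (1.0 - p))

    def squash(x):
        # logistic function, bounds the output to (0, 1)
        return 1.0 / (1.0 + math.exp(-x))

    def mix(probs, weights):
        # Combine the models' bit predictions in the stretched domain:
        # p = squash(SUM w_i stretch(p_i))
        return squash(sum(w * stretch(p) for p, w in zip(probs, weights)))

    def update(probs, weights, p, bit, rate=0.002):
        # Gradient step on coding cost: w_i := w_i + L * stretch(p_i) * (b - p)
        return [w + rate * stretch(pi) * (bit - p)
                for pi, w in zip(probs, weights)]

    # Example: three context models predict the probability the next bit is 1.
    probs, weights = [0.9, 0.6, 0.3], [0.5, 0.3, 0.2]
    p = mix(probs, weights)
    weights = update(probs, weights, p, bit=1)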
[agi] NARS probability
It has been mentioned several times on this list that NARS has no proper probabilistic interpretation. But, I think I have found one that works OK. Not perfectly. There are some differences, but the similarity is striking (at least to me). I imagine that what I have come up with is not too different from what Ben Goertzel and Pei Wang have already hashed out in their attempts to reconcile the two, but we'll see.

The general idea is to treat NARS as probability plus a good number of regularity assumptions that justify the inference steps of NARS. However, since I make so many assumptions, it is very possible that some of them conflict. This would show that NARS couldn't fit into probability theory after all, but it is still interesting even if that's the case...

So, here's an outline. We start with the primitive inheritance relation, A inh B; this could be called definite inheritance, because it means that A inherits all of B's properties, and B inherits all of A's instances. B is a superset of A. The truth value is 1 or 0. Then, we define probabilistic inheritance, which carries a probability that a given property of B will be inherited by A and that a given instance of A will be inherited by B. Probabilistic inheritance behaves somewhat like the full NARS inheritance: if we reason about likelihoods (the probability of the data assuming (A prob_inh B) = x), the math is actually the same EXCEPT we can only use primitive inheritance as evidence, so we can't spread evidence around the network by (1) treating prob_inh with high evidence as if it were primitive inh or (2) attempting to use deduction to accumulate evidence as we might want to, so that evidence for A prob_inh B and evidence for B prob_inh C gets combined to evidence for A prob_inh C.

So, we can define a second-order probabilistic inheritance, prob_inh2, that is for prob_inh what prob_inh is for inh. We can define a third-order over the second-order, a fourth over the third, and so on. In fact, each of these is a generalization: simple inheritance can be seen as a special case of prob_inh (where the probability is 1), prob_inh is a special case of prob_inh2, and so on. This means we can define an infinite-order probabilistic inheritance, prob_inh_inf, which is a generalization of any given level. The truth value of prob_inh_inf will be very complicated (since each prob_inhN has a more complicated truth value than the last, and prob_inh_inf will include the truth values from each level).

My proposal is to add 2 regularity assumptions to this structure. First, we assume that the prior over probability values for prob_inh is uniform. This gives us some permission to act like the probability and the likelihood are the same thing, which brings the math closer to NARS. Second, assume that a high truth value on one level strongly implies a high one on the next level, and similarly that low implies low. They will already weakly imply each other, but I think the math could be brought closer to NARS with a stronger assumption. I don't have any precise suggestions, however. The idea here is to allow evidence that properly should only be counted for prob_inh2 to count for prob_inh as well, which is the case in NARS. This is point (1) above. More generally, it justifies the NARSian practice of using the simple prob_inh likelihood as if it were a likelihood for prob_inh_inf, so that it recursively acts on other instances of itself rather than only on simple inh.

Of course, since I have not given precise definitions, this solution is difficult to evaluate. But, I thought it would be of interest.

--Abram Demski
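To make the first assumption concrete: under a uniform prior on x = P(A prob_inh B), the {w+, w} evidence yields a Beta(w+ + 1, w - w+ + 1) posterior. A small sketch (an illustration, not part of the proposal itself; the link to the NARS expectation assumes the evidential-horizon parameter k = 2):

    def likelihood(x, w_plus, w):
        # Probability of the observed evidence if the true inheritance
        # probability is x: w+ positive observations out of w total.
        return x ** w_plus * (1.0 - x) ** (w - w_plus)

    def posterior_mean(w_plus, w):
        # Uniform prior + the likelihood above = Beta(w+ + 1, w - w+ + 1),
        # whose mean is the Laplace estimate; the NARS expectation
        # (w+ + k/2) / (w + k) coincides with it when k = 2.
        return (w_plus + 1.0) / (w + 2.0)

    # e.g. posterior_mean(9, 10) == 10/12, despite a raw frequency of 0.9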
Re: [agi] NARS probability
Abram,

I think the best place to start, in exploring the relation between NARS and probability theory, is with Definition 3.7 in the paper From Inheritance Relation to Non-Axiomatic Logic (http://www.cogsci.indiana.edu/pub/wang.inheritance_nal.ps) [International Journal of Approximate Reasoning, 11(4), 281-319, 1994], which is downloadable from http://nars.wang.googlepages.com/nars%3Apublication

It is instructive to look at specific situations, and see how this definition leads one to model situations differently from the way one traditionally uses probability theory to model such situations. The next place to look, in exploring this relation, is at the semantics that 3.7 implies for the induction and abduction rules. Note that unlike in PLN there are no term (node) probabilities in NARS, so induction and abduction cannot rely on Bayes rule or any close analogue of it. They must be justified on quite different grounds.

If you can formulate a probabilistic justification of NARS induction and abduction truth value formulas, I'll be quite interested. I'm not saying it's impossible, just that it's not obvious ... one has to grapple with 3.7 and the fact that the NARS relative frequency w+/w is combining intension and extension in a manner that is unusual relative to ordinary probabilistic treatments. The math here is simple enough that one does not need to do hand-wavy philosophizing ;-) ... it's just elementary algebra. The subtle part is really the semantics, i.e. the way the math is used to model situations.

-- Ben G

On Sat, Sep 20, 2008 at 2:22 PM, Abram Demski [EMAIL PROTECTED] wrote: It has been mentioned several times on this list that NARS has no proper probabilistic interpretation. But, I think I have found one that works OK. [...]
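For readers without the paper at hand, a sketch of the evidence counts that Definition 3.7 defines for A --> B, on one reading (the Python names are illustrative; the paper is the authoritative statement):

    def evidence_for_inheritance(ext_a, ext_b, int_a, int_b):
        # Positive evidence: A's extension shared with B, plus B's
        # intension shared with A. Total evidence: all of A's extension
        # plus all of B's intension.
        w_plus = len(ext_a & ext_b) + len(int_b & int_a)
        w = len(ext_a) + len(int_b)
        return w_plus, w  # the relative frequency w+/w pools both sides

    # e.g. with ext/int given as Python sets of term names:
    # evidence_for_inheritance({"tweety"}, {"tweety", "woody"},
    #                          {"bird", "flyer"}, {"bird"})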
Re: [agi] NARS probability
Ben,

Thanks for the references. I do not have any particularly good reason for trying to do this, but it is a fun exercise and I find myself making the attempt every so often :). I haven't read the PLN book yet (though I downloaded a copy, thanks!), but at present I don't see why term probabilities are needed... unless inheritance relations A inh B are interpreted as conditional probabilities A given B. I am not interpreting them that way-- I am just treating inheritance as a reflexive and transitive relation that (for some reason) we want to reason about probabilistically. As such, it is easy to set up probabilistic treatments-- the challenge is to get them to behave in a way that resembles NARS. Another way of putting this is that I am not worrying too much about the semantics, I'm just trying to get the formal manipulations to match up.

And the definition 3.7 that you mentioned *does* match up, perfectly, when the {w+, w} truth-value is interpreted as a way of representing the likelihood density function of the prob_inh. Easy! The challenge is section 4.4 in the paper you reference: syllogisms. The way evidence is spread around there doesn't match with definition 3.7, not without further probabilistic assumptions.

--Abram

On Sat, Sep 20, 2008 at 4:13 PM, Ben Goertzel [EMAIL PROTECTED] wrote: Abram, I think the best place to start, in exploring the relation between NARS and probability theory, is with Definition 3.7 [...]
Re: [agi] NARS probability
I haven't read the PLN book yet (though I downloaded a copy, thanks!), but at present I don't see why term probabilities are needed... unless inheritance relations A inh B are interpreted as conditional probabilities A given B. I am not interpreting them that way-- I am just treating inheritance as a reflexive and transitive relation that (for some reason) we want to reason about probabilistically.

Well, one question is whether you want to be able to do inference like

A --> B tv1 |- B --> A tv2

Doing that without term probabilities is pretty hard...

Another interesting approach would be to investigate which of Cox's axioms (for probability) are violated in NARS, in what semantic interpretation, and why...

ben
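To see why, a sketch under the conditional-probability reading of A --> B (the reading is Ben's premise here, not Abram's; the names are illustrative):

    def invert(p_b_given_a, p_a, p_b):
        # Bayes rule: P(A|B) = P(B|A) * P(A) / P(B).
        # Without the term probabilities p_a and p_b, tv2 is
        # underdetermined by tv1 alone.
        return p_b_given_a * p_a / p_b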
Re: [agi] NARS probability
And the definition 3.7 that you mentioned *does* match up, perfectly, when the {w+, w} truth-value is interpreted as a way of representing the likelihood density function of the prob_inh. Easy! The challenge is section 4.4 in the paper you reference: syllogisms. The way evidence is spread around there doesn't match with definition 3.7, not without further probabilistic assumptions.

Which seems to be because the semantic interpretation of evidence in 3.7 is different in NARS than in PLN or most probabilistic treatments... This is why I suggested looking at how 3.7 is used to model a real situation, versus how that situation would be modeled in probability theory... Having a good test situation in mind might help to think about the syllogistic rules more clearly. It needs to be a situation where the terms and relations are grounded in a system's experience, as that is what NARS and PLN semantics are both all about...

ben
Re: [agi] NARS probability
Well, one question is whether you want to be able to do inference like A --> B tv1 |- B --> A tv2. Doing that without term probabilities is pretty hard...

Not the way I set it up. A --> B is not the conditional probability P(B|A), but it *is* a conditional probability, so the normal Bayesian rules apply.

--Abram
Re: [agi] NARS probability
On Sat, Sep 20, 2008 at 2:22 PM, Abram Demski [EMAIL PROTECTED] wrote: It has been mentioned several times on this list that NARS has no proper probabilistic interpretation. But, I think I have found one that works OK. Not perfectly. There are some differences, but the similarity is striking (at least to me).

Abram,

There is indeed a lot of similarity between NARS and probability theory. When I started this project, my plan was to use probability theory to handle uncertainty. I moved away from it after I came to believe that what is needed cannot be fully obtained from that theory and its extensions. Even so, NARS still agrees with probability theory here or there, as mentioned in my papers. The key, therefore, is whether NARS can be FULLY treated as an application of probability theory, by following the probability axioms, and only adding justifiable consistent assumptions when necessary.

I imagine that what I have come up with is not too different from what Ben Goertzel and Pei Wang have already hashed out in their attempts to reconcile the two, but we'll see. The general idea is to treat NARS as probability plus a good number of regularity assumptions that justify the inference steps of NARS. However, since I make so many assumptions, it is very possible that some of them conflict. This would show that NARS couldn't fit into probability theory after all, but it is still interesting even if that's the case...

I assume by "treat NARS as probability" you mean to treat the Frequency in NARS as a measurement following the axioms of probability theory. I mentioned this because there is another measurement in NARS, Expectation (which is derived from Frequency and Confidence), which is also intuitively similar to probability.

So, here's an outline. We start with the primitive inheritance relation, A inh B; this could be called definite inheritance, because it means that A inherits all of B's properties, and B inherits all of A's instances. B is a superset of A. The truth value is 1 or 0.

Fine.

Then, we define probabilistic inheritance, which carries a probability that a given property of B will be inherited by A and that a given instance of A will be inherited by B.

There is a tricky issue here. When evaluating the truth value of A --> B, NARS doesn't only check properties and instances, but also checks supersets and subsets, intuitively speaking. For example, when the system is told that "Swans are birds" and "Swans fly", it derives "Birds fly" by induction. In this process "swan" is counted as one piece of evidence, rather than as a set of instances. How many swans the system knows doesn't matter in this step. That is why in the definitions I use extension/intension, not instance/property, because the latter are just special cases of the former. Actually, the truth value of A --> B measures how often the two terms can substitute for each other (in different ways), not how much one set is included in the other, which is the usual probabilistic reading of an inheritance. This is one reason why NARS does not define node probability.

Probabilistic inheritance behaves somewhat like the full NARS inheritance: if we reason about likelihoods (the probability of the data assuming (A prob_inh B) = x), the math is actually the same EXCEPT we can only use primitive inheritance as evidence, so we can't spread evidence around the network by (1) treating prob_inh with high evidence as if it were primitive inh or (2) attempting to use deduction to accumulate evidence as we might want to, so that evidence for A prob_inh B and evidence for B prob_inh C gets combined to evidence for A prob_inh C.

Beside the problems you mentioned, there are other issues. Let me start with the basic ones:

(1) In probability theory, an event E has a constant probability P(E) (which can be unknown). Given the assumption of insufficient knowledge and resources, in NARS P(A --> B) would change over time, as more and more evidence is taken into account. This process cannot be treated as conditioning because, among other things, the system can neither explicitly list all evidence as a condition, nor update the probability of all statements in the system for each piece of new evidence (so as to treat all background knowledge as a default condition). Consequently, at any moment P(A --> B) and P(B --> C) may be based on different, though unspecified, data, so it is invalid to use them in a rule to calculate the probability of A --> C --- probability theory does not allow cross-distribution probability calculation.

(2) For the same reason, in NARS a statement might get different probabilities attached, when derived from different evidence. Probability theory does not have a general rule to handle inconsistency within a probability distribution.

So, we can define a second-order probabilistic inheritance, prob_inh2, that is for prob_inh what prob_inh is for inh. [...]
Re: [agi] NARS probability
Beside the problems you mentioned, there are other issues. Let me start with the basic ones: (1) In probability theory, an event E has a constant probability P(E) (which can be unknown). Given the assumption of insufficient knowledge and resources, in NARS P(A --> B) would change over time [...] (2) For the same reason, in NARS a statement might get different probabilities attached, when derived from different evidence. Probability theory does not have a general rule to handle inconsistency within a probability distribution.

Of course, these issues can be handled in probability theory via introducing higher-order probabilities ...

ben
Re: [agi] NARS probability
Thanks for the critique. Replies follow...

On Sat, Sep 20, 2008 at 8:20 PM, Pei Wang [EMAIL PROTECTED] wrote: [...] The key, therefore, is whether NARS can be FULLY treated as an application of probability theory, by following the probability axioms, and only adding justifiable consistent assumptions when necessary.

Yes, that's the main question. Also, though, if the answer is no it is potentially important to figure out why.

[...] I assume by "treat NARS as probability" you mean to treat the Frequency in NARS as a measurement following the axioms of probability theory. I mentioned this because there is another measurement in NARS, Expectation (which is derived from Frequency and Confidence), which is also intuitively similar to probability.

Yes, you are right... at least so far, I've only been looking at frequency + confidence. Getting expectation from that does not look like it violates any laws.

[...] There is a tricky issue here. When evaluating the truth value of A --> B, NARS doesn't only check properties and instances, but also checks supersets and subsets, intuitively speaking. [...] This is one reason why NARS does not define node probability.

Yes, I understand this. I should have worded myself more carefully.

[...] (1) In probability theory, an event E has a constant probability P(E) (which can be unknown). Given the assumption of insufficient knowledge and resources, in NARS P(A --> B) would change over time, as more and more evidence is taken into account. [...] probability theory does not allow cross-distribution probability calculation.

This is not a problem the way I set things up. The likelihood of a statement is welcome to change over time, as the evidence changes.

(2) For the same reason, in NARS a statement might get different probabilities attached, when derived from different evidence. Probability theory does not have a general rule to handle inconsistency within a probability distribution.

The same statement holds for PLN, right?

[...] My proposal is to add 2 regularity assumptions to this structure. First, we assume that the prior over probability values for prob_inh is uniform. This gives us some permission to act like the probability and the likelihood are the same thing, which brings the math closer to NARS.

That is intuitively acceptable, if interpreted properly.

Second, assume that a high truth value on one level strongly implies a high one on the next level, and similarly that low implies low.

The first half is fine, but the second isn't. As the previous example shows, in NARS a high Confidence implies that the Frequency value is a good summary of evidence, but a low Confidence does not imply that the Frequency is bad, just that it is not very stable.

But I'm not talking about confidence when I say "higher". I'm talking about the system of levels I defined, for which it is perfectly OK. Essentially what I'm claiming here is that the inferences of NARS are [...]
Re: [agi] NARS probability
(2) For the same reason, in NARS a statement might get different probabilities attached, when derived from different evidence. Probability theory does not have a general rule to handle inconsistency within a probability distribution.

The same statement holds for PLN, right?

PLN handles inconsistency within probability distributions using higher-order probabilities... both explicitly and, more simply, by allowing multiple inconsistent estimates of the same distribution to exist attached to the same node or link...

If you work out a detailed solution along your path, you will see that it will be similar to NARS when both are doing deduction with strong evidence. The difference will show up (1) in cases where evidence is rare, and (2) in non-deductive inferences, such as induction and abduction. I believe this is also where NARS and PLN differ most.

Guilty as charged! I have only tried to justify the deduction rule, not any of the others. I seriously didn't think about the blind spot until you mentioned it. I'll have to go back and take a closer look...

The NARS deduction rule closely approximates the PLN deduction rule for the case where all the premise terms have roughly the same node probability. It particularly closely approximates the concept-geometry-based variant of the PLN deduction rule, which is interesting: it means NARS deduction approximates the PLN deduction rule variant one gets if one assumes concepts are approximately spherically shaped rather than being random sets. NARS induction and abduction rules do not closely approximate the PLN induction and abduction rules...

-- Ben G
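For reference, a sketch of the independence-based PLN deduction strength formula being compared against here (one published form of it; the variable names are illustrative):

    def pln_deduction_strength(s_ab, s_bc, s_b, s_c):
        # P(C|A) under the PLN independence assumption, using the node
        # (term) probabilities s_b = P(B) and s_c = P(C) -- the quantities
        # NARS deliberately does without.
        return s_ab * s_bc + (1.0 - s_ab) * (s_c - s_b * s_bc) / (1.0 - s_b)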
Re: [agi] NARS probability
On Sat, Sep 20, 2008 at 9:09 PM, Abram Demski [EMAIL PROTECTED] wrote:

(1) In probability theory, an event E has a constant probability P(E) (which can be unknown). [...] Consequently, at any moment P(A --> B) and P(B --> C) may be based on different, though unspecified, data, so it is invalid to use them in a rule to calculate the probability of A --> C --- probability theory does not allow cross-distribution probability calculation.

This is not a problem the way I set things up. The likelihood of a statement is welcome to change over time, as the evidence changes.

If each of them is changed independently, you don't have a single probability distribution anymore, but a bunch of them. In the above case, you don't really have P(A --> B) and P(B --> C), but P_307(A --> B) and P_409(B --> C). How can you use two probability values together if they come from different distributions?

(2) For the same reason, in NARS a statement might get different probabilities attached, when derived from different evidence. Probability theory does not have a general rule to handle inconsistency within a probability distribution.

The same statement holds for PLN, right?

Yes. Ben proposed a solution, which I won't comment on until I see all the details in the PLN book.

The first half is fine, but the second isn't. [...] But I'm not talking about confidence when I say "higher". I'm talking about the system of levels I defined, for which it is perfectly OK.

Yes, but the whole purpose of adding another value is to handle inconsistency and belief revision. Higher-order probability is mathematically sound, but won't do this work. Think about a concrete example: if from one source the system gets P(A --> B) = 0.9, and P(P(A --> B) = 0.9) = 0.5, while from another source P(A --> B) = 0.2, and P(P(A --> B) = 0.2) = 0.7, then what will be the conclusion when the two sources are considered together?

Pei
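For contrast, a sketch of how NARS itself merges two conflicting reports, if (an assumption of this sketch, not something stated in the thread) the two sources above are recast as frequency/confidence pairs; the conversion and revision formulas are the standard NARS ones, with k the evidential-horizon parameter:

    K = 1.0  # evidential horizon parameter

    def to_evidence(f, c, k=K):
        # Invert c = w / (w + k) to recover total evidence w; w+ = f * w.
        w = k * c / (1.0 - c)
        return f * w, w

    def revise(f1, c1, f2, c2, k=K):
        # The revision rule pools the evidence behind the two judgments.
        wp1, w1 = to_evidence(f1, c1, k)
        wp2, w2 = to_evidence(f2, c2, k)
        wp, w = wp1 + wp2, w1 + w2
        return wp / w, w / (w + k)  # merged (frequency, confidence)

    # Recasting the example's sources as (0.9, 0.5) and (0.2, 0.7):
    f, c = revise(0.9, 0.5, 0.2, 0.7)  # roughly (0.41, 0.77)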
Re: [agi] NARS probability
Think about a concrete example: if from one source the system gets P(A --> B) = 0.9, and P(P(A --> B) = 0.9) = 0.5, while from another source P(A --> B) = 0.2, and P(P(A --> B) = 0.2) = 0.7, then what will be the conclusion when the two sources are considered together?

There are many approaches to this within the probabilistic framework, one of which is contained within this paper, for example... http://cat.inist.fr/?aModele=afficheN&cpsidt=16174172 (I have a copy of the paper but I'm not sure where it's available for free online ... if anyone finds it please post the link... thx)

Ben
Re: [agi] NARS probability
I didn't know this paper, but I do know approaches based on the principle of maximum/optimum entropy. They usually require much more information (or assumptions) than what is given in the following example. I'd be interested to know what solution they would suggest for such a situation.

Pei

On Sat, Sep 20, 2008 at 9:53 PM, Ben Goertzel [EMAIL PROTECTED] wrote: There are many approaches to this within the probabilistic framework, one of which is contained within this paper, for example... http://cat.inist.fr/?aModele=afficheN&cpsidt=16174172 [...]
Re: [agi] NARS probability
The approach in that paper doesn't require any special assumptions, and could be applied to your example, but I don't have time to write up an explanation of how to do the calculations ... you'll have to read the paper yourself if you're curious ;-) That approach is not implemented in PLN right now, but we have debated integrating it with PLN, as in some ways it's subtler than what we currently do in the code...

ben

On Sat, Sep 20, 2008 at 10:02 PM, Pei Wang [EMAIL PROTECTED] wrote: I didn't know this paper, but I do know approaches based on the principle of maximum/optimum entropy. [...]

-- Ben Goertzel, PhD
CEO, Novamente LLC and Biomind LLC
Director of Research, SIAI
[EMAIL PROTECTED]

"Nothing will ever be attempted if all possible objections must be first overcome" - Dr Samuel Johnson
Re: [agi] NARS probability
I found the paper. As I guessed, their update operator is defined on the whole probability distribution function, rather than on a single probability value of an event. I don't think it is practical for AGI --- we cannot afford the time to re-evaluate every belief on each piece of new evidence. Also, I haven't seen a convincing argument on why an intelligent system should follow the ME Principle. Moreover, this paper doesn't directly solve my example, because it doesn't use second-order probability.

Pei

On Sat, Sep 20, 2008 at 10:13 PM, Ben Goertzel [EMAIL PROTECTED] wrote: The approach in that paper doesn't require any special assumptions, and could be applied to your example, but I don't have time to write up an explanation of how to do the calculations ... [...]
Re: [agi] NARS probability
You are right in what you say about (1). The truth is, my analysis is meant to apply to NARS operating with unrestricted time and memory resources (which of course is not the point of NARS!). So, the question is whether NARS approaches a probability calculation as it is given more time to use all its data. As for higher values... NARS and PLN may be using them for the purpose you mention, but that is not the purpose I am giving them in my analysis! In my analysis, I am simply trying to justify the deductions allowed in NARS in a probabilistic way. Higher-order probabilities are potentially useful here because of the way you sum evidence. Simply put, it is as if NARS purposefully ignores the distinction between different probability levels, so that a NARS frequency is also a frequency-of-frequencies and frequency-of-frequency-of frequencies and so on, all the way up. The simple way of dealing with this is to say that it is wrong, and results from a confusion of similar-looking mathematical entities. But, to some extent, it is intuitive: I should not care too much in normal reasoning which level of inheritance I'm using when I say that a truck is a type of vehicle. So the question is, can this be justified probabilistically? I think I can give a very tentative yes. --Abram On Sat, Sep 20, 2008 at 9:38 PM, Pei Wang [EMAIL PROTECTED] wrote: On Sat, Sep 20, 2008 at 9:09 PM, Abram Demski [EMAIL PROTECTED] wrote: (1) In probability theory, an event E has a constant probability P(E) (which can be unknown). Given the assumption of insufficient knowledge and resources, in NARS P(A--B) would change over time, when more and more evidence is taken into account. This process cannot be treated as conditioning, because, among other things, the system can neither explicitly list all evidence as condition, nor update the probability of all statements in the system for each piece of new evidence (so as to treat all background knowledge as a default condition). Consequently, at any moment P(A--B) and P(B--C) may be based on different, though unspecified, data, so it is invalid to use them in a rule to calculate the probability of A--C --- probability theory does not allow cross-distribution probability calculation. This is not a problem the way I set things up. The likelihood of a statement is welcome to change over time, as the evidence changes. If each of them is changed independently, you don't have a single probability distribution anymore, but a bunch of them. In the above case, you don't really have P(A--B) and P(B--C), but P_307(A--B) and P_409(B--C). How can you use two probability values together if they come from different distributions? (2) For the same reason, in NARS a statement might get different probability attached, when derived from different evidence. Probability theory does not have a general rule to handle inconsistency within a probability distribution. The same statement holds for PLN, right? Yes. Ben proposed a solution, which I won't comment until I see all the details in the PLN book. The first half is fine, but the second isn't. As the previous example shows, in NARS a high Confidence does implies that the Frequency value is a good summary of evidence, but a low Confidence does implies that the Frequency is bad, just that it is not very stable. But I'm not talking about confidence when I say higher. I'm talking about the system of levels I defined, for which it is perfectly OK. Yes, but the whole purpose of adding another value is to handle inconsistency and belief revision. 
Higher-order probability is mathematically sound, but won't do this work. Think about a concrete example: if from one source the system gets P(A--B) = 0.9, and P(P(A--B) = 0.9) = 0.5, while from another source P(A--B) = 0.2, and P(P(A--B) = 0.2) = 0.7, then what will be the conclusion when the two sources are considered together? Pei
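For contrast, NARS itself answers the two-source question with its revision rule rather than with higher-order probability: under the standard mappings f = w+/w and c = w/(w + k), two truth values derived from disjoint evidence are combined by pooling their evidence weights. The Python sketch below is only an illustration, not NARS source code; the helper names are invented, and it reads each source's second value as a NARS confidence rather than a second-order probability.

# Minimal sketch of NARS-style revision (illustrative, not NARS code).
# Assumes the standard mappings f = w+/w and c = w/(w + k), with k = 1.

K = 1.0

def to_evidence(f, c, k=K):
    """Recover (positive, total) evidence weights from a truth value."""
    w = k * c / (1.0 - c)   # total evidence behind the judgment
    return f * w, w

def revise(tv1, tv2, k=K):
    """Combine two truth values based on disjoint evidence by pooling
    their evidence weights, then mapping back to (frequency, confidence)."""
    wp1, w1 = to_evidence(*tv1, k)
    wp2, w2 = to_evidence(*tv2, k)
    wp, w = wp1 + wp2, w1 + w2
    return wp / w, w / (w + k)

# Reading the example's second values as confidences:
print(revise((0.9, 0.5), (0.2, 0.7)))  # roughly (0.41, 0.77)

With k = 1 this gives roughly f = 0.41 and c = 0.77: the source backed by more evidence pulls the pooled frequency toward its own value, and the combined judgment ends up more confident than either input.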
Re: [agi] NARS probability
On Sat, Sep 20, 2008 at 11:02 PM, Abram Demski [EMAIL PROTECTED] wrote:

You are right in what you say about (1). The truth is, my analysis is meant to apply to NARS operating with unrestricted time and memory resources (which of course is not the point of NARS!). So, the question is whether NARS approaches a probability calculation as it is given more time to use all its data.

That is an interesting question. When the weight of evidence w goes to infinity, confidence goes to 1, and frequency converges to the limit of the proportion of positive evidence among all evidence, so it becomes a probability, under a certain interpretation. Therefore, as far as a single truth value is concerned, probability theory is an extreme case of NARS. However, taking all the truth values in the system into account, this is not necessarily true, because the two theories specify the relations among statements/propositions differently. For example, probability theory has the conditional B|A, while NARS uses the implication A==B, which are similar, but not the same. Of course, there are some overlaps, such as disjunction and conjunction, where NARS converges to probability theory in the extreme case (infinite evidence).

As for higher values... NARS and PLN may be using them for the purpose you mention, but that is not the purpose I am giving them in my analysis! In my analysis, I am simply trying to justify the deductions allowed in NARS in a probabilistic way. Higher-order probabilities are potentially useful here because of the way you sum evidence. Simply put, it is as if NARS purposefully ignores the distinction between different probability levels, so that a NARS frequency is also a frequency-of-frequencies and a frequency-of-frequency-of-frequencies and so on, all the way up.

I see what you mean, but as it is currently defined, NARS has no need to introduce higher-order probabilities --- frequency is not an estimation of a true probability. It is uncertain because of the influence of new evidence, not because it is inaccurate.

The simple way of dealing with this is to say that it is wrong, and results from a confusion of similar-looking mathematical entities. But, to some extent, it is intuitive: I should not care too much in normal reasoning which level of inheritance I'm using when I say that a truck is a type of vehicle. So the question is, can this be justified probabilistically? I think I can give a very tentative yes.

Hopefully we'll know better about that when you explore further. ;-) Pei

--Abram On Sat, Sep 20, 2008 at 9:38 PM, Pei Wang [EMAIL PROTECTED] wrote: On Sat, Sep 20, 2008 at 9:09 PM, Abram Demski [EMAIL PROTECTED] wrote:

(1) In probability theory, an event E has a constant probability P(E) (which can be unknown). Given the assumption of insufficient knowledge and resources, in NARS P(A--B) changes over time, as more and more evidence is taken into account. This process cannot be treated as conditioning because, among other things, the system can neither explicitly list all evidence as a condition, nor update the probability of every statement in the system for each piece of new evidence (so as to treat all background knowledge as a default condition). Consequently, at any moment P(A--B) and P(B--C) may be based on different, though unspecified, data, so it is invalid to use them together in a rule to calculate the probability of A--C --- probability theory does not allow cross-distribution probability calculation.

This is not a problem the way I set things up. The likelihood of a statement is welcome to change over time, as the evidence changes.

If each of them is changed independently, you don't have a single probability distribution anymore, but a bunch of them. In the above case, you don't really have P(A--B) and P(B--C), but P_307(A--B) and P_409(B--C). How can you use two probability values together if they come from different distributions?

(2) For the same reason, in NARS a statement might get different probabilities attached when derived from different evidence. Probability theory does not have a general rule to handle inconsistency within a probability distribution.

The same statement holds for PLN, right?

Yes. Ben proposed a solution, which I won't comment on until I see all the details in the PLN book.

The first half is fine, but the second isn't. As the previous example shows, in NARS a high Confidence does imply that the Frequency value is a good summary of the evidence, but a low Confidence does not imply that the Frequency is bad, just that it is not very stable.

But I'm not talking about confidence when I say "higher". I'm talking about the system of levels I defined, for which it is perfectly OK.

Yes, but the whole purpose of adding another value is to handle inconsistency and belief revision. Higher-order probability is mathematically sound, but won't do this work. Think about a concrete example: if from one source the system gets P(A--B) = 0.9, and P(P(A--B) = 0.9) = 0.5, while from another source P(A--B) = 0.2, and P(P(A--B) = 0.2) = 0.7, then what will be the conclusion when the two sources are considered together?
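To spell out the limit Pei describes, take the standard NARS truth-value definitions, with w+ the amount of positive evidence, w the total evidence, and k > 0 the evidential-horizon constant (this is a restatement of the published definitions, not new machinery):

\[
f = \frac{w^{+}}{w}, \qquad
c = \frac{w}{w + k}, \qquad
\lim_{w \to \infty} c = 1, \qquad
\lim_{w \to \infty} f = \lim_{w \to \infty} \frac{w^{+}}{w}.
\]

So with unbounded evidence, confidence saturates at 1 and frequency (when the limit exists) is exactly a limiting relative frequency, i.e. a probability under a frequentist interpretation; with any finite amount of evidence, neither reduction holds.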
Re: [agi] NARS probability
On Sat, Sep 20, 2008 at 10:32 PM, Pei Wang [EMAIL PROTECTED] wrote:

I found the paper. As I guessed, their update operator is defined on the whole probability distribution function, rather than on a single probability value of an event. I don't think it is practical for AGI --- we cannot afford the time to re-evaluate every belief on each piece of new evidence. Also, I haven't seen a convincing argument for why an intelligent system should follow the ME Principle.

I agree their method is not practical for most cases in AGI, which is why we didn't use it within PLN ;-) ... we use a simpler revision rule...

Also this paper doesn't directly solve my example, because it doesn't use second-order probability.

That is true, but it could be straightforwardly extended to that case... Ben
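To make the cost objection concrete, here is a toy Python sketch. It is not the paper's operator: it uses Jeffrey's rule, which is the minimum-relative-entropy (ME-style) update for the single constraint P(E) = q, and the prior and event are invented for illustration. The point is only that a distribution-level update must touch every state of the joint distribution, so each new piece of evidence costs time exponential in the number of variables.

from itertools import product

n = 16                                   # 16 binary variables -> 2^16 = 65536 states
states = list(product([0, 1], repeat=n))
prior = {s: 1.0 / len(states) for s in states}   # uniform toy prior

def jeffrey_update(prior, event, q):
    """Reweight every state so that the event gets probability q,
    staying minimally far (in KL divergence) from the prior."""
    p_e = sum(p for s, p in prior.items() if event(s))
    return {s: p * (q / p_e if event(s) else (1.0 - q) / (1.0 - p_e))
            for s, p in prior.items()}

# One piece of evidence about one variable still reweights all 65536 states:
posterior = jeffrey_update(prior, lambda s: s[0] == 1, 0.9)

A local rule of the kind NARS (or PLN's revision) uses instead touches only the single truth value being revised, which is the trade-off Pei is pointing at: tractability in exchange for giving up a single globally coherent distribution.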