Re: [agi] NARS probability

2008-09-28 Thread Pei Wang
I got it from an internal source.

Pei

On Sun, Sep 28, 2008 at 8:24 PM, Brad Paulsen [EMAIL PROTECTED] wrote:
 Pei,

 Would you mind sharing the link (that is, if you found it on the Internet)?

 Thanks,
 Brad

 Pei Wang wrote:

 I found the paper.

 As I guessed, their update operator is defined on the whole
 probability distribution function, rather than on a single probability
 value of an event. I don't think it is practical for AGI --- we cannot
 afford the time to re-evaluate every belief on each piece of new
 evidence. Also, I haven't seen a convincing argument on why an
 intelligent system should follow the ME Principle.

 Also this paper doesn't directly solve my example, because it doesn't
 use second-order probability.

 Pei

 On Sat, Sep 20, 2008 at 10:13 PM, Ben Goertzel [EMAIL PROTECTED] wrote:

 The approach in that paper doesn't require any special assumptions, and
 could be applied to your example, but I don't have time to write up an
 explanation of how to do the calculations ... you'll have to read the
 paper
 yourself if you're curious ;-)

 That approach is not implemented in PLN right now but we have debated
 integrating it with PLN as in some ways it's subtler than what we
 currently
 do in the code...

 ben

 On Sat, Sep 20, 2008 at 10:02 PM, Pei Wang [EMAIL PROTECTED]
 wrote:

 I didn't know this paper, but I do know approaches based on the
 principle of maximum/optimum entropy. They usually require much more
 information (or assumptions) than what is given in the following
 example.

 I'd be interested to know what solution they would suggest for such
 a situation.

 Pei

 On Sat, Sep 20, 2008 at 9:53 PM, Ben Goertzel [EMAIL PROTECTED] wrote:

 Think about a concrete example: if from one source the system gets
 P(A--B) = 0.9, and P(P(A--B) = 0.9) = 0.5, while from another source
 P(A--B) = 0.2, and P(P(A--B) = 0.2) = 0.7, then what will be the
 conclusion when the two sources are considered together?

 There are many approaches to this within the probabilistic framework,
 one of which is contained within this paper, for example...

 http://cat.inist.fr/?aModele=afficheNcpsidt=16174172

 (I have a copy of the paper but I'm not sure where it's available for
 free online ... if anyone finds it please post the link... thx)

 Ben
 


 --
 Ben Goertzel, PhD
 CEO, Novamente LLC and Biomind LLC
 Director of Research, SIAI
 [EMAIL PROTECTED]

 Nothing will ever be attempted if all possible objections must be first
 overcome  - Dr Samuel Johnson


 








Re: [agi] NARS probability

2008-09-21 Thread Abram Demski
Hmm... I didn't mean infinite evidence, only infinite time and space
with which to compute the consequences of evidence. But that is
interesting too.

The higher-order probabilities I'm talking about introducing do not
reflect inaccuracy at all. :)
This may seem odd, but it seems to me to follow from your development
of NARS... so the difficulty for me is to account for why you can
exclude it in your system. Of course, this need arises only from
interpreting your definitions probabilistically.

I think I have come up with a more specific proposal. I will try to
write it up properly and see if it works.

--Abram

On Sat, Sep 20, 2008 at 11:28 PM, Pei Wang [EMAIL PROTECTED] wrote:
 On Sat, Sep 20, 2008 at 11:02 PM, Abram Demski [EMAIL PROTECTED] wrote:
 You are right in what you say about (1). The truth is, my analysis is
 meant to apply to NARS operating with unrestricted time and memory
 resources (which of course is not the point of NARS!). So, the
 question is whether NARS approaches a probability calculation as it is
 given more time to use all its data.

 That is an interesting question. When the weight of evidence w goes to
 infinity, confidence goes to its limit, and frequency converges to the
 proportion of positive evidence among all evidence, so it becomes a
 probability, under a certain interpretation. Therefore, as far as a
 single truth value is concerned, probability theory is an extreme case
 of NARS.

 However, when all truth values in the system are taken into account,
 this is not necessarily true, because the two theories specify the
 relations among statements/propositions differently. For example,
 probability theory has the conditional B|A, while NARS uses the
 implication A ==> B, which are similar, but not the same. Of course,
 there are some overlaps, such as disjunction and conjunction, where
 NARS converges to probability theory in the extreme case (infinite
 evidence).
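 To make the limit concrete, here is a minimal numeric sketch, assuming
 the standard NARS truth-value definitions f = w+/w and c = w/(w+k),
 where k is the evidential horizon (the variable names and the value
 k = 1 below are illustrative assumptions, not taken from this thread):

   # Illustrative sketch of how NARS frequency/confidence behave as
   # evidence accumulates (assumes f = w+/w and c = w/(w+k), k = 1).
   def truth_value(w_plus, w, k=1.0):
       f = w_plus / w      # frequency: proportion of positive evidence
       c = w / (w + k)     # confidence: approaches 1 as w grows
       return f, c

   # With a fixed 80% positive rate, c -> 1 while f stays at 0.8,
   # which is where frequency can be read as a probability.
   for w in (5, 50, 500, 50000):
       print(w, truth_value(0.8 * w, w))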

 As for higher values... NARS and PLN may be using them for the purpose
 you mention, but that is not the purpose I am giving them in my
 analysis! In my analysis, I am simply trying to justify the deductions
 allowed in NARS in a probabilistic way. Higher-order probabilities are
 potentially useful here because of the way you sum evidence. Simply
 put, it is as if NARS purposefully ignores the distinction between
 different probability levels, so that a NARS frequency is also a
 frequency-of-frequencies and frequency-of-frequency-of frequencies and
 so on, all the way up.

 I see what you mean, but as it is currently defined, in NARS there is
 no need to introduce higher-order probabilities --- frequency is not
 an estimate of a true probability. It is uncertain because of the
 influence of new evidence, not because it is inaccurate.

 The simple way of dealing with this is to say that it is wrong, and
 results from a confusion of similar-looking mathematical entities.
 But, to some extent, it is intuitive: I should not care too much in
 normal reasoning which level of inheritance I'm using when I say
 that a truck is a type of vehicle. So the question is, can this be
 justified probabilistically? I think I can give a very tentative
 yes.

 Hopefully we'll know better about that when you explore further. ;-)

 Pei

 --Abram

 On Sat, Sep 20, 2008 at 9:38 PM, Pei Wang [EMAIL PROTECTED] wrote:
 On Sat, Sep 20, 2008 at 9:09 PM, Abram Demski [EMAIL PROTECTED] wrote:

 (1) In probability theory, an event E has a constant probability P(E)
 (which can be unknown). Given the assumption of insufficient knowledge
 and resources, in NARS P(A--B) would change over time, when more and
 more evidence is taken into account. This process cannot be treated as
 conditioning, because, among other things, the system can neither
 explicitly list all evidence as condition, nor update the probability
 of all statements in the system for each piece of new evidence (so as
 to treat all background knowledge as a default condition).
 Consequently, at any moment P(A--B) and P(B--C) may be based on
 different, though unspecified, data, so it is invalid to use them in a
 rule to calculate the probability of A--C --- probability theory
 does not allow cross-distribution probability calculation.

 This is not a problem the way I set things up. The likelihood of a
 statement is welcome to change over time, as the evidence changes.

 If each of them is changed independently, you don't have a single
 probability distribution anymore, but a bunch of them. In the above
 case, you don't really have P(A--B) and P(B--C), but P_307(A--B)
 and P_409(B--C). How can you use two probability values together if
 they come from different distributions?

 (2) For the same reason, in NARS a statement might get different
 probability attached, when derived from different evidence.
 Probability theory does not have a general rule to handle
 inconsistency within a probability distribution.

 The same statement holds for PLN, right?

 Yes. Ben proposed a solution, on which I won't comment until I see all
 

Re: [agi] NARS probability

2008-09-21 Thread Pei Wang
When working on your new proposal, remember that in NARS all
measurements must be based on what the system has --- limited evidence
and resources. I don't allow any objective probability that only
exists in a Platonic world or the infinite future.

Pei

On Sun, Sep 21, 2008 at 1:53 PM, Abram Demski [EMAIL PROTECTED] wrote:
 Hmm... I didn't mean infinite evidence, only infinite time and space
 with which to compute the consequences of evidence. But that is
 interesting too.

 The higher-order probabilities I'm talking about introducing do not
 reflect inaccuracy at all. :)
 This may seem odd, but it seems to me to follow from your development
 of NARS... so the difficulty for me is to account for why you can
 exclude it in your system. Of course, this need arises only from
 interpreting your definitions probabilistically.

 I think I have come up with a more specific proposal. I will try to
 write it up properly and see if it works.

 --Abram

 On Sat, Sep 20, 2008 at 11:28 PM, Pei Wang [EMAIL PROTECTED] wrote:
 On Sat, Sep 20, 2008 at 11:02 PM, Abram Demski [EMAIL PROTECTED] wrote:
 You are right in what you say about (1). The truth is, my analysis is
 meant to apply to NARS operating with unrestricted time and memory
 resources (which of course is not the point of NARS!). So, the
 question is whether NARS approaches a probability calculation as it is
 given more time to use all its data.

 That is an interesting question. When the weight of evidence w goes to
 infinity, confidence goes to its limit, and frequency converges to the
 proportion of positive evidence among all evidence, so it becomes a
 probability, under a certain interpretation. Therefore, as far as a
 single truth value is concerned, probability theory is an extreme case
 of NARS.

 However, when all truth values in the system are taken into account,
 this is not necessarily true, because the two theories specify the
 relations among statements/propositions differently. For example,
 probability theory has the conditional B|A, while NARS uses the
 implication A ==> B, which are similar, but not the same. Of course,
 there are some overlaps, such as disjunction and conjunction, where
 NARS converges to probability theory in the extreme case (infinite
 evidence).

 As for higher values... NARS and PLN may be using them for the purpose
 you mention, but that is not the purpose I am giving them in my
 analysis! In my analysis, I am simply trying to justify the deductions
 allowed in NARS in a probabilistic way. Higher-order probabilities are
 potentially useful here because of the way you sum evidence. Simply
 put, it is as if NARS purposefully ignores the distinction between
 different probability levels, so that a NARS frequency is also a
 frequency-of-frequencies and frequency-of-frequency-of frequencies and
 so on, all the way up.

 I see what you mean, but as it is currently defined, in NARS there is
 no need to introduce higher-order probabilities --- frequency is not
 an estimate of a true probability. It is uncertain because of the
 influence of new evidence, not because it is inaccurate.

 The simple way of dealing with this is to say that it is wrong, and
 results from a confusion of similar-looking mathematical entities.
 But, to some extent, it is intuitive: I should not care too much in
 normal reasoning which level of inheritance I'm using when I say
 that a truck is a type of vehicle. So the question is, can this be
 justified probabilistically? I think I can give a very tentative
 yes.

 Hopefully we'll know better about that when you explore further. ;-)

 Pei

 --Abram

 On Sat, Sep 20, 2008 at 9:38 PM, Pei Wang [EMAIL PROTECTED] wrote:
 On Sat, Sep 20, 2008 at 9:09 PM, Abram Demski [EMAIL PROTECTED] wrote:

 (1) In probability theory, an event E has a constant probability P(E)
 (which can be unknown). Given the assumption of insufficient knowledge
 and resources, in NARS P(A--B) would change over time, when more and
 more evidence is taken into account. This process cannot be treated as
 conditioning, because, among other things, the system can neither
 explicitly list all evidence as condition, nor update the probability
 of all statements in the system for each piece of new evidence (so as
 to treat all background knowledge as a default condition).
 Consequently, at any moment P(A--B) and P(B--C) may be based on
 different, though unspecified, data, so it is invalid to use them in a
 rule to calculate the probability of A--C --- probability theory
 does not allow cross-distribution probability calculation.

 This is not a problem the way I set things up. The likelihood of a
 statement is welcome to change over time, as the evidence changes.

 If each of them is changed independently, you don't have a single
 probability distribution anymore, but a bunch of them. In the above
 case, you don't really have P(A--B) and P(B--C), but P_307(A--B)
 and P_409(B--C). How can you use two probability values together if
 they come from different distributions?

 (2) For the 

Re: [agi] NARS probability

2008-09-21 Thread Matt Mahoney
--- On Sat, 9/20/08, Pei Wang [EMAIL PROTECTED] wrote:
 Think about a concrete example: if from one source the
 system gets
 P(A--B) = 0.9, and P(P(A--B) = 0.9) = 0.5, while
 from another source
 P(A--B) = 0.2, and P(P(A--B) = 0.2) = 0.7, then
 what will be the
 conclusion when the two sources are considered together?

This is a common problem in text prediction. In general, there is no right 
answer. You have to determine experimentally what works best. You compute the 
probability using some method, run it on some test data, and measure the 
accuracy of your predictions.

To give a more concrete example, suppose that A is some context (the last n 
bytes of text), and B is the event that the next bit is a 1. We get different 
predictions for different orders (different values of n) which we need to 
combine.

In PAQ1-PAQ3, I count zeros and ones in context A. Call these counts c0 and c1. 
Then I let P(A--B) = c1/(c0+c1) and let the confidence (what you call 
P(P(A--B))) be c0+c1. To combine them I add up the c0's and c1's and compute 
SUM c1 / (SUM c0 + SUM c1).

I also discovered experimentally that the prediction is more accurate if the 
counts are weighted by n^2. For example the order 19 context:

  the cat caught a mo_

is a better predictor of the next symbol than the order 2 context:

  mo_

even though the latter has probably collected more statistics, and therefore 
has a higher confidence.
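
To illustrate, here is a minimal sketch of this count-based mixing with 
the n^2 weighting (illustrative Python, not the actual PAQ source; the 
counts in the final line are made up):

  # Sketch of PAQ1-PAQ3-style mixing: combine per-order bit counts,
  # weighting the order-n counts by n^2 (illustrative, not real PAQ code).
  def mix(counts):
      num = 0.0   # weighted sum of one-counts
      den = 0.0   # weighted sum of all counts
      for n, (c0, c1) in counts.items():
          w = n * n
          num += w * c1
          den += w * (c0 + c1)
      return num / den if den else 0.5   # estimated P(next bit = 1)

  # The order-2 context has seen lots of data; the order-19 context has
  # seen little but dominates through its weight.
  print(mix({2: (30, 10), 19: (0, 3)}))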

In PAQ4-PAQ6 I adjust the weights dynamically using gradient descent of coding 
cost in weight space to reduce prediction error. This can be improved further 
by using multiple weight tables indexed by a low order context.

In PAQ7-PAQ8 I dynamically map each bit history (truncated c0,c1 plus the last 
bit) to a probability p_i using a table that is adjusted to reduce prediction 
error when the bit is observed. Then the predictions p_i are combined using a 
neural network:

  p = squash(SUM w_i stretch(p_i))

where squash(x) = 1/(1 + exp(-x)) bounds the output to (0, 1), and stretch(x) = 
ln(x / (1 - x)) is the inverse of squash. (This implicitly gives greater 
confidence to probabilities near 0 or 1). When actual bit b is observed, the 
weights are adjusted to reduce the prediction error b - p by gradient descent 
of coding cost in weight space as follows:

  w_i := w_i + L stretch(p_i) (b - p)

where L is the learning rate, typically 0.001 to 0.005. Again, we can improve 
this by using multiple weight tables indexed by a low order context. Or you can 
use multiple neural networks indexed by different order contexts and combine 
them by linear averaging or another neural network.
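
A compact sketch of the stretch/squash mixing and weight update just 
described (again illustrative rather than the actual PAQ code; the 
predictions, initial weights, and learning rate below are made up):

  import math

  def stretch(p):               # ln(p / (1 - p)), the inverse of squash
      return math.log(p / (1.0 - p))

  def squash(x):                # logistic function, output in (0, 1)
      return 1.0 / (1.0 + math.exp(-x))

  def mix_and_update(ps, ws, b, L=0.002):
      # Combine the predictions ps with weights ws, then nudge ws toward
      # the observed bit b by gradient descent on coding cost.
      xs = [stretch(p) for p in ps]
      p = squash(sum(w * x for w, x in zip(ws, xs)))
      for i, x in enumerate(xs):
          ws[i] += L * x * (b - p)
      return p, ws

  p, ws = mix_and_update([0.9, 0.3, 0.55], [0.3, 0.3, 0.3], b=1)
  print(p, ws)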

In PAQ9 I use chains of 2 input neural networks, where one input is the 
previous prediction from the next lower context order and the other input is 
fixed. The weight table is selected by the bit history in the next higher 
context. This method is still experimental. It works well for simple n-gram 
models but worse than PAQ8 when there are large numbers of approximately 
equally good predictions to combine, such as when semantic (cat ~ mouse) and 
other contexts are added.

-- Matt Mahoney, [EMAIL PROTECTED]






[agi] NARS probability

2008-09-20 Thread Abram Demski
It has been mentioned several times on this list that NARS has no
proper probabilistic interpretation. But, I think I have found one
that works OK. Not perfectly. There are some differences, but the
similarity is striking (at least to me).

I imagine that what I have come up with is not too different from what
Ben Goertzel and Pei Wang have already hashed out in their attempts to
reconcile the two, but we'll see. The general idea is to treat NARS as
probability plus a good number of regularity assumptions that justify
the inference steps of NARS. However, since I make so many
assumptions, it is very possible that some of them conflict. This
would show that NARS couldn't fit into probability theory after all,
but it is still interesting even if that's the case...

So, here's an outline. We start with the primitive inheritance
relation, A inh B; this could be called definite inheritance,
because it means that A inherits all of B's properties, and B inherits
all of A's instances. B is a superset of A. The truth value is 1 or 0.
Then, we define probabilistic inheritance, which carries a
probability that a given property of B will be inherited by A and that
a given instance of A will be inherited by B. Probabilistic
inheritance behaves somewhat like the full NARS inheritance: if we
reason about likelihoods (the probability of the data assuming (A
prob_inh B) = x), the math is actually the same EXCEPT we can only use
primitive inheritance as evidence, so we can't spread evidence around
the network by (1) treating prob_inh with high evidence as if it were
primitive inh or (2) attempting to use deduction to accumulate
evidence as we might want to, so that evidence for A prob_inh B and
evidence for B prob_inh C gets combined to evidence for A prob_inh
C.

So, we can define a second-order-probabilistic-inheritance prob_inh2
that is for prob_inh what prob_inh is for inh. We can define a
third-order over the second-order, a fourth over the third, and so
on. In fact, each of these is a generalization of the previous: simple inheritance can
be seen as a special case of prob_inh (where the probability is 1),
prob_inh is a special case of prob_inh2, and so on. This means we can
define an infinite-order probabilistic inheritance, prob_inh_inf,
which is a generalization of any given level. The truth value of
prob_inh_inf will be very complicated (since each prob_inhN has a more
complicated truth value than the last, and prob_inh_inf will include
the truth values from each level).

My proposal is to add 2 regularity assumptions to this structure.
First, we assume that the prior over probability values for prob_inh
is even (uniform). This gives us some permission to act like the probability
and the likelihood are the same thing, which brings the math closer to
NARS. Second, assume that a high truth value on one level strongly
implies a high one on the next level, and similarly that low implies
low. They will already weakly imply each other, but I think the math
could be brought closer to NARS with a stronger assumption. I don't
have any precise suggestions, however. The idea here is to allow
evidence that properly should only be counted for prob_inh2 to count
for prob_inh as well, which is the case in NARS. This is point (1)
above. More generally, it justifies the NARSian practice of using the
simple prob_inh likelihood as if it were a likelihood for
prob_inh_inf, so that it recursively acts on other instances of itself
rather than only on simple inh.

Of course, since I have not given precise definitions, this solution
is difficult to evaluate. But, I thought it would be of interest.

--Abram Demski




Re: [agi] NARS probability

2008-09-20 Thread Ben Goertzel
Abram,

I think the best place to start, in exploring the relation between NARS
and probablity theory, is with Definition 3.7 in the paper

From Inheritance Relation to Non-Axiomatic Logic
(http://www.cogsci.indiana.edu/pub/wang.inheritance_nal.ps)
[International Journal of Approximate Reasoning, 11(4), 281-319, 1994]
(journal page: http://www.elsevier.com/wps/find/journaldescription.cws_home/505787/description#description)

which is downloadable from

http://nars.wang.googlepages.com/nars%3Apublication

It is instructive to look at specific situations, and see how this
definition
leads one to model situations differently from the way one traditionally
uses
probability theory to model such situations.

The next place to look, in exploring this relation, is at the semantics that
3.7 implies for the induction and abduction rules.  Note that unlike in PLN
there are no term (node) probabilities in NARS, so that induction and
abduction cannot rely on Bayes rule or any close analogue of it.  They must
be justified on quite different grounds.  If you can formulate a
probabilistic
justification of NARS induction and abduction truth value formulas, I'll be
quite interested.   I'm not saying it's impossible, just that it's not
obvious ...
one has to grapple with 3.7 and the fact that the NARS relative frequency
w+/w is combining intension and extension in a manner that is unusual
relative to ordinary probabilistic treatments.

The math here is simple enough that one does not need to do hand-wavy
philosophizing ;-) ... it's just elementary algebra.  The subtle part is
really
the semantics, i.e. the way the math is used to model situations.

-- Ben G



On Sat, Sep 20, 2008 at 2:22 PM, Abram Demski [EMAIL PROTECTED] wrote:

 It has been mentioned several times on this list that NARS has no
 proper probabilistic interpretation. But, I think I have found one
 that works OK. Not perfectly. There are some differences, but the
 similarity is striking (at least to me).

 I imagine that what I have come up with is not too different from what
 Ben Goertzel and Pei Wang have already hashed out in their attempts to
 reconcile the two, but we'll see. The general idea is to treat NARS as
 probability plus a good number of regularity assumptions that justify
 the inference steps of NARS. However, since I make so many
 assumptions, it is very possible that some of them conflict. This
 would show that NARS couldn't fit into probability theory after all,
 but it is still interesting even if that's the case...

 So, here's an outline. We start with the primitive inheritance
 relation, A inh B; this could be called definite inheritance,
 because it means that A inherits all of B's properties, and B inherits
 all of A's instances. B is a superset of A. The truth value is 1 or 0.
 Then, we define probabilistic inheritance, which carries a
 probability that a given property of B will be inherited by A and that
 a given instance of A will be inherited by B. Probabilistic
 inheritance behaves somewhat like the full NARS inheritance: if we
 reason about likelihoods (the probability of the data assuming (A
 prob_inh B) = x), the math is actually the same EXCEPT we can only use
 primitive inheritance as evidence, so we can't spread evidence around
 the network by (1) treating prob_inh with high evidence as if it were
 primitive inh or (2) attempting to use deduction to accumulate
 evidence as we might want to, so that evidence for A prob_inh B and
 evidence for B prob_inh C gets combined to evidence for A prob_inh
 C.

 So, we can define a second-order-probabilistic-inheritance prob_inh2
 that is for prob_inh what prob_inh is for inh. We can define a
 third-order over the second-order, a fourth over the third, and so
 on. In fact, each of these is a generalization of the previous: simple inheritance can
 be seen as a special case of prob_inh (where the probability is 1),
 prob_inh is a special case of prob_inh2, and so on. This means we can
 define an infinite-order probabilistic inheritance, prob_inh_inf,
 which is a generalization of any given level. The truth value of
 prob_inh_inf will be very complicated (since each prob_inhN has a more
 complicated truth value than the last, and prob_inh_inf will include
 the truth values from each level).

 My proposal is to add 2 regularity assumptions to this structure.
 First, we assume that the prior over probability values for prob_inh
 is even (uniform). This gives us some permission to act like the probability
 and the likelihood are the same thing, which brings the math closer to
 NARS. Second, assume that a high truth value on one level strongly
 implies a high one on the next level, and similarly that low implies
 low. They will already weakly imply each other, but I think the math
 could be brought closer to NARS with a stronger assumption. I don't
 have any precise suggestions, however. The idea here is to allow
 evidence that properly should only be counted for prob_inh2 to count
 for prob_inh as well, which is the case in NARS. 

Re: [agi] NARS probability

2008-09-20 Thread Abram Demski
Ben,

Thanks for the references. I do not have any particularly good reason
for trying to do this, but it is a fun exercise and I find myself
making the attempt every so often :).

I haven't read the PLN book yet (though I downloaded a copy, thanks!),
but at present I don't see why term probabilities are needed... unless
inheritance relations A inh B are interpreted as conditional
probabilities A given B. I am not interpreting them that way-- I am
just treating inheritance as a reflexive and transitive relation that
(for some reason) we want to reason about probabilistically. As such,
it is easy to set up probabilistic treatments-- the challenge is to
get them to behave in a way that resembles NARS.

Another way of putting this is that I am not worrying too much about
the semantics, I'm just trying to get the formal manipulations to
match up.

And the definition 3.7 that you mentioned *does* match up, perfectly,
when the {w+, w} truth-value is interpreted as a way of representing
the likelihood density function of the prob_inh. Easy! The challenge
is section 4.4 in the paper you reference: syllogisms. The way
evidence is spread around there doesn't match with definition 3.7, not
without further probabilistic assumptions.
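
To spell out that reading: with w pieces of primitive-inheritance
evidence, of which w+ are positive, the natural likelihood for a
candidate prob_inh value x is the binomial one, and a flat prior then
gives a Beta posterior. A minimal sketch under those assumptions (the
Beta/binomial framing is my gloss on the above, not something taken
from the paper):

  # Read the NARS {w+, w} pair as a binomial likelihood over the unknown
  # prob_inh value x (illustrative gloss, not from the NARS papers).
  from math import comb

  def likelihood(x, w_plus, w):
      return comb(w, w_plus) * x**w_plus * (1 - x)**(w - w_plus)

  # With a flat prior the posterior is Beta(w+ + 1, w - w+ + 1); its mode
  # is w+/w (the NARS frequency) and its mean is (w+ + 1)/(w + 2).
  w_plus, w = 3, 4
  print(likelihood(0.75, w_plus, w))          # likelihood at the NARS frequency
  print(w_plus / w, (w_plus + 1) / (w + 2))   # frequency vs. posterior mean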

--Abram

On Sat, Sep 20, 2008 at 4:13 PM, Ben Goertzel [EMAIL PROTECTED] wrote:

 Abram,

 I think the best place to start, in exploring the relation between NARS
 and probablity theory, is with Definition 3.7 in the paper

 From Inheritance Relation to Non-Axiomatic Logic
 [International Journal of Approximate Reasoning, 11(4), 281-319, 1994]

 which is downloadable from

 http://nars.wang.googlepages.com/nars%3Apublication

 It is instructive to look at specific situations, and see how this
 definition
 leads one to model situations differently from the way one traditionally
 uses
 probability theory to model such situations.

 The next place to look, in exploring this relation, is at the semantics that
 3.7 implies for the induction and abduction rules.  Note that unlike in PLN
 there are no term (node) probabilities in NARS, so that induction and
 abduction cannot rely on Bayes rule or any close analogue of it.  They must
 be justified on quite different grounds.  If you can formulate a
 probabilistic
 justification of NARS induction and abduction truth value formulas, I'll be
 quite interested.   I'm not saying it's impossible, just that it's not
 obvious ...
 one has to grapple with 3.7 and the fact that the NARS relative frequency
 w+/w is combining intension and extension in a manner that is unusual
 relative to ordinary probabilistic treatments.

 The math here is simple enough that one does not need to do hand-wavy
 philosophizing ;-) ... it's just elementary algebra.  The subtle part is
 really
 the semantics, i.e. the way the math is used to model situations.

 -- Ben G



 On Sat, Sep 20, 2008 at 2:22 PM, Abram Demski [EMAIL PROTECTED] wrote:

 It has been mentioned several times on this list that NARS has no
 proper probabilistic interpretation. But, I think I have found one
 that works OK. Not perfectly. There are some differences, but the
 similarity is striking (at least to me).

 I imagine that what I have come up with is not too different from what
 Ben Goertzel and Pei Wang have already hashed out in their attempts to
 reconcile the two, but we'll see. The general idea is to treat NARS as
 probability plus a good number of regularity assumptions that justify
 the inference steps of NARS. However, since I make so many
 assumptions, it is very possible that some of them conflict. This
 would show that NARS couldn't fit into probability theory after all,
 but it is still interesting even if that's the case...

 So, here's an outline. We start with the primitive inheritance
 relation, A inh B; this could be called definite inheritance,
 because it means that A inherits all of B's properties, and B inherits
 all of A's instances. B is a superset of A. The truth value is 1 or 0.
 Then, we define probabilistic inheritance, which carries a
 probability that a given property of B will be inherited by A and that
 a given instance of A will be inherited by B. Probabilistic
 inheritance behaves somewhat like the full NARS inheritance: if we
 reason about likelihoods (the probability of the data assuming (A
 prob_inh B) = x), the math is actually the same EXCEPT we can only use
 primitive inheritance as evidence, so we can't spread evidence around
 the network by (1) treating prob_inh with high evidence as if it were
 primitive inh or (2) attempting to use deduction to accumulate
 evidence as we might want to, so that evidence for A prob_inh B and
 evidence for B prob_inh C gets combined to evidence for A prob_inh
 C.

 So, we can define a second-order-probabilistic-inheritance prob_inh2
 that is for prob_inh what prob_inh is for inh. We can define a
 third-order over the second-order, a fourth over the third, and so
 on. In fact, each of these are generalizations: simple 

Re: [agi] NARS probability

2008-09-20 Thread Ben Goertzel

 I haven't read the PLN book yet (though I downloaded a copy, thanks!),
 but at present I don't see why term probabilities are needed... unless
 inheritance relations A inh B are interpreted as conditional
 probabilities A given B. I am not interpreting them that way-- I am
 just treating inheritance as a reflexive and transitive relation that
 (for some reason) we want to reason about probabilistically.


Well, one question is whether you want to be able to do inference like

A --B  tv1
|-
B --A  tv2

Doing that without term probabilities is pretty hard...
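
For comparison, if A --> B were read as the conditional probability
P(B|A), the inversion would just be a Bayes step, and it needs the term
probabilities P(A) and P(B):

  P(A|B) = P(B|A) * P(A) / P(B)

e.g. P(B|A) = 0.9 with P(A) = 0.1 and P(B) = 0.3 gives P(A|B) = 0.3,
and without P(A) and P(B) the inverted value is simply undetermined.
(The numbers are made up, purely to illustrate the point.)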

Another interesting approach would be to investigate which of
Cox's axioms (for probability) are violated in NARS, in what semantic
interpretation, and why...

ben





Re: [agi] NARS probability

2008-09-20 Thread Ben Goertzel
 And the definition 3.7 that you mentioned *does* match up, perfectly,
 when the {w+, w} truth-value is interpreted as a way of representing
 the likelihood density function of the prob_inh. Easy! The challenge
 is section 4.4 in the paper you reference: syllogisms. The way
 evidence is spread around there doesn't match with definition 3.7, not
 without further probabilistic assumptions.



which seems to be because the semantic interpretation of evidence
in 3.7 is different in NARS than in PLN or most probabilistic treatments...

this is why I suggested to look at how 3.7 is used to model a real
situation,
versus how that situation would be modeled in prob. theory...

having a good test situation in mind might help to think about the
syllogistic rules more clearly

it needs to be a situation where the terms and relations are grounded in
a system's experience, as that is what NARS and PLN semantics are both
all about...

ben





Re: [agi] NARS probability

2008-09-20 Thread Abram Demski
 Well, one question is whether you want to be able to do inference like

 A --B  tv1
 |-
 B --A  tv2

 Doing that without term probabilities is pretty hard...

Not the way I set it up. A--B is not the conditional probability
P(B|A), but it *is* a conditional probability, so the normal Bayesian
rules apply.

--Abram




Re: [agi] NARS probability

2008-09-20 Thread Pei Wang
On Sat, Sep 20, 2008 at 2:22 PM, Abram Demski [EMAIL PROTECTED] wrote:
 It has been mentioned several times on this list that NARS has no
 proper probabilistic interpretation. But, I think I have found one
 that works OK. Not perfectly. There are some differences, but the
 similarity is striking (at least to me).

Abram,

There is indeed a lot of similarity between NARS and probability
theory. When I started this project, my plan was to use probability
theory to handle uncertainty. I moved away from it after I came to believe
that what is needed cannot be fully obtained from that theory and its
extensions. Even so, NARS still agrees with probability theory here and
there, as mentioned in my papers.

The key, therefore, is whether NARS can be FULLY treated as an
application of probability theory, by following the probability
axioms, and only adding justifiable consistent assumptions when
necessary.

 I imagine that what I have come up with is not too different from what
 Ben Goertzel and Pei Wang have already hashed out in their attempts to
 reconcile the two, but we'll see. The general idea is to treat NARS as
 probability plus a good number of regularity assumptions that justify
 the inference steps of NARS. However, since I make so many
 assumptions, it is very possible that some of them conflict. This
 would show that NARS couldn't fit into probability theory after all,
 but it is still interesting even if that's the case...

I assume by "treat NARS as probability" you mean to treat the
Frequency in NARS as a measurement following the axioms of probability
theory. I mentioned this because there is another measurement in
NARS, Expectation (which is derived from Frequency and Confidence),
which is also intuitively similar to probability.

 So, here's an outline. We start with the primitive inheritance
 relation, A inh B; this could be called definite inheritance,
 because it means that A inherits all of B's properties, and B inherits
 all of A's instances. B is a superset of A. The truth value is 1 or 0.

Fine.

 Then, we define probabilistic inheritance, which carries a
 probability that a given property of B will be inherited by A and that
 a given instance of A will be inherited by B.

There is a tricky issue here. When evaluating the truth value of
A--B, NARS doesn't only check properties and instances, but also
checks supersets and subsets, intuitively speaking. For example,
when the system is told that "Swans are birds" and "Swans fly", it
derives "Birds fly" by induction. In this process swan is counted as
one piece of evidence, rather than as a set of instances. How many swans
the system knows doesn't matter in this step. That is why in the
definitions I use extension/intension, not instance/property,
because the latter are just special cases of the former. Actually, the
truth value of A--B measures how often the two terms can substitute for
each other (in different ways), not how much one set is included in
the other, which is the usual probabilistic reading of an inheritance.

This is one reason why NARS does not define node probability.

 Probabilistic
 inheritance behaves somewhat like the full NARS inheritance: if we
 reason about likelihoods (the probability of the data assuming (A
 prob_inh B) = x), the math is actually the same EXCEPT we can only use
 primitive inheritance as evidence, so we can't spread evidence around
 the network by (1) treating prob_inh with high evidence as if it were
 primitive inh or (2) attempting to use deduction to accumulate
 evidence as we might want to, so that evidence for A prob_inh B and
 evidence for B prob_inh C gets combined to evidence for A prob_inh
 C.

Beside the problem you mentioned, there are other issues. Let me start
at the basic ones:

(1) In probability theory, an event E has a constant probability P(E)
(which can be unknown). Given the assumption of insufficient knowledge
and resources, in NARS P(A--B) would change over time, when more and
more evidence is taken into account. This process cannot be treated as
conditioning, because, among other things, the system can neither
explicitly list all evidence as condition, nor update the probability
of all statements in the system for each piece of new evidence (so as
to treat all background knowledge as a default condition).
Consequently, at any moment P(A--B) and P(B--C) may be based on
different, though unspecified, data, so it is invalid to use them in a
rule to calculate the probability of A--C --- probability theory
does not allow cross-distribution probability calculation.

(2) For the same reason, in NARS a statement might get different
probability attached, when derived from different evidence.
Probability theory does not have a general rule to handle
inconsistency within a probability distribution.

 So, we can define a second-order-probabilistic-inheritance prob_inh2
 that is for prob_inh what prob_inh is for inh. We can define a
 third-order over the second-order, a fourth over the second, and so
 on. 

Re: [agi] NARS probability

2008-09-20 Thread Ben Goertzel
 Beside the problem you mentioned, there are other issues. Let me start
 at the basic ones:

 (1) In probability theory, an event E has a constant probability P(E)
 (which can be unknown). Given the assumption of insufficient knowledge
 and resources, in NARS P(A--B) would change over time, when more and
 more evidence is taken into account. This process cannot be treated as
 conditioning, because, among other things, the system can neither
 explicitly list all evidence as condition, nor update the probability
 of all statements in the system for each piece of new evidence (so as
 to treat all background knowledge as a default condition).
 Consequently, at any moment P(A--B) and P(B--C) may be based on
 different, though unspecified, data, so it is invalid to use them in a
 rule to calculate the probability of A--C --- probability theory
 does not allow cross-distribution probability calculation.

 (2) For the same reason, in NARS a statement might get different
 probability attached, when derived from different evidence.
 Probability theory does not have a general rule to handle
 inconsistency within a probability distribution.



Of course, these issues can be handled in probability theory via introducing
higher-order probabilities ...

ben





Re: [agi] NARS probability

2008-09-20 Thread Abram Demski
Thanks for the critique. Replies follow...

On Sat, Sep 20, 2008 at 8:20 PM, Pei Wang [EMAIL PROTECTED] wrote:
 On Sat, Sep 20, 2008 at 2:22 PM, Abram Demski [EMAIL PROTECTED] wrote:
[...]
 The key, therefore, is whether NARS can be FULLY treated as an
 application of probability theory, by following the probability
 axioms, and only adding justifiable consistent assumptions when
 necessary.

Yes, that's the main question. Also, though, if the answer is no it is
potentially important to figure out why.

[...]
 I assume by "treat NARS as probability" you mean to treat the
 Frequency in NARS as a measurement following the axioms of probability
 theory. I mentioned this because there is another measurement in
 NARS, Expectation (which is derived from Frequency and Confidence),
 which is also intuitively similar to probability.

Yes, you are right... at least so far, I've only been looking at
frequency + confidence. Getting expectation from that does not look
like it violates any laws.

[...]

 Then, we define probabilistic inheritance, which carries a
 probability that a given property of B will be inherited by A and that
 a given instance of A will be inherited by B.

 There is a tricky issue here. When evaluating the truth value of
 A--B, NARS doesn't only check properties and instances, but also
 checks supersets and subsets, intuitively speaking. For example,
 when the system is told that "Swans are birds" and "Swans fly", it
 derives "Birds fly" by induction. In this process swan is counted as
 one piece of evidence, rather than as a set of instances. How many swans
 the system knows doesn't matter in this step. That is why in the
 definitions I use extension/intension, not instance/property,
 because the latter are just special cases of the former. Actually, the
 truth value of A--B measures how often the two terms can substitute for
 each other (in different ways), not how much one set is included in
 the other, which is the usual probabilistic reading of an inheritance.

 This is one reason why NARS does not define node probability.

Yes, I understand this. I should have worded myself more carefully.


 Probabilistic
 inheritance behaves somewhat like the full NARS inheritance: if we
 reason about likelihoods (the probability of the data assuming (A
 prob_inh B) = x), the math is actually the same EXCEPT we can only use
 primitive inheritance as evidence, so we can't spread evidence around
 the network by (1) treating prob_inh with high evidence as if it were
 primitive inh or (2) attempting to use deduction to accumulate
 evidence as we might want to, so that evidence for A prob_inh B and
 evidence for B prob_inh C gets combined to evidence for A prob_inh
 C.

 Beside the problem you mentioned, there are other issues. Let me start
 at the basic ones:

 (1) In probability theory, an event E has a constant probability P(E)
 (which can be unknown). Given the assumption of insufficient knowledge
 and resources, in NARS P(A--B) would change over time, when more and
 more evidence is taken into account. This process cannot be treated as
 conditioning, because, among other things, the system can neither
 explicitly list all evidence as condition, nor update the probability
 of all statements in the system for each piece of new evidence (so as
 to treat all background knowledge as a default condition).
 Consequently, at any moment P(A--B) and P(B--C) may be based on
 different, though unspecified, data, so it is invalid to use them in a
 rule to calculate the probability of A--C --- probability theory
 does not allow cross-distribution probability calculation.

This is not a problem the way I set things up. The likelihood of a
statement is welcome to change over time, as the evidence changes.


 (2) For the same reason, in NARS a statement might get different
 probability attached, when derived from different evidence.
 Probability theory does not have a general rule to handle
 inconsistency within a probability distribution.

The same statement holds for PLN, right?
[...]

 My proposal is to add 2 regularity assumptions to this structure.
 First, we assume that the prior over probability values for prob_inh
 is even. This givens us some permission to act like the probability
 and the likelihood are the same thing, which brings the math closer to
 NARS.

 That is intuitively acceptable, if interpreted properly.

 Second, assume that a high truth value on one level strongly
 implies a high one on the next value, and similarly that low implies
 low.

 The first half is fine, but the second isn't. As the previous example
 shows, in NARS a high Confidence does imply that the Frequency value
 is a good summary of evidence, but a low Confidence does not imply that
 the Frequency is bad, just that it is not very stable.

But I'm not talking about confidence when I say higher. I'm talking
about the system of levels I defined, for which it is perfectly OK.

Essentially what I'm claiming here is that the inferences of NARS are

Re: [agi] NARS probability

2008-09-20 Thread Ben Goertzel
  (2) For the same reason, in NARS a statement might get different
  probability attached, when derived from different evidence.
  Probability theory does not have a general rule to handle
  inconsistency within a probability distribution.

 The same statement holds for PLN, right?


PLN handles inconsistency within probability distributions using
higher-order probabilities... both explicitly and, more simply, by allowing
multiple inconsistent estimates of the same distribution to exist attached
to the same node or link...




  If you work out a detailed solution along your path, you will see that
  it will be similar to NARS when both are doing deduction with strong
  evidence. The difference will show up (1) in cases where evidence is
  rare, and (2) in non-deductive inferences, such as induction and
  abduction. I believe this is also where NARS and PLN differ most.

 Guilty as charged! I have only tried to justify the deduction rule,
 not any of the others. I seriously didn't think about the blind spot
 until you mentioned it. I'll have to go back and take a closer look...


The NARS deduction rule closely approximates the PLN deduction rule for the case
where all the premise terms have roughly the same node probability.  It
particularly closely approximates the concept-geometry-based variant of
the PLN deduction rule, which is interesting: it means NARS deduction
approximates the variant of the PLN deduction rule one gets if one assumes
concepts are approximately spherically shaped rather than being random sets.

The NARS induction and abduction rules do not closely approximate the PLN
induction and abduction rules...

-- Ben G





Re: [agi] NARS probability

2008-09-20 Thread Pei Wang
On Sat, Sep 20, 2008 at 9:09 PM, Abram Demski [EMAIL PROTECTED] wrote:

 (1) In probability theory, an event E has a constant probability P(E)
 (which can be unknown). Given the assumption of insufficient knowledge
 and resources, in NARS P(A--B) would change over time, when more and
 more evidence is taken into account. This process cannot be treated as
 conditioning, because, among other things, the system can neither
 explicitly list all evidence as condition, nor update the probability
 of all statements in the system for each piece of new evidence (so as
 to treat all background knowledge as a default condition).
 Consequently, at any moment P(A--B) and P(B--C) may be based on
 different, though unspecified, data, so it is invalid to use them in a
 rule to calculate the probability of A--C --- probability theory
 does not allow cross-distribution probability calculation.

 This is not a problem the way I set things up. The likelihood of a
 statement is welcome to change over time, as the evidence changes.

If each of them is changed independently, you don't have a single
probability distribution anymore, but a bunch of them. In the above
case, you don't really have P(A--B) and P(B--C), but P_307(A--B)
and P_409(B--C). How can you use two probability values together if
they come from different distributions?

 (2) For the same reason, in NARS a statement might get different
 probability attached, when derived from different evidence.
 Probability theory does not have a general rule to handle
 inconsistency within a probability distribution.

 The same statement holds for PLN, right?

Yes. Ben proposed a solution, on which I won't comment until I see all
the details in the PLN book.

 The first half is fine, but the second isn't. As the previous example
 shows, in NARS a high Confidence does imply that the Frequency value
 is a good summary of evidence, but a low Confidence does not imply that
 the Frequency is bad, just that it is not very stable.

 But I'm not talking about confidence when I say higher. I'm talking
 about the system of levels I defined, for which it is perfectly OK.

Yes, but the whole purpose of adding another value is to handle
inconsistency and belief revision. Higher-order probability is
mathematically sound, but won't do this work.

Think about a concrete example: if from one source the system gets
P(A--B) = 0.9, and P(P(A--B) = 0.9) = 0.5, while from another source
P(A--B) = 0.2, and P(P(A--B) = 0.2) = 0.7, then what will be the
conclusion when the two sources are considered together?
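
(Purely as an illustration of what is at stake, not as a claim about
what NARS or PLN should do: if one naively used the second-order values
as mixture weights, the two reports would combine to roughly
(0.5*0.9 + 0.7*0.2) / (0.5 + 0.7) ≈ 0.49, and the example is posed
precisely because it is unclear whether any such rule is justified.)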

Pei




Re: [agi] NARS probability

2008-09-20 Thread Ben Goertzel


 Think about a concrete example: if from one source the system gets
 P(A--B) = 0.9, and P(P(A--B) = 0.9) = 0.5, while from another source
 P(A--B) = 0.2, and P(P(A--B) = 0.2) = 0.7, then what will be the
 conclusion when the two sources are considered together?



There are many approaches to this within the probabilistic framework,
one of which is contained within this paper, for example...

http://cat.inist.fr/?aModele=afficheNcpsidt=16174172

(I have a copy of the paper but I'm not sure where it's available for
free online ... if anyone finds it please post the link... thx)

Ben





Re: [agi] NARS probability

2008-09-20 Thread Pei Wang
I didn't know this paper, but I do know approaches based on the
principle of maximum/optimum entropy. They usually require much more
information (or assumptions) than what is given in the following
example.

I'd be interested to know what solution they would suggest for such
a situation.

Pei

On Sat, Sep 20, 2008 at 9:53 PM, Ben Goertzel [EMAIL PROTECTED] wrote:



 Think about a concrete example: if from one source the system gets
 P(A--B) = 0.9, and P(P(A--B) = 0.9) = 0.5, while from another source
 P(A--B) = 0.2, and P(P(A--B) = 0.2) = 0.7, then what will be the
 conclusion when the two sources are considered together?

 There are many approaches to this within the probabilistic framework,
 one of which is contained within this paper, for example...

 http://cat.inist.fr/?aModele=afficheNcpsidt=16174172

 (I have a copy of the paper but I'm not sure where it's available for
 free online ... if anyone finds it please post the link... thx)

 Ben
 




Re: [agi] NARS probability

2008-09-20 Thread Ben Goertzel
The approach in that paper doesn't require any special assumptions, and
could be applied to your example, but I don't have time to write up an
explanation of how to do the calculations ... you'll have to read the paper
yourself if you're curious ;-)

That approach is not implemented in PLN right now but we have debated
integrating it with PLN as in some ways it's subtler than what we currently
do in the code...

ben

On Sat, Sep 20, 2008 at 10:02 PM, Pei Wang [EMAIL PROTECTED] wrote:

 I didn't know this paper, but I do know approaches based on the
 principle of maximum/optimum entropy. They usually require much more
 information (or assumptions) than what is given in the following
 example.

 I'd be interested to know what solution they would suggest for such
 a situation.

 Pei

 On Sat, Sep 20, 2008 at 9:53 PM, Ben Goertzel [EMAIL PROTECTED] wrote:
 
 
 
  Think about a concrete example: if from one source the system gets
  P(A--B) = 0.9, and P(P(A--B) = 0.9) = 0.5, while from another source
  P(A--B) = 0.2, and P(P(A--B) = 0.2) = 0.7, then what will be the
  conclusion when the two sources are considered together?
 
  There are many approaches to this within the probabilistic framework,
  one of which is contained within this paper, for example...
 
  http://cat.inist.fr/?aModele=afficheNcpsidt=16174172
 
  (I have a copy of the paper but I'm not sure where it's available for
  free online ... if anyone finds it please post the link... thx)
 
  Ben
  






-- 
Ben Goertzel, PhD
CEO, Novamente LLC and Biomind LLC
Director of Research, SIAI
[EMAIL PROTECTED]

Nothing will ever be attempted if all possible objections must be first
overcome  - Dr Samuel Johnson





Re: [agi] NARS probability

2008-09-20 Thread Pei Wang
I found the paper.

As I guessed, their update operator is defined on the whole
probability distribution function, rather than on a single probability
value of an event. I don't think it is practical for AGI --- we cannot
afford the time to re-evaluate every belief on each piece of new
evidence. Also, I haven't seen a convincing argument on why an
intelligent system should follow the ME Principle.

Also this paper doesn't directly solve my example, because it doesn't
use second-order probability.

Pei

On Sat, Sep 20, 2008 at 10:13 PM, Ben Goertzel [EMAIL PROTECTED] wrote:

 The approach in that paper doesn't require any special assumptions, and
 could be applied to your example, but I don't have time to write up an
 explanation of how to do the calculations ... you'll have to read the paper
 yourself if you're curious ;-)

 That approach is not implemented in PLN right now but we have debated
 integrating it with PLN as in some ways it's subtler than what we currently
 do in the code...

 ben

 On Sat, Sep 20, 2008 at 10:02 PM, Pei Wang [EMAIL PROTECTED] wrote:

 I didn't know this paper, but I do know approaches based on the
 principle of maximum/optimum entropy. They usually require much more
 information (or assumptions) than what is given in the following
 example.

 I'd be interested to know what solution they would suggest for such
 a situation.

 Pei

 On Sat, Sep 20, 2008 at 9:53 PM, Ben Goertzel [EMAIL PROTECTED] wrote:
 
 
 
  Think about a concrete example: if from one source the system gets
  P(A-->B) = 0.9, and P(P(A-->B) = 0.9) = 0.5, while from another source
  P(A-->B) = 0.2, and P(P(A-->B) = 0.2) = 0.7, then what will be the
  conclusion when the two sources are considered together?
 
  There are many approaches to this within the probabilistic framework,
  one of which is contained within this paper, for example...
 
  http://cat.inist.fr/?aModele=afficheNcpsidt=16174172
 
  (I have a copy of the paper but I'm not sure where it's available for
  free online ... if anyone finds it please post the link... thx)
 
  Ben
  





 --
 Ben Goertzel, PhD
 CEO, Novamente LLC and Biomind LLC
 Director of Research, SIAI
 [EMAIL PROTECTED]

 Nothing will ever be attempted if all possible objections must be first
 overcome  - Dr Samuel Johnson


 




Re: [agi] NARS probability

2008-09-20 Thread Abram Demski
You are right in what you say about (1). The truth is, my analysis is
meant to apply to NARS operating with unrestricted time and memory
resources (which of course is not the point of NARS!). So, the
question is whether NARS approaches a probability calculation as it is
given more time to use all its data.

As for higher values... NARS and PLN may be using them for the purpose
you mention, but that is not the purpose I am giving them in my
analysis! In my analysis, I am simply trying to justify the deductions
allowed in NARS in a probabilistic way. Higher-order probabilities are
potentially useful here because of the way you sum evidence. Simply
put, it is as if NARS purposefully ignores the distinction between
different probability levels, so that a NARS frequency is also a
frequency-of-frequencies and frequency-of-frequency-of-frequencies and
so on, all the way up.

The simple way of dealing with this is to say that it is wrong, and
results from a confusion of similar-looking mathematical entities.
But, to some extent, it is intuitive: I should not care too much in
normal reasoning which level of inheritance I'm using when I say
that a truck is a type of vehicle. So the question is, can this be
justified probabilistically? I think I can give a very tentative
yes.

--Abram

On Sat, Sep 20, 2008 at 9:38 PM, Pei Wang [EMAIL PROTECTED] wrote:
 On Sat, Sep 20, 2008 at 9:09 PM, Abram Demski [EMAIL PROTECTED] wrote:

 (1) In probability theory, an event E has a constant probability P(E)
 (which can be unknown). Given the assumption of insufficient knowledge
 and resources, in NARS P(A-->B) would change over time, when more and
 more evidence is taken into account. This process cannot be treated as
 conditioning, because, among other things, the system can neither
 explicitly list all evidence as a condition, nor update the probability
 of all statements in the system for each piece of new evidence (so as
 to treat all background knowledge as a default condition).
 Consequently, at any moment P(A-->B) and P(B-->C) may be based on
 different, though unspecified, data, so it is invalid to use them in a
 rule to calculate the probability of A-->C --- probability theory
 does not allow cross-distribution probability calculation.

 This is not a problem the way I set things up. The likelihood of a
 statement is welcome to change over time, as the evidence changes.

 If each of them is changed independently, you don't have a single
 probability distribution anymore, but a bunch of them. In the above
 case, you don't really have P(A-->B) and P(B-->C), but P_307(A-->B)
 and P_409(B-->C). How can you use two probability values together if
 they come from different distributions?

 (2) For the same reason, in NARS a statement might get different
 probabilities attached when derived from different evidence.
 Probability theory does not have a general rule to handle
 inconsistency within a probability distribution.

 The same statement holds for PLN, right?

 Yes. Ben proposed a solution, which I won't comment until I see all
 the details in the PLN book.

 The first half is fine, but the second isn't. As the previous example
 shows, in NARS a high Confidence does imply that the Frequency value
 is a good summary of evidence, but a low Confidence doesn't imply that
 the Frequency is bad, just that it is not very stable.

 But I'm not talking about confidence when I say "higher". I'm talking
 about the system of levels I defined, for which it is perfectly OK.

 Yes, but the whole purpose of adding another value is to handle
 inconsistency and belief revision. Higher-order probability is
 mathematically sound, but won't do this work.
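
Concretely, the kind of work this refers to in NARS is its revision rule,
sketched below as I understand the published formalism, assuming the usual
truth-value definitions with evidential horizon k = 1. Revision merges two
(frequency, confidence) beliefs about the same statement, based on (assumed)
independent evidence, by adding the amounts of evidence they imply; note that
these pairs are not the same thing as second-order probabilities.

# Sketch of NARS-style revision (assumed k = 1; assumes 0 < confidence < 1).
K = 1.0  # evidential horizon

def to_evidence(f, c):
    w = K * c / (1.0 - c)        # total evidence implied by confidence c
    return f * w, w              # (positive evidence, total evidence)

def from_evidence(w_plus, w):
    return w_plus / w, w / (w + K)

def revise(b1, b2):
    wp1, w1 = to_evidence(*b1)
    wp2, w2 = to_evidence(*b2)
    return from_evidence(wp1 + wp2, w1 + w2)

print(revise((0.9, 0.5), (0.2, 0.7)))   # roughly (0.41, 0.77): confidence grows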

 Think about a concrete example: if from one source the system gets
 P(A-->B) = 0.9, and P(P(A-->B) = 0.9) = 0.5, while from another source
 P(A-->B) = 0.2, and P(P(A-->B) = 0.2) = 0.7, then what will be the
 conclusion when the two sources are considered together?

 Pei







Re: [agi] NARS probability

2008-09-20 Thread Pei Wang
On Sat, Sep 20, 2008 at 11:02 PM, Abram Demski [EMAIL PROTECTED] wrote:
 You are right in what you say about (1). The truth is, my analysis is
 meant to apply to NARS operating with unrestricted time and memory
 resources (which of course is not the point of NARS!). So, the
 question is whether NARS approaches a probability calculation as it is
 given more time to use all its data.

That is an interesting question. When the weight of evidence w goes to
infinity, confidence goes to its limit of 1, and frequency converges to the
proportion of positive evidence among all evidence, so it becomes a
probability, under a certain interpretation. Therefore, as far as a single
truth value is concerned, probability theory is an extreme case of NARS.
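
For reference, a small sketch of the truth-value definitions behind this
limit, assuming the usual NARS formulation with evidential horizon k:
frequency is the proportion of positive evidence, and confidence grows toward
1 as the total evidence grows.

# Sketch of NARS truth values (assuming k = 1): f = w_plus / w, c = w / (w + k).
K = 1.0

def truth_value(w_plus, w):
    return w_plus / w, w / (w + K)

for w in (1, 10, 100, 10000):
    f, c = truth_value(0.7 * w, w)      # hold the proportion of positive evidence at 0.7
    print(w, round(f, 2), round(c, 4))  # c -> 1 while f stays at 0.70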

However, when all truth values in the system are taken into account, this is
not necessarily true, because the two theories specify the relations
among statements/propositions differently. For example, probability
theory has the conditional B|A, while NARS uses the implication A==>B,
which are similar, but not the same. Of course, there are some overlaps,
such as disjunction and conjunction, where NARS converges to
probability theory in the extreme case (infinite evidence).

 As for higher values... NARS and PLN may be using them for the purpose
 you mention, but that is not the purpose I am giving them in my
 analysis! In my analysis, I am simply trying to justify the deductions
 allowed in NARS in a probabilistic way. Higher-order probabilities are
 potentially useful here because of the way you sum evidence. Simply
 put, it is as if NARS purposefully ignores the distinction between
 different probability levels, so that a NARS frequency is also a
 frequency-of-frequencies and frequency-of-frequency-of-frequencies and
 so on, all the way up.

I see what you mean, but as it is currently defined, in NARS there is
no need to introduce higher-order probabilities --- frequency is not
an estimation of a true probability. It is uncertain because of the
influence of new evidence, not because it is inaccurate.

 The simple way of dealing with this is to say that it is wrong, and
 results from a confusion of similar-looking mathematical entities.
 But, to some extent, it is intuitive: I should not care too much in
 normal reasoning which level of inheritance I'm using when I say
 that a truck is a type of vehicle. So the question is, can this be
 justified probabilistically? I think I can give a very tentative
 yes.

Hopefully we'll know better about that when you explore further. ;-)

Pei

 --Abram

 On Sat, Sep 20, 2008 at 9:38 PM, Pei Wang [EMAIL PROTECTED] wrote:
 On Sat, Sep 20, 2008 at 9:09 PM, Abram Demski [EMAIL PROTECTED] wrote:

 (1) In probability theory, an event E has a constant probability P(E)
 (which can be unknown). Given the assumption of insufficient knowledge
 and resources, in NARS P(A-->B) would change over time, when more and
 more evidence is taken into account. This process cannot be treated as
 conditioning, because, among other things, the system can neither
 explicitly list all evidence as a condition, nor update the probability
 of all statements in the system for each piece of new evidence (so as
 to treat all background knowledge as a default condition).
 Consequently, at any moment P(A-->B) and P(B-->C) may be based on
 different, though unspecified, data, so it is invalid to use them in a
 rule to calculate the probability of A-->C --- probability theory
 does not allow cross-distribution probability calculation.

 This is not a problem the way I set things up. The likelihood of a
 statement is welcome to change over time, as the evidence changes.

 If each of them is changed independently, you don't have a single
 probability distribution anymore, but a bunch of them. In the above
 case, you don't really have P(A-->B) and P(B-->C), but P_307(A-->B)
 and P_409(B-->C). How can you use two probability values together if
 they come from different distributions?

 (2) For the same reason, in NARS a statement might get different
 probabilities attached when derived from different evidence.
 Probability theory does not have a general rule to handle
 inconsistency within a probability distribution.

 The same statement holds for PLN, right?

 Yes. Ben proposed a solution, which I won't comment until I see all
 the details in the PLN book.

 The first half is fine, but the second isn't. As the previous example
 shows, in NARS a high Confidence does imply that the Frequency value
 is a good summary of evidence, but a low Confidence doesn't imply that
 the Frequency is bad, just that it is not very stable.

 But I'm not talking about confidence when I say "higher". I'm talking
 about the system of levels I defined, for which it is perfectly OK.

 Yes, but the whole purpose of adding another value is to handle
 inconsistency and belief revision. Higher-order probability is
 mathematically sound, but won't do this work.

 Think about a concrete example: if from one source the system gets
 P(A-->B) = 0.9, and 

Re: [agi] NARS probability

2008-09-20 Thread Ben Goertzel
On Sat, Sep 20, 2008 at 10:32 PM, Pei Wang [EMAIL PROTECTED] wrote:

 I found the paper.

 As I guessed, their update operator is defined on the whole
 probability distribution function, rather than on a single probability
 value of an event. I don't think it is practical for AGI --- we cannot
 afford the time to re-evaluate every belief on each piece of new
 evidence. Also, I haven't seen a convincing argument on why an
 intelligent system should follow the ME Principle.


I agree their method is not practical for most cases in AGI, which is why
we didn't use it within PLN ;-)  ... we use a simpler revision rule...
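
The rule itself isn't spelled out here, so the following is only a guess at
the kind of thing a "simpler revision rule" could look like --- a
count-weighted average of strengths --- and should not be read as the actual
PLN rule, which is given in the PLN book.

# Hypothetical sketch (not taken from the PLN book): merge two (strength, count)
# estimates of the same relationship by a count-weighted average.
def revise(s1, n1, s2, n2):
    n = n1 + n2
    return (s1 * n1 + s2 * n2) / n, n

print(revise(0.9, 10, 0.2, 40))   # -> (0.34, 50)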



 Also this paper doesn't directly solve my example, because it doesn't
 use second-order probability.


That is true, but it could be straightforwardly extended to that case...

Ben


