Re: [agi] NARS probability
I got it from an internal source.

Pei

On Sun, Sep 28, 2008 at 8:24 PM, Brad Paulsen [EMAIL PROTECTED] wrote: Pei, Would you mind sharing the link (that is, if you found it on the Internet)? Thanks, Brad

Pei Wang wrote: I found the paper. As I guessed, their update operator is defined on the whole probability distribution function, rather than on a single probability value of an event. [...]
Re: [agi] NARS probability
Hmm... I didn't mean infinite evidence, only infinite time and space with which to compute the consequences of evidence. But that is interesting too. The higher-order probabilities I'm talking about introducing do not reflect inaccuracy at all. :) This may seem odd, but it seems to me to follow from your development of NARS... so the difficulty for me is to account for why you can exclude it in your system. Of course, this need arises only from interpreting your definitions probabilistically. I think I have come up with a more specific proposal. I will try to write it up properly and see if it works.

--Abram

On Sat, Sep 20, 2008 at 11:28 PM, Pei Wang [EMAIL PROTECTED] wrote: That is an interesting question. When the weight of evidence w goes to infinity, confidence goes to its maximum, and frequency converges to the limit of positive evidence among all evidence [...]
Re: [agi] NARS probability
When working on your new proposal, remember that in NARS all measurements must be based on what the system has --- limited evidence and resources. I don't allow any objective probability that only exists in a Platonic world or the infinite future.

Pei

On Sun, Sep 21, 2008 at 1:53 PM, Abram Demski [EMAIL PROTECTED] wrote: Hmm... I didn't mean infinite evidence, only infinite time and space with which to compute the consequences of evidence. [...]
Re: [agi] NARS probability
--- On Sat, 9/20/08, Pei Wang [EMAIL PROTECTED] wrote: Think about a concrete example: if from one source the system gets P(A --> B) = 0.9, and P(P(A --> B) = 0.9) = 0.5, while from another source P(A --> B) = 0.2, and P(P(A --> B) = 0.2) = 0.7, then what will be the conclusion when the two sources are considered together?

This is a common problem in text prediction. In general, there is no right answer. You have to determine experimentally what works best: you compute the probability using some method, run it on some test data, and measure the accuracy of your predictions.

To give a more concrete example, suppose that A is some context (the last n bytes of text), and B is the event that the next bit is a 1. We get different predictions for different orders (different values of n), which we need to combine. In PAQ1-PAQ3, I count zeros and ones in context A. Call these counts c0 and c1. Then I let P(A --> B) = c1/(c0+c1) and let the confidence (what you call P(P(A --> B))) be c0+c1. To combine predictions, I add up the c0's and c1's and compute SUM c1 / (SUM c0 + SUM c1). I also discovered experimentally that the prediction is more accurate if the counts are weighted by n^2. For example, the order-19 context "the cat caught a mo_" is a better predictor of the next symbol than the order-2 context "mo_", even though the latter has probably collected more statistics, and therefore has a higher confidence.

In PAQ4-PAQ6 I adjust the weights dynamically, using gradient descent of coding cost in weight space to reduce prediction error. This can be improved further by using multiple weight tables indexed by a low-order context.

In PAQ7-PAQ8 I dynamically map each bit history (truncated c0, c1 plus the last bit) to a probability p_i using a table that is adjusted to reduce prediction error when the bit is observed. Then the predictions p_i are combined using a neural network: p = squash(SUM w_i stretch(p_i)), where squash(x) = 1/(1 + exp(-x)) bounds the output to (0, 1), and stretch(x) = ln(x / (1 - x)) is the inverse of squash. (This implicitly gives greater confidence to probabilities near 0 or 1.) When the actual bit b is observed, the weights are adjusted to reduce the prediction error b - p by gradient descent of coding cost in weight space, as follows: w_i := w_i + L stretch(p_i) (b - p), where L is the learning rate, typically 0.001 to 0.005. Again, we can improve this by using multiple weight tables indexed by a low-order context. Or you can use multiple neural networks indexed by different-order contexts and combine them by linear averaging or another neural network.

In PAQ9 I use chains of 2-input neural networks, where one input is the previous prediction from the next lower context order and the other input is fixed. The weight table is selected by the bit history in the next higher context. This method is still experimental. It works well for simple n-gram models, but worse than PAQ8 when there are large numbers of approximately equally good predictions to combine, such as when semantic (cat ~ mouse) and other contexts are added.

-- Matt Mahoney, [EMAIL PROTECTED]
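A minimal sketch of the PAQ7-style logistic mixing step just described, in Python (the function names, the fixed learning rate, and the example numbers are illustrative, not taken from the PAQ sources):

    import math

    def stretch(p):
        # ln(p / (1 - p)), the inverse of squash
        return math.log(p / (1.0 - p))

    def squash(x):
        # logistic function, bounds the output to (0, 1)
        return 1.0 / (1.0 + math.exp(-x))

    def mix(probs, weights):
        # Combine the models' bit predictions in the stretched domain:
        # p = squash(SUM w_i stretch(p_i))
        return squash(sum(w * stretch(p) for p, w in zip(probs, weights)))

    def update(probs, weights, p, bit, rate=0.002):
        # Gradient step on coding cost: w_i := w_i + L * stretch(p_i) * (b - p)
        return [w + rate * stretch(pi) * (bit - p)
                for pi, w in zip(probs, weights)]

    # Example: three context models predict the probability the next bit is 1.
    probs, weights = [0.9, 0.6, 0.3], [0.5, 0.3, 0.2]
    p = mix(probs, weights)
    weights = update(probs, weights, p, bit=1)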
[agi] NARS probability
It has been mentioned several times on this list that NARS has no proper probabilistic interpretation. But, I think I have found one that works OK. Not perfectly. There are some differences, but the similarity is striking (at least to me). I imagine that what I have come up with is not too different from what Ben Goertzel and Pei Wang have already hashed out in their attempts to reconcile the two, but we'll see.

The general idea is to treat NARS as probability plus a good number of regularity assumptions that justify the inference steps of NARS. However, since I make so many assumptions, it is very possible that some of them conflict. This would show that NARS couldn't fit into probability theory after all, but it is still interesting even if that's the case...

So, here's an outline. We start with the primitive inheritance relation, A inh B; this could be called definite inheritance, because it means that A inherits all of B's properties, and B inherits all of A's instances. B is a superset of A. The truth value is 1 or 0. Then, we define probabilistic inheritance, which carries a probability that a given property of B will be inherited by A and that a given instance of A will be inherited by B. Probabilistic inheritance behaves somewhat like the full NARS inheritance: if we reason about likelihoods (the probability of the data assuming (A prob_inh B) = x), the math is actually the same EXCEPT we can only use primitive inheritance as evidence, so we can't spread evidence around the network by (1) treating prob_inh with high evidence as if it were primitive inh or (2) attempting to use deduction to accumulate evidence as we might want to, so that evidence for A prob_inh B and evidence for B prob_inh C gets combined to evidence for A prob_inh C.

So, we can define a second-order probabilistic inheritance, prob_inh2, that is for prob_inh what prob_inh is for inh. We can define a third-order over the second-order, a fourth over the third, and so on. In fact, each of these is a generalization: simple inheritance can be seen as a special case of prob_inh (where the probability is 1), prob_inh is a special case of prob_inh2, and so on. This means we can define an infinite-order probabilistic inheritance, prob_inh_inf, which is a generalization of any given level. The truth value of prob_inh_inf will be very complicated (since each prob_inhN has a more complicated truth value than the last, and prob_inh_inf will include the truth values from each level).

My proposal is to add 2 regularity assumptions to this structure. First, we assume that the prior over probability values for prob_inh is uniform. This gives us some permission to act like the probability and the likelihood are the same thing, which brings the math closer to NARS. Second, assume that a high truth value on one level strongly implies a high one on the next level, and similarly that low implies low. They will already weakly imply each other, but I think the math could be brought closer to NARS with a stronger assumption. I don't have any precise suggestions, however. The idea here is to allow evidence that properly should only be counted for prob_inh2 to count for prob_inh as well, which is the case in NARS. This is point (1) above. More generally, it justifies the NARSian practice of using the simple prob_inh likelihood as if it were a likelihood for prob_inh_inf, so that it recursively acts on other instances of itself rather than only on simple inh.

Of course, since I have not given precise definitions, this solution is difficult to evaluate. But, I thought it would be of interest.

--Abram Demski
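To make the first assumption concrete: under a uniform prior on x = P(A prob_inh B), the {w+, w} evidence yields a Beta(w+ + 1, w - w+ + 1) posterior. A small sketch (an illustration, not part of the proposal itself; the link to the NARS expectation assumes the evidential-horizon parameter k = 2):

    def likelihood(x, w_plus, w):
        # Probability of the observed evidence if the true inheritance
        # probability is x: w+ positive observations out of w total.
        return x ** w_plus * (1.0 - x) ** (w - w_plus)

    def posterior_mean(w_plus, w):
        # Uniform prior + the likelihood above = Beta(w+ + 1, w - w+ + 1),
        # whose mean is the Laplace estimate; the NARS expectation
        # (w+ + k/2) / (w + k) coincides with it when k = 2.
        return (w_plus + 1.0) / (w + 2.0)

    # e.g. posterior_mean(9, 10) == 10/12, despite a raw frequency of 0.9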
Re: [agi] NARS probability
Abram,

I think the best place to start, in exploring the relation between NARS and probability theory, is with Definition 3.7 in the paper From Inheritance Relation to Non-Axiomatic Logic (http://www.cogsci.indiana.edu/pub/wang.inheritance_nal.ps) [International Journal of Approximate Reasoning, 11(4), 281-319, 1994], which is downloadable from http://nars.wang.googlepages.com/nars%3Apublication

It is instructive to look at specific situations, and see how this definition leads one to model situations differently from the way one traditionally uses probability theory to model such situations. The next place to look, in exploring this relation, is at the semantics that 3.7 implies for the induction and abduction rules. Note that unlike in PLN there are no term (node) probabilities in NARS, so induction and abduction cannot rely on Bayes rule or any close analogue of it. They must be justified on quite different grounds.

If you can formulate a probabilistic justification of NARS induction and abduction truth value formulas, I'll be quite interested. I'm not saying it's impossible, just that it's not obvious ... one has to grapple with 3.7 and the fact that the NARS relative frequency w+/w is combining intension and extension in a manner that is unusual relative to ordinary probabilistic treatments. The math here is simple enough that one does not need to do hand-wavy philosophizing ;-) ... it's just elementary algebra. The subtle part is really the semantics, i.e. the way the math is used to model situations.

-- Ben G

On Sat, Sep 20, 2008 at 2:22 PM, Abram Demski [EMAIL PROTECTED] wrote: It has been mentioned several times on this list that NARS has no proper probabilistic interpretation. But, I think I have found one that works OK. [...]
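For readers without the paper at hand, a sketch of the evidence counts that Definition 3.7 defines for A --> B, on one reading (the Python names are illustrative; the paper is the authoritative statement):

    def evidence_for_inheritance(ext_a, ext_b, int_a, int_b):
        # Positive evidence: A's extension shared with B, plus B's
        # intension shared with A. Total evidence: all of A's extension
        # plus all of B's intension.
        w_plus = len(ext_a & ext_b) + len(int_b & int_a)
        w = len(ext_a) + len(int_b)
        return w_plus, w  # the relative frequency w+/w pools both sides

    # e.g. with ext/int given as Python sets of term names:
    # evidence_for_inheritance({"tweety"}, {"tweety", "woody"},
    #                          {"bird", "flyer"}, {"bird"})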
Re: [agi] NARS probability
Ben,

Thanks for the references. I do not have any particularly good reason for trying to do this, but it is a fun exercise and I find myself making the attempt every so often :). I haven't read the PLN book yet (though I downloaded a copy, thanks!), but at present I don't see why term probabilities are needed... unless inheritance relations A inh B are interpreted as conditional probabilities A given B. I am not interpreting them that way-- I am just treating inheritance as a reflexive and transitive relation that (for some reason) we want to reason about probabilistically. As such, it is easy to set up probabilistic treatments-- the challenge is to get them to behave in a way that resembles NARS. Another way of putting this is that I am not worrying too much about the semantics, I'm just trying to get the formal manipulations to match up.

And the definition 3.7 that you mentioned *does* match up, perfectly, when the {w+, w} truth-value is interpreted as a way of representing the likelihood density function of the prob_inh. Easy! The challenge is section 4.4 in the paper you reference: syllogisms. The way evidence is spread around there doesn't match with definition 3.7, not without further probabilistic assumptions.

--Abram

On Sat, Sep 20, 2008 at 4:13 PM, Ben Goertzel [EMAIL PROTECTED] wrote: Abram, I think the best place to start, in exploring the relation between NARS and probability theory, is with Definition 3.7 [...]
Re: [agi] NARS probability
I haven't read the PLN book yet (though I downloaded a copy, thanks!), but at present I don't see why term probabilities are needed... unless inheritance relations A inh B are interpreted as conditional probabilities A given B. I am not interpreting them that way-- I am just treating inheritance as a reflexive and transitive relation that (for some reason) we want to reason about probabilistically.

Well, one question is whether you want to be able to do inference like

A --> B tv1 |- B --> A tv2

Doing that without term probabilities is pretty hard...

Another interesting approach would be to investigate which of Cox's axioms (for probability) are violated in NARS, in what semantic interpretation, and why...

ben
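To see why, a sketch under the conditional-probability reading of A --> B (the reading is Ben's premise here, not Abram's; the names are illustrative):

    def invert(p_b_given_a, p_a, p_b):
        # Bayes rule: P(A|B) = P(B|A) * P(A) / P(B).
        # Without the term probabilities p_a and p_b, tv2 is
        # underdetermined by tv1 alone.
        return p_b_given_a * p_a / p_b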
Re: [agi] NARS probability
And the definition 3.7 that you mentioned *does* match up, perfectly, when the {w+, w} truth-value is interpreted as a way of representing the likelihood density function of the prob_inh. Easy! The challenge is section 4.4 in the paper you reference: syllogisms. The way evidence is spread around there doesn't match with definition 3.7, not without further probabilistic assumptions.

Which seems to be because the semantic interpretation of evidence in 3.7 is different in NARS than in PLN or most probabilistic treatments... This is why I suggested looking at how 3.7 is used to model a real situation, versus how that situation would be modeled in probability theory... Having a good test situation in mind might help to think about the syllogistic rules more clearly. It needs to be a situation where the terms and relations are grounded in a system's experience, as that is what NARS and PLN semantics are both all about...

ben
Re: [agi] NARS probability
Well, one question is whether you want to be able to do inference like A --> B tv1 |- B --> A tv2. Doing that without term probabilities is pretty hard...

Not the way I set it up. A --> B is not the conditional probability P(B|A), but it *is* a conditional probability, so the normal Bayesian rules apply.

--Abram
Re: [agi] NARS probability
On Sat, Sep 20, 2008 at 2:22 PM, Abram Demski [EMAIL PROTECTED] wrote: It has been mentioned several times on this list that NARS has no proper probabilistic interpretation. But, I think I have found one that works OK. Not perfectly. There are some differences, but the similarity is striking (at least to me).

Abram,

There is indeed a lot of similarity between NARS and probability theory. When I started this project, my plan was to use probability theory to handle uncertainty. I moved away from it after I came to believe that what is needed cannot be fully obtained from that theory and its extensions. Even so, NARS still agrees with probability theory here or there, as mentioned in my papers. The key, therefore, is whether NARS can be FULLY treated as an application of probability theory, by following the probability axioms, and only adding justifiable consistent assumptions when necessary.

I imagine that what I have come up with is not too different from what Ben Goertzel and Pei Wang have already hashed out in their attempts to reconcile the two, but we'll see. The general idea is to treat NARS as probability plus a good number of regularity assumptions that justify the inference steps of NARS. However, since I make so many assumptions, it is very possible that some of them conflict. This would show that NARS couldn't fit into probability theory after all, but it is still interesting even if that's the case...

I assume by "treat NARS as probability" you mean to treat the Frequency in NARS as a measurement following the axioms of probability theory. I mentioned this because there is another measurement in NARS, Expectation (which is derived from Frequency and Confidence), which is also intuitively similar to probability.

So, here's an outline. We start with the primitive inheritance relation, A inh B; this could be called definite inheritance, because it means that A inherits all of B's properties, and B inherits all of A's instances. B is a superset of A. The truth value is 1 or 0.

Fine.

Then, we define probabilistic inheritance, which carries a probability that a given property of B will be inherited by A and that a given instance of A will be inherited by B.

There is a tricky issue here. When evaluating the truth value of A --> B, NARS doesn't only check properties and instances, but also checks supersets and subsets, intuitively speaking. For example, when the system is told that "Swans are birds" and "Swans fly", it derives "Birds fly" by induction. In this process "swan" is counted as one piece of evidence, rather than as a set of instances. How many swans the system knows doesn't matter in this step. That is why in the definitions I use extension/intension, not instance/property, because the latter are just special cases of the former. Actually, the truth value of A --> B measures how often the two terms can substitute for each other (in different ways), not how much one set is included in the other, which is the usual probabilistic reading of an inheritance. This is one reason why NARS does not define node probability.

Probabilistic inheritance behaves somewhat like the full NARS inheritance: if we reason about likelihoods (the probability of the data assuming (A prob_inh B) = x), the math is actually the same EXCEPT we can only use primitive inheritance as evidence, so we can't spread evidence around the network by (1) treating prob_inh with high evidence as if it were primitive inh or (2) attempting to use deduction to accumulate evidence as we might want to, so that evidence for A prob_inh B and evidence for B prob_inh C gets combined to evidence for A prob_inh C.

Beside the problems you mentioned, there are other issues. Let me start with the basic ones:

(1) In probability theory, an event E has a constant probability P(E) (which can be unknown). Given the assumption of insufficient knowledge and resources, in NARS P(A --> B) would change over time, as more and more evidence is taken into account. This process cannot be treated as conditioning because, among other things, the system can neither explicitly list all evidence as a condition, nor update the probability of all statements in the system for each piece of new evidence (so as to treat all background knowledge as a default condition). Consequently, at any moment P(A --> B) and P(B --> C) may be based on different, though unspecified, data, so it is invalid to use them in a rule to calculate the probability of A --> C --- probability theory does not allow cross-distribution probability calculation.

(2) For the same reason, in NARS a statement might get different probabilities attached, when derived from different evidence. Probability theory does not have a general rule to handle inconsistency within a probability distribution.

So, we can define a second-order probabilistic inheritance, prob_inh2, that is for prob_inh what prob_inh is for inh. [...]
Re: [agi] NARS probability
Beside the problems you mentioned, there are other issues. Let me start with the basic ones: (1) In probability theory, an event E has a constant probability P(E) (which can be unknown). Given the assumption of insufficient knowledge and resources, in NARS P(A --> B) would change over time [...] (2) For the same reason, in NARS a statement might get different probabilities attached, when derived from different evidence. Probability theory does not have a general rule to handle inconsistency within a probability distribution.

Of course, these issues can be handled in probability theory via introducing higher-order probabilities ...

ben
Re: [agi] NARS probability
Thanks for the critique. Replies follow...

On Sat, Sep 20, 2008 at 8:20 PM, Pei Wang [EMAIL PROTECTED] wrote: [...] The key, therefore, is whether NARS can be FULLY treated as an application of probability theory, by following the probability axioms, and only adding justifiable consistent assumptions when necessary.

Yes, that's the main question. Also, though, if the answer is no it is potentially important to figure out why.

[...] I assume by "treat NARS as probability" you mean to treat the Frequency in NARS as a measurement following the axioms of probability theory. I mentioned this because there is another measurement in NARS, Expectation (which is derived from Frequency and Confidence), which is also intuitively similar to probability.

Yes, you are right... at least so far, I've only been looking at frequency + confidence. Getting expectation from that does not look like it violates any laws.

[...] There is a tricky issue here. When evaluating the truth value of A --> B, NARS doesn't only check properties and instances, but also checks supersets and subsets, intuitively speaking. [...] This is one reason why NARS does not define node probability.

Yes, I understand this. I should have worded myself more carefully.

[...] (1) In probability theory, an event E has a constant probability P(E) (which can be unknown). Given the assumption of insufficient knowledge and resources, in NARS P(A --> B) would change over time, as more and more evidence is taken into account. [...] probability theory does not allow cross-distribution probability calculation.

This is not a problem the way I set things up. The likelihood of a statement is welcome to change over time, as the evidence changes.

(2) For the same reason, in NARS a statement might get different probabilities attached, when derived from different evidence. Probability theory does not have a general rule to handle inconsistency within a probability distribution.

The same statement holds for PLN, right?

[...] My proposal is to add 2 regularity assumptions to this structure. First, we assume that the prior over probability values for prob_inh is uniform. This gives us some permission to act like the probability and the likelihood are the same thing, which brings the math closer to NARS.

That is intuitively acceptable, if interpreted properly.

Second, assume that a high truth value on one level strongly implies a high one on the next level, and similarly that low implies low.

The first half is fine, but the second isn't. As the previous example shows, in NARS a high Confidence implies that the Frequency value is a good summary of evidence, but a low Confidence does not imply that the Frequency is bad, just that it is not very stable.

But I'm not talking about confidence when I say "higher". I'm talking about the system of levels I defined, for which it is perfectly OK. Essentially what I'm claiming here is that the inferences of NARS are [...]
Re: [agi] NARS probability
(2) For the same reason, in NARS a statement might get different probabilities attached, when derived from different evidence. Probability theory does not have a general rule to handle inconsistency within a probability distribution.

The same statement holds for PLN, right?

PLN handles inconsistency within probability distributions using higher-order probabilities... both explicitly and, more simply, by allowing multiple inconsistent estimates of the same distribution to exist attached to the same node or link...

If you work out a detailed solution along your path, you will see that it will be similar to NARS when both are doing deduction with strong evidence. The difference will show up (1) in cases where evidence is rare, and (2) in non-deductive inferences, such as induction and abduction. I believe this is also where NARS and PLN differ most.

Guilty as charged! I have only tried to justify the deduction rule, not any of the others. I seriously didn't think about the blind spot until you mentioned it. I'll have to go back and take a closer look...

The NARS deduction rule closely approximates the PLN deduction rule for the case where all the premise terms have roughly the same node probability. It particularly closely approximates the concept-geometry-based variant of the PLN deduction rule, which is interesting: it means NARS deduction approximates the PLN deduction rule variant one gets if one assumes concepts are approximately spherically shaped rather than being random sets. NARS induction and abduction rules do not closely approximate the PLN induction and abduction rules...

-- Ben G
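For reference, a sketch of the independence-based PLN deduction strength formula being compared against here (one published form of it; the variable names are illustrative):

    def pln_deduction_strength(s_ab, s_bc, s_b, s_c):
        # P(C|A) under the PLN independence assumption, using the node
        # (term) probabilities s_b = P(B) and s_c = P(C) -- the quantities
        # NARS deliberately does without.
        return s_ab * s_bc + (1.0 - s_ab) * (s_c - s_b * s_bc) / (1.0 - s_b)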
Re: [agi] NARS probability
On Sat, Sep 20, 2008 at 9:09 PM, Abram Demski [EMAIL PROTECTED] wrote:

(1) In probability theory, an event E has a constant probability P(E) (which can be unknown). [...] Consequently, at any moment P(A --> B) and P(B --> C) may be based on different, though unspecified, data, so it is invalid to use them in a rule to calculate the probability of A --> C --- probability theory does not allow cross-distribution probability calculation.

This is not a problem the way I set things up. The likelihood of a statement is welcome to change over time, as the evidence changes.

If each of them is changed independently, you don't have a single probability distribution anymore, but a bunch of them. In the above case, you don't really have P(A --> B) and P(B --> C), but P_307(A --> B) and P_409(B --> C). How can you use two probability values together if they come from different distributions?

(2) For the same reason, in NARS a statement might get different probabilities attached, when derived from different evidence. Probability theory does not have a general rule to handle inconsistency within a probability distribution.

The same statement holds for PLN, right?

Yes. Ben proposed a solution, which I won't comment on until I see all the details in the PLN book.

The first half is fine, but the second isn't. [...] But I'm not talking about confidence when I say "higher". I'm talking about the system of levels I defined, for which it is perfectly OK.

Yes, but the whole purpose of adding another value is to handle inconsistency and belief revision. Higher-order probability is mathematically sound, but won't do this work. Think about a concrete example: if from one source the system gets P(A --> B) = 0.9, and P(P(A --> B) = 0.9) = 0.5, while from another source P(A --> B) = 0.2, and P(P(A --> B) = 0.2) = 0.7, then what will be the conclusion when the two sources are considered together?

Pei
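For contrast, a sketch of how NARS itself merges two conflicting reports, if (an assumption of this sketch, not something stated in the thread) the two sources above are recast as frequency/confidence pairs; the conversion and revision formulas are the standard NARS ones, with k the evidential-horizon parameter:

    K = 1.0  # evidential horizon parameter

    def to_evidence(f, c, k=K):
        # Invert c = w / (w + k) to recover total evidence w; w+ = f * w.
        w = k * c / (1.0 - c)
        return f * w, w

    def revise(f1, c1, f2, c2, k=K):
        # The revision rule pools the evidence behind the two judgments.
        wp1, w1 = to_evidence(f1, c1, k)
        wp2, w2 = to_evidence(f2, c2, k)
        wp, w = wp1 + wp2, w1 + w2
        return wp / w, w / (w + k)  # merged (frequency, confidence)

    # Recasting the example's sources as (0.9, 0.5) and (0.2, 0.7):
    f, c = revise(0.9, 0.5, 0.2, 0.7)  # roughly (0.41, 0.77)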
Re: [agi] NARS probability
Think about a concrete example: if from one source the system gets P(A --> B) = 0.9, and P(P(A --> B) = 0.9) = 0.5, while from another source P(A --> B) = 0.2, and P(P(A --> B) = 0.2) = 0.7, then what will be the conclusion when the two sources are considered together?

There are many approaches to this within the probabilistic framework, one of which is contained within this paper, for example... http://cat.inist.fr/?aModele=afficheN&cpsidt=16174172 (I have a copy of the paper but I'm not sure where it's available for free online ... if anyone finds it please post the link... thx)

Ben
Re: [agi] NARS probability
I didn't know this paper, but I do know approaches based on the principle of maximum/optimum entropy. They usually require much more information (or assumptions) than what is given in the following example. I'd be interested to know what solution they would suggest for such a situation.

Pei

On Sat, Sep 20, 2008 at 9:53 PM, Ben Goertzel [EMAIL PROTECTED] wrote: There are many approaches to this within the probabilistic framework, one of which is contained within this paper, for example... http://cat.inist.fr/?aModele=afficheN&cpsidt=16174172 [...]
Re: [agi] NARS probability
The approach in that paper doesn't require any special assumptions, and could be applied to your example, but I don't have time to write up an explanation of how to do the calculations ... you'll have to read the paper yourself if you're curious ;-) That approach is not implemented in PLN right now, but we have debated integrating it with PLN, as in some ways it's subtler than what we currently do in the code...

ben

On Sat, Sep 20, 2008 at 10:02 PM, Pei Wang [EMAIL PROTECTED] wrote: I didn't know this paper, but I do know approaches based on the principle of maximum/optimum entropy. [...]

-- Ben Goertzel, PhD
CEO, Novamente LLC and Biomind LLC
Director of Research, SIAI
[EMAIL PROTECTED]

"Nothing will ever be attempted if all possible objections must be first overcome" - Dr Samuel Johnson
Re: [agi] NARS probability
I found the paper. As I guessed, their update operator is defined on the whole probability distribution function, rather than on a single probability value of an event. I don't think it is practical for AGI --- we cannot afford the time to re-evaluate every belief on each piece of new evidence. Also, I haven't seen a convincing argument on why an intelligent system should follow the ME Principle. Moreover, this paper doesn't directly solve my example, because it doesn't use second-order probability.

Pei

On Sat, Sep 20, 2008 at 10:13 PM, Ben Goertzel [EMAIL PROTECTED] wrote: The approach in that paper doesn't require any special assumptions, and could be applied to your example, but I don't have time to write up an explanation of how to do the calculations ... [...]
Re: [agi] NARS probability
You are right in what you say about (1). The truth is, my analysis is meant to apply to NARS operating with unrestricted time and memory resources (which of course is not the point of NARS!). So, the question is whether NARS approaches a probability calculation as it is given more time to use all its data. As for higher values... NARS and PLN may be using them for the purpose you mention, but that is not the purpose I am giving them in my analysis! In my analysis, I am simply trying to justify the deductions allowed in NARS in a probabilistic way. Higher-order probabilities are potentially useful here because of the way you sum evidence. Simply put, it is as if NARS purposefully ignores the distinction between different probability levels, so that a NARS frequency is also a frequency-of-frequencies and frequency-of-frequency-of frequencies and so on, all the way up. The simple way of dealing with this is to say that it is wrong, and results from a confusion of similar-looking mathematical entities. But, to some extent, it is intuitive: I should not care too much in normal reasoning which level of inheritance I'm using when I say that a truck is a type of vehicle. So the question is, can this be justified probabilistically? I think I can give a very tentative yes. --Abram On Sat, Sep 20, 2008 at 9:38 PM, Pei Wang [EMAIL PROTECTED] wrote: On Sat, Sep 20, 2008 at 9:09 PM, Abram Demski [EMAIL PROTECTED] wrote: (1) In probability theory, an event E has a constant probability P(E) (which can be unknown). Given the assumption of insufficient knowledge and resources, in NARS P(A--B) would change over time, when more and more evidence is taken into account. This process cannot be treated as conditioning, because, among other things, the system can neither explicitly list all evidence as condition, nor update the probability of all statements in the system for each piece of new evidence (so as to treat all background knowledge as a default condition). Consequently, at any moment P(A--B) and P(B--C) may be based on different, though unspecified, data, so it is invalid to use them in a rule to calculate the probability of A--C --- probability theory does not allow cross-distribution probability calculation. This is not a problem the way I set things up. The likelihood of a statement is welcome to change over time, as the evidence changes. If each of them is changed independently, you don't have a single probability distribution anymore, but a bunch of them. In the above case, you don't really have P(A--B) and P(B--C), but P_307(A--B) and P_409(B--C). How can you use two probability values together if they come from different distributions? (2) For the same reason, in NARS a statement might get different probability attached, when derived from different evidence. Probability theory does not have a general rule to handle inconsistency within a probability distribution. The same statement holds for PLN, right? Yes. Ben proposed a solution, which I won't comment until I see all the details in the PLN book. The first half is fine, but the second isn't. As the previous example shows, in NARS a high Confidence does implies that the Frequency value is a good summary of evidence, but a low Confidence does implies that the Frequency is bad, just that it is not very stable. But I'm not talking about confidence when I say higher. I'm talking about the system of levels I defined, for which it is perfectly OK. Yes, but the whole purpose of adding another value is to handle inconsistency and belief revision. 
Higher-order probability is mathematically sound, but won't do this work. Think about a concrete example: if from one source the system gets P(A--B) = 0.9, and P(P(A--B) = 0.9) = 0.5, while from another source P(A--B) = 0.2, and P(P(A--B) = 0.2) = 0.7, then what will be the conclusion when the two sources are considered together? Pei
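For contrast, NARS itself answers the two-source question with its revision rule rather than with higher-order probability: under the standard mappings f = w+/w and c = w/(w + k), two truth values derived from disjoint evidence are combined by pooling their evidence weights. The Python sketch below is only an illustration, not NARS source code; the helper names are invented, and it reads each source's second value as a NARS confidence rather than a second-order probability.

# Minimal sketch of NARS-style revision (illustrative, not NARS code).
# Assumes the standard mappings f = w+/w and c = w/(w + k), with k = 1.

K = 1.0

def to_evidence(f, c, k=K):
    """Recover (positive, total) evidence weights from a truth value."""
    w = k * c / (1.0 - c)   # total evidence behind the judgment
    return f * w, w

def revise(tv1, tv2, k=K):
    """Combine two truth values based on disjoint evidence by pooling
    their evidence weights, then mapping back to (frequency, confidence)."""
    wp1, w1 = to_evidence(*tv1, k)
    wp2, w2 = to_evidence(*tv2, k)
    wp, w = wp1 + wp2, w1 + w2
    return wp / w, w / (w + k)

# Reading the example's second values as confidences:
print(revise((0.9, 0.5), (0.2, 0.7)))  # roughly (0.41, 0.77)

With k = 1 this gives roughly f = 0.41 and c = 0.77: the source backed by more evidence pulls the pooled frequency toward its own value, and the combined judgment ends up more confident than either input.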
Re: [agi] NARS probability
On Sat, Sep 20, 2008 at 11:02 PM, Abram Demski [EMAIL PROTECTED] wrote:

You are right in what you say about (1). The truth is, my analysis is meant to apply to NARS operating with unrestricted time and memory resources (which of course is not the point of NARS!). So, the question is whether NARS approaches a probability calculation as it is given more time to use all its data.

That is an interesting question. When the weight of evidence w goes to infinity, confidence goes to 1, and frequency converges to the limit of the proportion of positive evidence among all evidence, so it becomes a probability, under a certain interpretation. Therefore, as far as a single truth value is concerned, probability theory is an extreme case of NARS. However, taking all the truth values in the system into account, this is not necessarily true, because the two theories specify the relations among statements/propositions differently. For example, probability theory has the conditional B|A, while NARS uses the implication A==B, which are similar, but not the same. Of course, there are some overlaps, such as disjunction and conjunction, where NARS converges to probability theory in the extreme case (infinite evidence).

As for higher values... NARS and PLN may be using them for the purpose you mention, but that is not the purpose I am giving them in my analysis! In my analysis, I am simply trying to justify the deductions allowed in NARS in a probabilistic way. Higher-order probabilities are potentially useful here because of the way you sum evidence. Simply put, it is as if NARS purposefully ignores the distinction between different probability levels, so that a NARS frequency is also a frequency-of-frequencies and a frequency-of-frequency-of-frequencies and so on, all the way up.

I see what you mean, but as it is currently defined, NARS has no need to introduce higher-order probabilities --- frequency is not an estimation of a true probability. It is uncertain because of the influence of new evidence, not because it is inaccurate.

The simple way of dealing with this is to say that it is wrong, and results from a confusion of similar-looking mathematical entities. But, to some extent, it is intuitive: I should not care too much in normal reasoning which level of inheritance I'm using when I say that a truck is a type of vehicle. So the question is, can this be justified probabilistically? I think I can give a very tentative yes.

Hopefully we'll know better about that when you explore further. ;-) Pei

--Abram On Sat, Sep 20, 2008 at 9:38 PM, Pei Wang [EMAIL PROTECTED] wrote: On Sat, Sep 20, 2008 at 9:09 PM, Abram Demski [EMAIL PROTECTED] wrote:

(1) In probability theory, an event E has a constant probability P(E) (which can be unknown). Given the assumption of insufficient knowledge and resources, in NARS P(A--B) changes over time, as more and more evidence is taken into account. This process cannot be treated as conditioning because, among other things, the system can neither explicitly list all evidence as a condition, nor update the probability of every statement in the system for each piece of new evidence (so as to treat all background knowledge as a default condition). Consequently, at any moment P(A--B) and P(B--C) may be based on different, though unspecified, data, so it is invalid to use them together in a rule to calculate the probability of A--C --- probability theory does not allow cross-distribution probability calculation.

This is not a problem the way I set things up. The likelihood of a statement is welcome to change over time, as the evidence changes.

If each of them is changed independently, you don't have a single probability distribution anymore, but a bunch of them. In the above case, you don't really have P(A--B) and P(B--C), but P_307(A--B) and P_409(B--C). How can you use two probability values together if they come from different distributions?

(2) For the same reason, in NARS a statement might get different probabilities attached when derived from different evidence. Probability theory does not have a general rule to handle inconsistency within a probability distribution.

The same statement holds for PLN, right?

Yes. Ben proposed a solution, which I won't comment on until I see all the details in the PLN book.

The first half is fine, but the second isn't. As the previous example shows, in NARS a high Confidence does imply that the Frequency value is a good summary of the evidence, but a low Confidence does not imply that the Frequency is bad, just that it is not very stable.

But I'm not talking about confidence when I say "higher". I'm talking about the system of levels I defined, for which it is perfectly OK.

Yes, but the whole purpose of adding another value is to handle inconsistency and belief revision. Higher-order probability is mathematically sound, but won't do this work. Think about a concrete example: if from one source the system gets P(A--B) = 0.9, and P(P(A--B) = 0.9) = 0.5, while from another source P(A--B) = 0.2, and P(P(A--B) = 0.2) = 0.7, then what will be the conclusion when the two sources are considered together?
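To spell out the limit Pei describes, take the standard NARS truth-value definitions, with w+ the amount of positive evidence, w the total evidence, and k > 0 the evidential-horizon constant (this is a restatement of the published definitions, not new machinery):

\[
f = \frac{w^{+}}{w}, \qquad
c = \frac{w}{w + k}, \qquad
\lim_{w \to \infty} c = 1, \qquad
\lim_{w \to \infty} f = \lim_{w \to \infty} \frac{w^{+}}{w}.
\]

So with unbounded evidence, confidence saturates at 1 and frequency (when the limit exists) is exactly a limiting relative frequency, i.e. a probability under a frequentist interpretation; with any finite amount of evidence, neither reduction holds.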
Re: [agi] NARS probability
On Sat, Sep 20, 2008 at 10:32 PM, Pei Wang [EMAIL PROTECTED] wrote:

I found the paper. As I guessed, their update operator is defined on the whole probability distribution function, rather than on a single probability value of an event. I don't think it is practical for AGI --- we cannot afford the time to re-evaluate every belief on each piece of new evidence. Also, I haven't seen a convincing argument for why an intelligent system should follow the ME Principle.

I agree their method is not practical for most cases in AGI, which is why we didn't use it within PLN ;-) ... we use a simpler revision rule...

Also this paper doesn't directly solve my example, because it doesn't use second-order probability.

That is true, but it could be straightforwardly extended to that case... Ben
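To make the cost objection concrete, here is a toy Python sketch. It is not the paper's operator: it uses Jeffrey's rule, which is the minimum-relative-entropy (ME-style) update for the single constraint P(E) = q, and the prior and event are invented for illustration. The point is only that a distribution-level update must touch every state of the joint distribution, so each new piece of evidence costs time exponential in the number of variables.

from itertools import product

n = 16                                   # 16 binary variables -> 2^16 = 65536 states
states = list(product([0, 1], repeat=n))
prior = {s: 1.0 / len(states) for s in states}   # uniform toy prior

def jeffrey_update(prior, event, q):
    """Reweight every state so that the event gets probability q,
    staying minimally far (in KL divergence) from the prior."""
    p_e = sum(p for s, p in prior.items() if event(s))
    return {s: p * (q / p_e if event(s) else (1.0 - q) / (1.0 - p_e))
            for s, p in prior.items()}

# One piece of evidence about one variable still reweights all 65536 states:
posterior = jeffrey_update(prior, lambda s: s[0] == 1, 0.9)

A local rule of the kind NARS (or PLN's revision) uses instead touches only the single truth value being revised, which is the trade-off Pei is pointing at: tractability in exchange for giving up a single globally coherent distribution.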