Vlad, (1) You are correct that naïve Bayes assumes "not just conditional independence of Ei on hypothesis, that is P(Ei|Ej,H)=P(Ei|H), but also mutual independence of Ei, that is P(E1,...,En)=P(E1)*...*P(En)." That is a major limitation, one often not met in reality.
But that does not stop people from modeling systems in a simplified manner by
acting as if these conditions were met. Naïve Bayesian methods are
commonly used. I have read multiple papers saying that in many cases it
proves surprisingly accurate (considering what a gross hack it is) and, of
course, it greatly simplifies computation.
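To make that concrete, here is a minimal Python sketch of the naïve Bayesian update (the numbers and function name are made-up illustrations, not anyone's actual system):

```python
# Minimal naive Bayes posterior: p(H|E1..En) ~= p(H) * prod(p(Ei|H)/p(Ei)).
# The product is un-normalized and can exceed 1 when the independence
# assumptions are violated, which is part of why it is a "gross hack".
def naive_bayes_posterior(p_h, likelihood_ratios):
    post = p_h
    for ratio in likelihood_ratios:  # each ratio is p(Ei|H)/p(Ei)
        post *= ratio
    return post

# Example: prior 0.3, two pieces of evidence each more likely under H.
print(naive_bayes_posterior(0.3, [2.0, 1.5]))  # → 0.9
```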
Some of the most pronounced faults of the naïve Bayes approach can be
compensated for by "covering," which is based on the idea that if there are
strong dependencies between subsets of the evidentiary features, the system
will probably have statistics on the frequency of occurrence of such subsets
and the conditional probability of H given them, which can be used in place
of the individual contributions the elements of the subset would make under
the naïve Bayesian approach. In effect, covering treats a subset of features
on which it has sufficient sampling as if it were a single feature for
purposes of naïve Bayes.
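In code, the covering idea might look like the following sketch (the data structures, names, and the non-overlap assumption are mine, for illustration only):

```python
# Hypothetical sketch of "covering": if a subset of the active features has
# reliable joint statistics, use its stored joint ratio as if the subset
# were one feature; the remaining features contribute individually.
def covered_posterior(p_h, active, single_ratios, covered_subsets):
    """
    active: set of active feature names
    single_ratios: {feature: p(H|feature)/p(H)} for individual features
    covered_subsets: {frozenset(features): p(H|subset)/p(H)} joint stats
    Assumes the covered subsets do not overlap (the case that works).
    """
    remaining = set(active)
    post = p_h
    for subset, ratio in covered_subsets.items():
        if subset <= remaining:      # subset is fully active
            post *= ratio
            remaining -= subset      # don't double-count its members
    for f in remaining:              # leftovers handled naively
        post *= single_ratios[f]
    return post
```

For example, with prior 0.3, a covered pair {a, b} with joint ratio 1.5, and a lone feature c with ratio 0.9, the posterior is 0.3 * 1.5 * 0.9 = 0.405.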
With regard to such covering, the one problem for which I have not seen a
solution is what to do when you have statistics on different overlapping
subsets, each of which arguably has dependency among its elements.
This is an important problem, because a real embodied system interacting
with a complex environment through complex sets of sensors and actuators will
normally be dealing with a relatively large number of sensed, implied,
and/or imagined features at once. Naïve Bayes handles this, but far from
optimally. Covering helps, and is very useful when the subsets of currently
activated features break up into non-overlapping subsets. But systems may
often have valuable statistics from overlapping sets of features, and as of
yet I know of no tested or theoretically justified method for using such
valuable statistical info. My hunch is you could hack it by blending the
statistics from such overlapping sets.
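For what it is worth, that blending hunch could be hacked as something like a sample-count-weighted geometric mean of the overlapping subsets' ratios. This is pure speculation on my part, not a tested or theoretically justified method, and all the names here are hypothetical:

```python
import math

# Speculative "blending" hack: when two or more covered subsets overlap,
# collapse their joint ratios into one blended ratio, weighting each
# subset's log-ratio by how many samples its statistics rest on.
def blended_ratio(overlapping):
    """overlapping: list of (joint_ratio, sample_count) pairs for covered
    subsets that share features. Returns a single blended ratio."""
    total = sum(n for _, n in overlapping)
    log_blend = sum((n / total) * math.log(r) for r, n in overlapping)
    return math.exp(log_blend)
```

With equal sample counts this is just the geometric mean, e.g. blending ratios 4.0 and 1.0 gives 2.0; subsets with more samples pull the blend toward their own ratio.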
If you, or anyone on this list, know of a way of applying covering from
multiple overlapping sets of evidential features, please tell me.
(2) Thanks for confirming that you thought my math was correct (by saying it
was equivalent to the traditional naïve Bayes formula).
(3) You point out that my restatement "introduces (n-1)'th power of P(H),
which will have to figure in MAP and looks more cumbersome."
Why is this more cumbersome than p(E1)p(E2)...p(EN)? It involves no more
multiplications, and it seems to me that it is actually simpler to compute
because it requires fewer memory accesses.
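A quick numerical check (with made-up probabilities) confirms that the restated form, p(H|E1)*...*p(H|EN)/p(H)^(N-1), gives the same answer as the traditional formula, via the identity p(Ei|H)/p(Ei) = p(H|Ei)/p(H):

```python
# Check that p(H) * prod(p(Ei|H)/p(Ei)) equals
# p(H|E1)*...*p(H|En) / p(H)**(n-1), using p(Ei|H)/p(Ei) = p(H|Ei)/p(H).
p_h = 0.3
p_h_given_e = [0.5, 0.4, 0.6]      # assumed p(H|Ei) values
n = len(p_h_given_e)

traditional = p_h
for q in p_h_given_e:
    traditional *= q / p_h          # each factor is p(Ei|H)/p(Ei)

restated = 1.0
for q in p_h_given_e:
    restated *= q
restated /= p_h ** (n - 1)

assert abs(traditional - restated) < 1e-12
```

Both sides come out to 4/3 here; as noted, the product can exceed 1 when the underlying independence assumptions do not hold.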
With regard to your reference to MAP, there are contexts in which the
ass-backwardness of the traditional naïve Bayes formula reflects the
problem at hand, and thus, in that context, it is not really ass-backward.
When I used "H" as the variable whose conditional probability was to be
calculated, I inadvertently was probably implying that it is a hypothesis
about the process that generated the evidence being used to calculate H's
conditional probability. If that were the case, computations like maximum
likelihood and MAP would be more relevant. I should have used a letter other
than "H", since when I thought up this version of the naïve Bayes equation I
was interested in determining the probability of an instance of a concept
generalized from experience ("H" is what I used, but "C" for "concept" would
probably have been more appropriate) given the observation of other concepts
(E1...EN) generalized from experience. I was thinking more in terms of
lateral inference rather than bottom-up or top-down inference. In this
context it seems to me, at least at first blush, that computations of ML and
MAP are less relevant.
Ed Porter
-----Original Message-----
From: Vladimir Nesov [mailto:[EMAIL PROTECTED]
Sent: Monday, February 25, 2008 1:30 PM
To: [email protected]
Subject: Re: [agi] A possible less ass-backward way of computing naive
bayesian conditional probabilities
On Mon, Feb 25, 2008 at 8:18 PM, Ed Porter <[EMAIL PROTECTED]> wrote:
>
> As you all know, the naïve Bayes formula for the conditional probability
> of H given evidence E1, E2,...EN is
>
> p(H|E1,E2,...EN) = p(H) * p(E1|H)/p(E1) * p(E2|H)/p(E2) *...* p(EN|H)/p(EN)
Hi Ed,
This variant is more restricted than is necessary in some cases: it
needs not just conditional independence of Ei on hypothesis, that is
P(Ei|Ej,H)=P(Ei|H), but also mutual independence of Ei, that is
P(E1,...,En)=P(E1)*...*P(En).
Using P(Ei,H)=P(Ei|H)*P(H)=P(H|Ei)*P(Ei), or
P(Ei|H)/P(Ei)=P(H|Ei)/P(H) you expressed the formula in terms of
P(H|Ei) and P(H) rather than P(Ei), P(Ei|H) and P(H). It's clearly
equivalent, but it also introduces (n-1)'th power of P(H), which will
have to figure in MAP and looks more cumbersome.
--
Vladimir Nesov
[EMAIL PROTECTED]
-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/
Modify Your Subscription:
http://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
