What difference are you seeing?
The entropy calculation in LogLikelihood is what I would call un-normalized
entropy:

  H(K) = -\sum_i k_i log(k_i / n)
This makes the expression for the log-likelihood ratio slightly simpler.
The log(k_i / n) is also split into two parts to avoid doing lots of
divisions. This gives:

  H(K) = -\sum_i k_i (log(k_i) - log(n))
       = -\sum_i k_i log(k_i) + \sum_i k_i log(n)
       = n log(n) - \sum_i k_i log(k_i)

where the last step uses \sum_i k_i = n.
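The equivalence is easy to check numerically. Below is a small Python sketch (not the attached llr.py, and not Mahout's actual Java code) that computes the LLR for a 2x2 contingency table (k11, k12, k21, k22) both ways: with the un-normalized entropy n log(n) - \sum_i k_i log(k_i) as in Mahout's LogLikelihood, and with the textbook Shannon entropy over k_i / n scaled by 2n. The function names and the example counts are illustrative only.

```python
import math

def xlogx(x):
    # x * log(x), with the convention 0 * log(0) = 0
    return 0.0 if x == 0 else x * math.log(x)

def entropy_unnormalized(*counts):
    # Un-normalized entropy: H(K) = n log(n) - sum_i k_i log(k_i)
    n = sum(counts)
    return xlogx(n) - sum(xlogx(k) for k in counts)

def entropy_normalized(*counts):
    # Textbook Shannon entropy of the empirical distribution k_i / n
    n = sum(counts)
    return -sum((k / n) * math.log(k / n) for k in counts if k > 0)

def llr_unnormalized(k11, k12, k21, k22):
    # LLR = 2 * (H(rows) + H(cols) - H(matrix)), un-normalized entropies
    row = entropy_unnormalized(k11 + k12, k21 + k22)
    col = entropy_unnormalized(k11 + k21, k12 + k22)
    mat = entropy_unnormalized(k11, k12, k21, k22)
    return 2.0 * (row + col - mat)

def llr_normalized(k11, k12, k21, k22):
    # LLR = 2 * n * (H(rows) + H(cols) - H(matrix)), normalized entropies
    n = k11 + k12 + k21 + k22
    row = entropy_normalized(k11 + k12, k21 + k22)
    col = entropy_normalized(k11 + k21, k12 + k22)
    mat = entropy_normalized(k11, k12, k21, k22)
    return 2.0 * n * (row + col - mat)

# Example counts (arbitrary): both formulations agree
print(llr_unnormalized(1000, 1000, 1000, 100000))
print(llr_normalized(1000, 1000, 1000, 100000))
```

The two agree because each un-normalized entropy equals n times the corresponding normalized entropy, and all three terms in the LLR share the same n, so the factor of n can be pulled outside the parentheses.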
On Thu, Sep 9, 2010 at 10:06 PM, Gangadhar Nittala
<[email protected]> wrote:
> All,
> I am a first time user of Mahout. I checked out the code and was able
> to get the build going. I was checking the Tasks list (I use Eclipse)
> and saw one in the LogLikelihoodTest.java to check the epsilons.
>
> While checking the code in LogLikelihood.java
> (org.apache.mahout.math.stats.LogLikelihood), I saw that the code
> for the Shannon entropy calculation seemed different from the one as
> it is defined on Wikipedia
> [http://en.wikipedia.org/wiki/Entropy_(information_theory)].
>
> I wrote a small Python script (attached - llr.py) to compare the one
> that is present in Mahout
> (org.apache.mahout.math.stats.LogLikelihood) and the one that is
> defined by Ted Dunning in
> http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html.
> Even though the entropy and LLR calculations are different, the final
> output of the LLR is the same with both the methods.
>
> I am trying to find out why the two methods are equivalent. Can you
> please let me know why this is the case, or point me to a reference I
> can check? If this is not the right list for this question, I am
> sorry; I shall try the mahout-users list.
>
> Thank you
> Gangadhar
>