All,
I am a first-time user of Mahout. I checked out the code and was able
to get the build going. While browsing the Tasks list (I use Eclipse),
I saw a task in LogLikelihoodTest.java about checking the epsilons.

While reading the code in LogLikelihood.java
(org.apache.mahout.math.stats.LogLikelihood), I saw that the Shannon
entropy calculation seemed different from the definition given on
Wikipedia
[http://en.wikipedia.org/wiki/Entropy_(information_theory)].

I wrote a small Python script (attached: llr.py) to compare the
version in Mahout (org.apache.mahout.math.stats.LogLikelihood) with
the one defined by Ted Dunning in
http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html.
Even though the intermediate entropy values and the LLR formulas are
different, the final LLR output is the same with both methods.
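For reference, here is a minimal sketch of the comparison (this is my own
reconstruction, not the attached llr.py; the function names and the test
counts are made up for illustration). Mahout's entropy() works on raw
counts and equals N times the normalized Shannon entropy, which appears
to be why the N factors cancel out and both LLRs agree:

```python
import math

def x_log_x(x):
    # Convention: 0 * log(0) is taken to be 0
    return x * math.log(x) if x > 0 else 0.0

def entropy_unnormalized(*counts):
    # Mahout-style "entropy" over raw counts: N*log(N) - sum(x*log(x)).
    # This equals N times the Shannon entropy of the normalized counts.
    total = sum(counts)
    return x_log_x(total) - sum(x_log_x(c) for c in counts)

def shannon_entropy(*counts):
    # Wikipedia-style Shannon entropy over probabilities: -sum(p*log(p))
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def llr_mahout(k11, k12, k21, k22):
    # 2 * (rowEntropy + colEntropy - matrixEntropy), unnormalized entropies
    row = entropy_unnormalized(k11 + k12, k21 + k22)
    col = entropy_unnormalized(k11 + k21, k12 + k22)
    mat = entropy_unnormalized(k11, k12, k21, k22)
    return 2.0 * (row + col - mat)

def llr_dunning(k11, k12, k21, k22):
    # 2 * N * mutual information, with normalized Shannon entropies
    n = k11 + k12 + k21 + k22
    row = shannon_entropy(k11 + k12, k21 + k22)
    col = shannon_entropy(k11 + k21, k12 + k22)
    mat = shannon_entropy(k11, k12, k21, k22)
    return 2.0 * n * (row + col - mat)

# Both forms agree on any 2x2 contingency table, e.g.:
print(llr_mahout(1000, 1000, 1000, 100000))
print(llr_dunning(1000, 1000, 1000, 100000))
```

Since the row sums, column sums, and cell counts all share the same total
N, each unnormalized entropy is just N times its Shannon counterpart, and
the 2*(row + col - mat) combination picks up exactly the factor of N that
Dunning's 2*N*MI form carries explicitly.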

I am trying to understand why the two methods are equivalent. Could
you let me know why this is the case, or point me to a reference I can
check? If this is not the right list for this question, my apologies;
I shall try the mahout-users list.

Thank you
Gangadhar
