On Fri, Jul 15, 2011 at 9:38 AM, Sean Owen <[email protected]> wrote:

> I stumped myself looking at the implementation of
> LogLikelihood.entropy(). This is Shannon entropy right? Just the sum
> of -x*log(x) for all x in the input?
>
Sort of.  It would be Shannon entropy if \sum_i x_i = 1.

> I understand why it could be desirable to normalize the input to sum
> to 1, but we don't since it doesn't matter in most contexts. So if N =
> sum(x), the normalized version would be the sum of -x/N * log(x/N).
> Right?
>

Yes.  But it is actually important not to normalize.

> But what it computes now is the sum of -x * log(x/N). Seems like a bit
> of both there. But I do see that the unnormalized result simply scales
> linearly compared to the normalized version as the input values
> increase, which seems good.
>

See here: http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html

And here: http://acl.ldc.upenn.edu/J/J93/J93-1003.pdf

The origin of the expression is the original form of the log-likelihood
ratio statistic.  For multinomials, the likelihood looks like

    p(K | \vec\pi) = Z \prod_i \pi_i^{k_i}

where Z is a normalizing constant.  The maximum likelihood estimate is

    \pi_i = k_i / N

where N = \sum_i k_i.  Substituting this into
\log p(K | \vec\pi) = \log Z + \sum_i k_i \log \pi_i, the maximum log
likelihood is

    \log \max_\pi p(K | \vec\pi) = \sum_i k_i \log(k_i / N) + \log Z

The log-likelihood ratio involves three such expressions.

The similarity to Shannon entropy here is either very deep or
coincidental, depending on the day of the week.

> I haven't encountered this issue before so don't know what the usual
> answer is. There seems to be a different definition of normalized
> entropy floating around from social sciences which makes it worse.
>

What is the issue?

I think that the Javadoc says all of this and that the implementation is
correct as specified.  The Javadocs for the different methods point to
web resources with definitions.
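
To make the connection concrete, here is a minimal Java sketch of the
unnormalized entropy discussed above, -\sum_i k_i \log(k_i / N), and of a raw
LLR assembled from three such terms for a 2x2 table.  The class and method
names (LlrSketch, rawEntropy, rawLogLikelihoodRatio) are illustrative only,
not Mahout's actual API; the real signatures and caveats are in
LogLikelihood's Javadoc.

// Sketch only: illustrative names, not Mahout's actual API.
public final class LlrSketch {

  // x * log(x), with 0 * log(0) taken as 0.  Natural logs throughout.
  private static double xLogX(long x) {
    return x == 0 ? 0.0 : x * Math.log(x);
  }

  // Unnormalized entropy of a vector of counts:
  // N*log(N) - sum_i k_i*log(k_i)  ==  -sum_i k_i*log(k_i / N),
  // where N = sum_i k_i.
  public static double rawEntropy(long... k) {
    long n = 0;
    double sumXLogX = 0.0;
    for (long x : k) {
      n += x;
      sumXLogX += xLogX(x);
    }
    return xLogX(n) - sumXLogX;
  }

  // Raw log-likelihood ratio for a 2x2 contingency table, built from three
  // unnormalized-entropy terms:
  // 2 * (H(row sums) + H(column sums) - H(all four cells)).
  public static double rawLogLikelihoodRatio(long k11, long k12, long k21, long k22) {
    double rowEntropy = rawEntropy(k11 + k12, k21 + k22);
    double columnEntropy = rawEntropy(k11 + k21, k12 + k22);
    double matrixEntropy = rawEntropy(k11, k12, k21, k22);
    return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
  }

  public static void main(String[] args) {
    // Sean's observation: scaling every count by a constant c scales the
    // unnormalized entropy by exactly c, while the normalized Shannon
    // entropy (which depends only on the proportions) would not change.
    System.out.println(rawEntropy(1, 2, 3));      // small counts
    System.out.println(rawEntropy(10, 20, 30));   // same proportions, 10x the value above
    System.out.println(rawLogLikelihoodRatio(100, 10, 10, 100));
  }
}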
