On Fri, Jul 15, 2011 at 9:38 AM, Sean Owen <[email protected]> wrote:

> I stumped myself looking at the implementation of
> LogLikelihood.entropy(). This is Shannon entropy right? just the sum
> of -x*log(x) for all x in the input?
>

Sort of.  It would be Shannon entropy if the x_i summed to 1.

>
> I understand why it could be desirable to normalize the input to sum
> to 1, but we don't since it doesn't matter in most contexts. So if N =
> sum(x), the normalized version would be the sum of -x/N * log(x/N).
> Right?
>

Yes.  But it is actually important to not normalize.


> But what it computes now is the sum of -x * log(x/N). Seems like a bit
> of both there. But I do see that the unnormalized result simply scales
> linearly compared to the normalized version as the input values
> increase, which seems good.
>
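
In code terms, the unnormalized form you describe works out to something like
this (just a sketch to pin down the arithmetic, not the actual Mahout source):

      // Unnormalized entropy: -sum_i x_i * log(x_i / N) where N = sum_i x_i.
      // Same thing as N*log(N) - sum_i x_i*log(x_i), taking 0*log(0) = 0.
      static double unnormalizedEntropy(long... counts) {
        long n = 0;
        double sumXLogX = 0.0;
        for (long x : counts) {
          n += x;
          sumXLogX += x > 0 ? x * Math.log(x) : 0.0;
        }
        return n > 0 ? n * Math.log(n) - sumXLogX : 0.0;
      }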

See here: http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html

And here: http://acl.ldc.upenn.edu/J/J93/J93-1003.pdf

The origin of the expression is the original form of the log-likelihood
ratio statistic.  For multinomials, the likelihood looks like

      p(K | \vec \pi) = Z \prod_i \pi_i^{k_i}

where Z is a normalizing constant.  The maximum likelihood estimate is

      \pi_i = k_i / N

This means that the maximum log likelihood is

      \max_{\vec \pi} \log p(K | \vec \pi) = \sum_i k_i \log (k_i / N) + \log Z

The log-likelihood ratio involves three such expressions.
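
For a 2x2 contingency table, for example, the three expressions combine like
this (again just a sketch in terms of the unnormalizedEntropy() helper above,
not the library code itself):

      // G^2 = 2 * (rowEntropy + columnEntropy - matrixEntropy), where each
      // term is the unnormalized -sum x*log(x/N) form.
      static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = unnormalizedEntropy(k11 + k12, k21 + k22);
        double columnEntropy = unnormalizedEntropy(k11 + k21, k12 + k22);
        double matrixEntropy = unnormalizedEntropy(k11, k12, k21, k22);
        return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
      }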

The similarity to Shannon entropy here is either very deep or coincidental,
depending on the day of the week.

> I haven't encountered this issue before so don't know what the usual
> answer is. There seems to be a different definition of normalized
> entropy floating around from social sciences which makes it worse.
>

What is the issue?

I think that the Javadoc says all of this and that the implementation is
correct as specified.  The Javadocs for the different methods point to web
resources with definitions.
