On Fri, Jul 15, 2011 at 9:38 AM, Sean Owen <[email protected]> wrote:

> I stumped myself looking at the implementation of
> LogLikelihood.entropy(). This is Shannon entropy right? Just the sum
> of -x*log(x) for all x in the input?
>
Sort of.  It would be Shannon entropy if \sum_i x_i = 1.

> I understand why it could be desirable to normalize the input to sum
> to 1, but we don't since it doesn't matter in most contexts. So if N =
> sum(x), the normalized version would be the sum of -x/N * log(x/N).
> Right?
>

Yes.  But it is actually important not to normalize.

> But what it computes now is the sum of -x * log(x/N). Seems like a bit
> of both there. But I do see that the unnormalized result simply scales
> linearly compared to the normalized version as the input values
> increase, which seems good.
>

See here: http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html

And here: http://acl.ldc.upenn.edu/J/J93/J93-1003.pdf

The origin of the expression is the original form of the log-likelihood
ratio statistic.  For multinomials, the likelihood looks like

    p(K | \vec\pi) = Z \prod_i \pi_i^{k_i}

where Z is a normalizing constant.  The maximum likelihood estimate is

    \pi_i = k_i / N

where N = \sum_i k_i.  Substituting this into
\log p(K | \vec\pi) = \log Z + \sum_i k_i \log \pi_i, the maximum log
likelihood is

    \log \max_\pi p(K | \vec\pi) = \sum_i k_i \log(k_i / N) + \log Z

The log-likelihood ratio involves three such expressions.

The similarity to Shannon entropy here is either very deep or
coincidental, depending on the day of the week.

> I haven't encountered this issue before so don't know what the usual
> answer is. There seems to be a different definition of normalized
> entropy floating around from social sciences which makes it worse.
>

What is the issue?

I think that the Javadoc says all of this and that the implementation is
correct as specified.  The Javadocs for the different methods point to
web resources with definitions.
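
To make the connection concrete, here is a minimal Java sketch of the
unnormalized entropy discussed above, -\sum_i k_i \log(k_i / N), and of a raw
LLR assembled from three such terms for a 2x2 table.  The class and method
names (LlrSketch, rawEntropy, rawLogLikelihoodRatio) are illustrative only,
not Mahout's actual API; the real signatures and caveats are in
LogLikelihood's Javadoc.

// Sketch only: illustrative names, not Mahout's actual API.
public final class LlrSketch {

  // x * log(x), with 0 * log(0) taken as 0.  Natural logs throughout.
  private static double xLogX(long x) {
    return x == 0 ? 0.0 : x * Math.log(x);
  }

  // Unnormalized entropy of a vector of counts:
  // N*log(N) - sum_i k_i*log(k_i)  ==  -sum_i k_i*log(k_i / N),
  // where N = sum_i k_i.
  public static double rawEntropy(long... k) {
    long n = 0;
    double sumXLogX = 0.0;
    for (long x : k) {
      n += x;
      sumXLogX += xLogX(x);
    }
    return xLogX(n) - sumXLogX;
  }

  // Raw log-likelihood ratio for a 2x2 contingency table, built from three
  // unnormalized-entropy terms:
  // 2 * (H(row sums) + H(column sums) - H(all four cells)).
  public static double rawLogLikelihoodRatio(long k11, long k12, long k21, long k22) {
    double rowEntropy = rawEntropy(k11 + k12, k21 + k22);
    double columnEntropy = rawEntropy(k11 + k21, k12 + k22);
    double matrixEntropy = rawEntropy(k11, k12, k21, k22);
    return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
  }

  public static void main(String[] args) {
    // Sean's observation: scaling every count by a constant c scales the
    // unnormalized entropy by exactly c, while the normalized Shannon
    // entropy (which depends only on the proportions) would not change.
    System.out.println(rawEntropy(1, 2, 3));      // small counts
    System.out.println(rawEntropy(10, 20, 30));   // same proportions, 10x the value above
    System.out.println(rawLogLikelihoodRatio(100, 10, 10, 100));
  }
}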
