I stumped myself looking at the implementation of LogLikelihood.entropy(). This is Shannon entropy, right? Just the sum of -x*log(x) for all x in the input?
I understand why it could be desirable to normalize the input to sum to 1, but we don't, since it doesn't matter in most contexts. So if N = sum(x), the normalized version would be the sum of -x/N * log(x/N), right? But what it computes now is the sum of -x * log(x/N), which looks like a bit of both. That said, since sum(-x * log(x/N)) = N * sum(-(x/N) * log(x/N)), the current result is just N times the normalized entropy, so it scales linearly as the input values increase, which seems good. I haven't run into this issue before, so I don't know what the usual answer is. There also seems to be a different definition of "normalized entropy" floating around in the social sciences, which only adds to the confusion.
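To make the comparison concrete, here is a small self-contained sketch (not the actual implementation; the class name, method names, and use of doubles are mine) that computes the three quantities side by side:

public class EntropyCheck {

    // What I expected: plain sum of -x * log(x) over the raw values.
    static double shannon(double[] x) {
        double h = 0.0;
        for (double v : x) {
            if (v > 0) {
                h -= v * Math.log(v);
            }
        }
        return h;
    }

    // The normalized version: treat x/N as probabilities summing to 1.
    static double normalized(double[] x) {
        double n = 0.0;
        for (double v : x) {
            n += v;
        }
        double h = 0.0;
        for (double v : x) {
            if (v > 0) {
                h -= (v / n) * Math.log(v / n);
            }
        }
        return h;
    }

    // What entropy() appears to compute: sum of -x * log(x/N),
    // which works out to exactly N times normalized(x).
    static double current(double[] x) {
        double n = 0.0;
        for (double v : x) {
            n += v;
        }
        double h = 0.0;
        for (double v : x) {
            if (v > 0) {
                h -= v * Math.log(v / n);
            }
        }
        return h;
    }

    public static void main(String[] args) {
        double[] x = {2, 3, 5};             // N = 10
        System.out.println(shannon(x));     // -12.729... (negative for counts > 1)
        System.out.println(normalized(x));  //   1.0297... (entropy of {0.2, 0.3, 0.5})
        System.out.println(current(x));     //  10.297...  = 10 * normalized(x)
    }
}

Multiplying all of the inputs by a constant c leaves x/N unchanged and scales current() by c, which is the linear behavior described above.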
