Hi,
I have read papers on multinomial estimation for sparse variables (e.g. "A
Natural Law of Succession" by Eric Sven Ristad), because I need good
probability estimates for calculating the joint entropy of sets of such
variables. However, I have found that the estimators I have looked at so
far break the chain rule for entropy, with the exception of simple
frequency counting and the Laplace estimate, both of which are
discouraged for very sparse data.
The entropy of a random variable X having possible classes A_X is
defined as:
H(X) = - \sum_{x \in A_X} p(x) \log p(x)
The joint entropy of two random variables X,Y having possible classes
A_X and A_Y is defined as:
H(X,Y) = - \sum_{x \in A_X} \sum_{y \in A_Y} p(x,y) \log p(x,y)
and has the property that H(X,Y) = H(Y,X).
The chain rule for entropy states that:
H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
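As a sanity check, here is a minimal sketch (Python; the joint counts
for two three-valued variables are made up) that verifies the identity
when all probabilities are plain frequency-count estimates read off one
joint table:

import numpy as np

counts = np.array([[4, 1, 0],   # made-up joint counts n(x,y): rows = x, cols = y
                   [2, 3, 1],
                   [0, 1, 5]], dtype=float)
N = counts.sum()

def H(p):
    # entropy in bits of an array of probabilities, ignoring zero cells
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_xy = counts / N                  # p(x,y), plain frequency counting
p_x = p_xy.sum(axis=1)             # p(x) derived from the same joint
# H(Y|X) = sum_x p(x) * H(Y | X = x), with p(y|x) = n(x,y) / n(x)
H_y_given_x = sum(p_x[i] * H(counts[i] / counts[i].sum())
                  for i in range(len(p_x)))

print(H(p_xy))                     # H(X,Y)
print(H(p_x) + H_y_given_x)        # H(X) + H(Y|X): the same number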
When I plug in probability estimates obtained with, e.g., Ristad's
formula, however, the above identity no longer holds. Because I use the
chain rule to compute joint entropies, I get inconsistent results
depending on the order in which I apply it to the variables.
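To make the problem concrete, here is a sketch in the same spirit as the
one above. As a hypothetical stand-in for a Ristad-style estimator (it
is not Ristad's actual formula) it uses add-1/2 (Krichevsky-Trofimov)
smoothing, applied independently to each marginal and each conditional
distribution; because those estimates no longer come from one consistent
joint distribution, the two orderings of the chain rule give different
numbers:

import numpy as np

counts = np.array([[4, 1, 0],   # same made-up sparse joint counts n(x,y)
                   [2, 3, 1],
                   [0, 1, 5]], dtype=float)
kx, ky = counts.shape            # |A_X|, |A_Y|

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def smooth(c):
    # add-1/2 smoothed probabilities of a vector of counts
    return (c + 0.5) / (c.sum() + 0.5 * len(c))

p_x = smooth(counts.sum(axis=1))   # p(x), smoothed on its own
p_y = smooth(counts.sum(axis=0))   # p(y), smoothed on its own
H_y_given_x = sum(p_x[i] * H(smooth(counts[i]))    for i in range(kx))
H_x_given_y = sum(p_y[j] * H(smooth(counts[:, j])) for j in range(ky))

print(H(p_x) + H_y_given_x)        # H(X) + H(Y|X)
print(H(p_y) + H_x_given_y)        # H(Y) + H(X|Y): a different value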
Does anybody know of multinomial estimators that preserve the laws of
entropy and give good results on sparse variables?
Best regards,
Markus Mottl
--
Markus Mottl http://www.oefai.at/~markus [EMAIL PROTECTED]