Folks, here's one of my occasional questions in which I am, in essence, bartering my code-wrangling efforts for expertise on hard stuff.
Consider a sequence problem addressed with a perceptron model and an ordinary Viterbi decoder. There's a standard confidence estimation technique borrowed from HMMs: calculate gamma = alpha + beta for each state, take the difference of the gammas for the best and second-best hypothesis in each column of the trellis, and take the minimum of those differences as the overall confidence of the decode. (+, of course, because in a perceptron we're summing feature weights, not multiplying probabilities.) This has been observed by better minds than mine to yield a confidence estimate that is useful for active-learning selection of training data.

Now consider a beam decoder (*). Beta is not practically available there, so we came up with the following next-best idea. Whenever we decide to keep something in the beam, estimate confidence as:

  abs(feature sum of what we're keeping - feature sum of the next one down the stack)
    / abs(feature sum of what we're keeping)

Then take the minimum of that across the sentence and call it the confidence.

To get some idea of whether this is valid, I put in code to compute Pearson's product-moment correlation coefficient between per-sentence accuracy and the confidence. The results are awful.

So, if anyone has ideas about (a) whether there's a better way to evaluate this, or (b) whether there's a better way to estimate confidence, I'm all ears.

(*) For the curious, we're doing Asian-text segmentation with a beam-decoded perceptron model.
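
For concreteness, here is a minimal sketch of the full-trellis margin described above, assuming max-plus (Viterbi-style) alpha/beta recursions over additive perceptron scores. The names (viterbi_margin_confidence, emissions, transitions) are illustrative, not anything from our actual code:

    import numpy as np

    def viterbi_margin_confidence(emissions, transitions):
        # emissions: (n_positions, n_states) additive scores per state
        # transitions: (n_states, n_states) additive scores, [prev, next]
        n, k = emissions.shape
        alpha = np.full((n, k), -np.inf)
        beta = np.zeros((n, k))                  # beta at the last column is 0
        alpha[0] = emissions[0]
        for t in range(1, n):
            # best score of any path ending in each state at position t
            alpha[t] = emissions[t] + np.max(alpha[t - 1][:, None] + transitions, axis=0)
        for t in range(n - 2, -1, -1):
            # best score of any path from each state at position t to the end
            beta[t] = np.max(transitions + (emissions[t + 1] + beta[t + 1])[None, :], axis=1)
        gamma = alpha + beta                     # best total path score through each (t, state)
        margins = np.empty(n)
        for t in range(n):
            top2 = np.sort(gamma[t])[-2:]        # [second-best, best] in that column
            margins[t] = top2[1] - top2[0]       # per-column margin
        return float(margins.min())              # weakest column = sentence confidence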
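
And a sketch of the beam-side heuristic, under one reading of "the next one down the stack": each kept hypothesis is compared with the hypothesis ranked immediately below it, and the column takes the smallest such gap. Again, the names here are placeholders:

    def column_margin(ranked_scores, beam_size):
        # ranked_scores: feature sums for one column, sorted best-first, before pruning
        margins = []
        for i in range(min(beam_size, len(ranked_scores) - 1)):
            kept, nxt = ranked_scores[i], ranked_scores[i + 1]
            if kept != 0:
                margins.append(abs(kept - nxt) / abs(kept))
        return min(margins) if margins else float('inf')

    def sentence_confidence(per_column_scores, beam_size):
        # minimum over columns, as described above: the weakest pruning
        # decision becomes the sentence-level confidence
        return min(column_margin(col, beam_size) for col in per_column_scores)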
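
Finally, the sanity check itself, Pearson's r between per-sentence accuracy and per-sentence confidence, hand-rolled so it's self-contained (argument names are placeholders):

    import numpy as np

    def pearson_r(accuracies, confidences):
        # Pearson product-moment correlation between two equal-length sequences
        x = np.asarray(accuracies, dtype=float)
        y = np.asarray(confidences, dtype=float)
        x = x - x.mean()
        y = y - y.mean()
        return float((x * y).sum() / np.sqrt((x * x).sum() * (y * y).sum()))

    # e.g. r = pearson_r(per_sentence_accuracy, per_sentence_confidence)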