Folks, here's one of my occasional questions in which I am, in essence, bartering my code-wrangling efforts for expertise on hard stuff.
Consider a sequence problem addressed with a perceptron model and an ordinary Viterbi decoder. There's a standard confidence estimation technique borrowed from HMMs: calculate gamma = alpha + beta for each state, take the difference of the gammas for the best and second-best hypothesis in each column of the trellis, and take the minimum of those differences as the overall confidence of the decode. (+, of course, because in a perceptron we're summing feature weights, not multiplying probabilities.) This has been observed by better minds than mine to yield a confidence estimate that is useful for active-learning selection of training data.

Now consider a beam decoder (*). Beta is not practically available there, so we came up with the following next-best idea. Whenever we decide to keep something in the beam, estimate confidence as:

  abs(feature sum of what we're keeping - feature sum of the next one down the stack)
    / abs(feature sum of what we're keeping)

Then take the minimum of that across the sentence and call it the confidence.

To get some idea of whether this is valid, I put in code to compute Pearson's product-moment correlation coefficient between per-sentence accuracy and the confidence. The results are awful.

So, if anyone has ideas about (a) whether there's a better way to evaluate this, or (b) whether there's a better way to estimate confidence, I'm all ears.

(*) For the curious, we're doing Asian-text segmentation with a beam-decoded perceptron model.
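
For concreteness, here is a minimal sketch of the full-trellis margin described above, assuming max-plus (Viterbi-style) alpha/beta recursions over additive perceptron scores. The names (viterbi_margin_confidence, emissions, transitions) are illustrative, not anything from our actual code:

    import numpy as np

    def viterbi_margin_confidence(emissions, transitions):
        # emissions: (n_positions, n_states) additive scores per state
        # transitions: (n_states, n_states) additive scores, [prev, next]
        n, k = emissions.shape
        alpha = np.full((n, k), -np.inf)
        beta = np.zeros((n, k))                  # beta at the last column is 0
        alpha[0] = emissions[0]
        for t in range(1, n):
            # best score of any path ending in each state at position t
            alpha[t] = emissions[t] + np.max(alpha[t - 1][:, None] + transitions, axis=0)
        for t in range(n - 2, -1, -1):
            # best score of any path from each state at position t to the end
            beta[t] = np.max(transitions + (emissions[t + 1] + beta[t + 1])[None, :], axis=1)
        gamma = alpha + beta                     # best total path score through each (t, state)
        margins = np.empty(n)
        for t in range(n):
            top2 = np.sort(gamma[t])[-2:]        # [second-best, best] in that column
            margins[t] = top2[1] - top2[0]       # per-column margin
        return float(margins.min())              # weakest column = sentence confidence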
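
And a sketch of the beam-side heuristic, under one reading of "the next one down the stack": each kept hypothesis is compared with the hypothesis ranked immediately below it, and the column takes the smallest such gap. Again, the names here are placeholders:

    def column_margin(ranked_scores, beam_size):
        # ranked_scores: feature sums for one column, sorted best-first, before pruning
        margins = []
        for i in range(min(beam_size, len(ranked_scores) - 1)):
            kept, nxt = ranked_scores[i], ranked_scores[i + 1]
            if kept != 0:
                margins.append(abs(kept - nxt) / abs(kept))
        return min(margins) if margins else float('inf')

    def sentence_confidence(per_column_scores, beam_size):
        # minimum over columns, as described above: the weakest pruning
        # decision becomes the sentence-level confidence
        return min(column_margin(col, beam_size) for col in per_column_scores)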
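
Finally, the sanity check itself, Pearson's r between per-sentence accuracy and per-sentence confidence, hand-rolled so it's self-contained (argument names are placeholders):

    import numpy as np

    def pearson_r(accuracies, confidences):
        # Pearson product-moment correlation between two equal-length sequences
        x = np.asarray(accuracies, dtype=float)
        y = np.asarray(confidences, dtype=float)
        x = x - x.mean()
        y = y - y.mean()
        return float((x * y).sum() / np.sqrt((x * x).sum() * (y * y).sum()))

    # e.g. r = pearson_r(per_sentence_accuracy, per_sentence_confidence)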