On Sat, Feb 13, 2010 at 9:38 AM, Benson Margulies <bimargul...@gmail.com> wrote:
> This has been observed by better minds than mine to yield a confidence
> estimation that is useful in doing active learning for training data
> selection.

I haven't used this technique and have only occasionally been able to use active learning outside of the recommendations world. I am not quite sure, by the way, exactly what you mean by alpha and beta. There is pretty variable notation for the Viterbi algorithm, so this isn't all that surprising.

Regardless of that, and assuming that this is a good heuristic for active learning, have you applied your scatterplot to these confidence levels versus accuracy? Does that show you the relationship that you desire?

I should point out that the goal of active learning is to select training examples that would have a large impact on the current model or space of model hypotheses. As such, it is common, especially with perceptron-like algorithms such as ANNs and SVMs, to select examples that are near the decision margin. This corresponds pretty well to the desired behavior. There is nothing, however, that says that these examples will necessarily be ones for which the model is more or less accurate. These are simply examples whose actual target values have a large potential for influencing the model.

My own preference for a case like yours is to scan examples to find the one where knowing the true value of the target variable would have the largest impact for an on-line learning algorithm. That may not be feasible, but I would guess that there are on-line approximations for whatever learning algorithm you are using, given that you have perceptrons scattered about.

To compute sensitivity, I would simply assume that different candidates in the beam are the correct result and see which example has the most action as a result of these assumptions. Your current heuristic seems to be doing this indirectly by including a probability-of-correctness measure and omitting the gradient term, but since you don't seem to have a viable probability of correctness, maybe using the gradient will do. Since the gradient of the model parameters relative to a single target value is usually largest at the margin, this may turn out to be nearly the same as your original desired heuristic.

Wish I could help more, but I don't have any direct experience with the stuff you are working on!
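In case it helps, here is a very rough Python sketch of the sensitivity idea above. Everything in it is assumed for illustration: the model.gradient(features, label) hook, the shape of the beams, and using a gradient norm as the "amount of action". Substitute whatever single-example update your own on-line learner actually makes.

    import numpy as np

    def update_magnitude(model, features, label):
        """Hypothetical hook: size of the parameter change an on-line learner
        would make if `label` were the true target for `features`.
        For a perceptron/SGD-style learner this is roughly the norm of the
        gradient of the loss for that single (example, label) pair."""
        grad = model.gradient(features, label)   # assumed API
        return np.linalg.norm(grad)

    def pick_most_sensitive(model, unlabeled, beams):
        """Pick the unlabeled example whose true label, whichever beam
        candidate it turns out to be, could move the model the most.

        unlabeled : list of feature vectors
        beams     : beams[i] is a list of candidate labels for unlabeled[i]
        """
        best_idx, best_score = None, -1.0
        for i, (x, candidates) in enumerate(zip(unlabeled, beams)):
            # Pretend each beam candidate is the correct answer in turn and
            # see how much "action" (parameter movement) that would cause.
            sensitivity = max(update_magnitude(model, x, y) for y in candidates)
            if sensitivity > best_score:
                best_idx, best_score = i, sensitivity
        return best_idx

If you collapse the beam to the top two candidates and the gradient to a distance from the decision boundary, this degenerates to the usual pick-the-smallest-margin rule, which is why I suspect the two heuristics will end up looking similar in practice.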