Ted, thanks very much.

Thoughts in response to both of your messages:

1: alpha-beta is being used here in the forward-backward (E-M) sense.
Or, to be specific, alpha is the path sum from the beginning to the
current 'time', and beta is the path sum from the current 'time' to
the end. (There's a rough sketch of the calculation at the end of
this message.)

2: I had read about that 'at the margin' idea and completely forgotten
it. My starting point here is Miller and Guinness (one of whom used to
work with me and the other of whom still does). They didn't report,
and perhaps didn't measure, whether the examples selected via that
'gamma' calculation had high error rates (far from the margin?) or low
error rates (close to the margin). They just observed that

3: A scatter-plot looks like just what the doctor ordered.

4: That paper is new to me. My stack of papers in this neighborhood is
Collins, Miller + Guinness, Crammer (on Passive-Aggressive) and the
Oxford paper on segmentation. Thanks for the pointer.
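
To make point 1 concrete, here is a minimal sketch of the gamma
confidence calculation described in my original message below. It's
Python/numpy; the names (emit, trans) and the exact split between
emission and transition weights are assumptions for illustration, not
anything taken from the papers:

    import numpy as np

    def gamma_confidence(emit, trans):
        # emit:  (T, S) array of summed feature weights per position/state
        # trans: (S, S) array of transition weights, trans[i, j] = i -> j
        # Perceptron scores are additive, so the 'forward-backward'
        # recursions use max and +, not sum and *. Assumes S >= 2.
        T, S = emit.shape

        # alpha[t, s]: best path score from the start to state s at time t
        alpha = np.full((T, S), -np.inf)
        alpha[0] = emit[0]
        for t in range(1, T):
            alpha[t] = np.max(alpha[t - 1][:, None] + trans, axis=0) + emit[t]

        # beta[t, s]: best path score from state s at time t to the end.
        # The emission at time t is counted in alpha, so gamma counts
        # each emission exactly once.
        beta = np.zeros((T, S))
        for t in range(T - 2, -1, -1):
            beta[t] = np.max(trans + (beta[t + 1] + emit[t + 1])[None, :],
                             axis=1)

        # gamma[t, s] = alpha + beta: the score of the best complete
        # path forced through state s at time t
        gamma = alpha + beta

        # per-column margin between the best and second-best state
        top2 = np.sort(gamma, axis=1)[:, -2:]
        margins = top2[:, 1] - top2[:, 0]

        # the weakest column gives the overall confidence of the decode
        return margins, margins.min()

Pairing those per-column margins with per-column correctness against a
gold standard would also give the raw material for the scatter-plot in
point 3.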

--benson

On Sat, Feb 13, 2010 at 11:18 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Benson,
>
> Are you using techniques related to this:
> http://www.it.usyd.edu.au/~james/pubs/pdf/dlp07perc.pdf ?
>
>
>
> On Sat, Feb 13, 2010 at 9:38 AM, Benson Margulies
> <bimargul...@gmail.com> wrote:
>
>> Folks,
>>
>> Here's one of my occasional questions in which I am, in essence,
>> bartering my code wrangling efforts for expertise on hard stuff.
>>
>> Consider a sequence problem addressed with a perceptron model and an
>> ordinary Viterbi decoder. There's a standard confidence estimation
>> technique borrowed from HMMs: calculate gamma = alpha + beta for each
>> state, take the difference of the gammas for the best and second-best
>> hypotheses in each column of the trellis, and take the minimum of
>> those differences as the overall confidence of the decode. (+, of
>> course, because in a perceptron we're summing feature weights, not
>> multiplying probabilities.)
>>
>>
>
