On Sat, Feb 13, 2010 at 9:38 AM, Benson Margulies <bimargul...@gmail.com> wrote:
> This has been observed by better minds than mine to yield a confidence
> estimation that is useful in doing active learning for training data
> selection.

I haven't used this technique and have only occasionally been able to use active learning outside of the recommendations world. I am not quite sure, by the way, exactly what you mean by alpha and beta. There is pretty variable notation for the Viterbi algorithm, so this isn't all that surprising.

Regardless of that, and assuming that this is a good heuristic for active learning, have you applied your scatterplot to these confidence levels versus accuracy? Does that show you the relationship that you desire?

I should point out that the goal of active learning is to select training examples that would have a large impact on the current model or space of model hypotheses. As such, it is common, especially with perceptron-like algorithms such as ANNs and SVMs, to select examples that are near the decision margin. This corresponds pretty well to the desired behavior. There is nothing, however, that says that these examples will necessarily be ones for which the model is more or less accurate. These are simply examples whose actual target values have a large potential for influencing the model.

My own preference for a case like yours is to scan examples to find the one where knowing the true value of the target variable would have the largest impact for an on-line learning algorithm. That may not be feasible, but I would guess that there are on-line approximations for whatever learning algorithm you are using, given that you have perceptrons scattered about.

To compute sensitivity, I would simply assume that different candidates in the beam are the correct result and see which example has the most action as a result of these assumptions. Your current heuristic seems to be doing this indirectly by including a probability-of-correctness measure and omitting the gradient term, but since you don't seem to have a viable probability of correctness, maybe using the gradient will do. Since the gradient of the model parameters relative to a single target value is usually largest at the margin, this may turn out to be nearly the same as your original desired heuristic.

Wish I could help more, but I don't have any direct experience with the stuff you are working on!
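In case it helps, here is a very rough Python sketch of the sensitivity idea above. Everything in it is assumed for illustration: the model.gradient(features, label) hook, the shape of the beams, and using a gradient norm as the "amount of action". Substitute whatever single-example update your own on-line learner actually makes.

    import numpy as np

    def update_magnitude(model, features, label):
        """Hypothetical hook: size of the parameter change an on-line learner
        would make if `label` were the true target for `features`.
        For a perceptron/SGD-style learner this is roughly the norm of the
        gradient of the loss for that single (example, label) pair."""
        grad = model.gradient(features, label)   # assumed API
        return np.linalg.norm(grad)

    def pick_most_sensitive(model, unlabeled, beams):
        """Pick the unlabeled example whose true label, whichever beam
        candidate it turns out to be, could move the model the most.

        unlabeled : list of feature vectors
        beams     : beams[i] is a list of candidate labels for unlabeled[i]
        """
        best_idx, best_score = None, -1.0
        for i, (x, candidates) in enumerate(zip(unlabeled, beams)):
            # Pretend each beam candidate is the correct answer in turn and
            # see how much "action" (parameter movement) that would cause.
            sensitivity = max(update_magnitude(model, x, y) for y in candidates)
            if sensitivity > best_score:
                best_idx, best_score = i, sensitivity
        return best_idx

If you collapse the beam to the top two candidates and the gradient to a distance from the decision boundary, this degenerates to the usual pick-the-smallest-margin rule, which is why I suspect the two heuristics will end up looking similar in practice.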