This problem has been studied fairly extensively.  Perhaps the most
referenced and best-known work on this is the paper by John Platt:

http://research.microsoft.com/~jplatt/SVMprob.ps.gz

The basic idea of this (and other) works is to fit a density function
to the real-valued SVM output, thereby mapping the arbitrary
real-valued outputs to a normalized probability scale.  In the case of
Platt's work, I believe he is fitting a sigmoidal function to the
real-valued outputs.
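
A minimal sketch of that sigmoid-fitting idea (not Platt's exact
procedure -- his paper also regularizes the target labels and uses a
particular optimization routine; this just maximizes the plain
log-likelihood with scipy) might look like:

```python
import numpy as np
from scipy.optimize import minimize

def fit_platt(f, y):
    """Fit P(y=1 | f) = 1 / (1 + exp(A*f + B)) to raw SVM outputs f
    and labels y in {0, 1} by minimizing the negative log-likelihood.
    Sketch only; Platt's paper regularizes the targets as well."""
    f = np.asarray(f, dtype=float)
    y = np.asarray(y, dtype=float)

    def nll(params):
        A, B = params
        z = A * f + B
        # -log P(y=1) = log(1 + exp(z)); -log P(y=0) = log(1 + exp(-z)),
        # both computed stably with logaddexp.
        return np.sum(y * np.logaddexp(0.0, z)
                      + (1.0 - y) * np.logaddexp(0.0, -z))

    # A should come out negative: large positive SVM output -> P near 1.
    res = minimize(nll, x0=[-1.0, 0.0])
    return res.x  # (A, B)
```

Given fitted (A, B), the probability for a new output x is just
1 / (1 + exp(A*x + B)).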

A few months ago, I reviewed a paper (I don't remember the details
unfortunately) that tried comparing several different methods for
normalizing the outputs of SVMs into probabilities.  The upshot (and
this agrees with my own empirical experience as well) is that
extremely simple methods perform just as well as more
"principled" methods such as Platt's.  An example of such a simple
method is to pick a small eps, say that outputs <= -1 are in class 1
with probability eps, outputs >= 1 are in class 1 with probability
1-eps, and scale linearly in between.  (If your training classes are
not approximately equal in size then I might "fudge" a correction for
this.)
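
In code, that simple method is just a clip and a linear rescale (the
eps value and the class-imbalance fudge are up to you; this sketch
assumes roughly balanced classes):

```python
import numpy as np

def svm_to_prob(f, eps=0.05):
    """Map a raw SVM output f to P(class 1):
    f <= -1 -> eps, f >= +1 -> 1 - eps, linear in between."""
    p = (np.clip(f, -1.0, 1.0) + 1.0) / 2.0   # linear map [-1, 1] -> [0, 1]
    return eps + (1.0 - 2.0 * eps) * p        # squash into [eps, 1 - eps]
```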

Note that all these methods rely on the assumption that the distance
from the separating hyperplane is the sole predictor of class
membership, which is not necessarily true.

Cheers,

rif

[ comp.ai is moderated.  To submit, just post and be patient, or if ]
[ that fails mail your article to <[EMAIL PROTECTED]>, and ]
[ ask your news administrator to fix the problems with your system. ]