This problem has been studied fairly extensively. Perhaps the most referenced and best-known work on this is the paper by John Platt:
http://research.microsoft.com/~jplatt/SVMprob.ps.gz

The basic idea of this (and other) work is to fit a density function to the real-valued SVM output, thereby mapping the arbitrary real-valued outputs onto a normalized probability scale. In Platt's work, I believe he fits a sigmoidal function to the real-valued outputs.

A few months ago, I reviewed a paper (I don't remember the details, unfortunately) that compared several different methods for normalizing the outputs of SVMs into probabilities. The upshot (and this agrees with my own empirical experience as well) is that extremely simple methods perform just as well as more "principled" methods such as Platt's. An example of such a simple method: pick a small eps, say that outputs <= -1 are in class 1 with probability eps, outputs >= 1 are in class 1 with probability 1-eps, and scale linearly in between. (If your training classes are not approximately equal in size, then I might "fudge" a correction for this.)

Note that all of these methods rely on the assumption that the distance from the separating hyperplane is the sole predictor of class membership, which is not necessarily true.

Cheers,

rif

[ comp.ai is moderated. To submit, just post and be patient, or if ]
[ that fails mail your article to <[EMAIL PROTECTED]>, and ]
[ ask your news administrator to fix the problems with your system. ]
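To make the Platt-style idea concrete, here is a rough sketch of fitting a sigmoid P(y=1|f) = 1/(1 + exp(A*f + B)) to decision values. Note this is only an illustrative stand-in: Platt's actual paper uses a more careful (Newton-style) optimizer with regularized target probabilities, whereas this sketch uses plain gradient descent on the negative log-likelihood; the function names and learning-rate settings are my own.

```python
import math

def fit_platt(scores, labels, iters=2000, lr=0.01):
    """Fit a sigmoid P(y=1 | f) = 1 / (1 + exp(A*f + B)) to SVM
    decision values by gradient descent on the negative log-likelihood.
    (Illustrative only; Platt's paper uses a more robust optimizer.)
    """
    A, B = 0.0, 0.0
    for _ in range(iters):
        grad_A = grad_B = 0.0
        for f, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(A * f + B))  # current P(y=1 | f)
            # With this sign convention, d(NLL)/d(A*f + B) = (y - p).
            grad_A += (y - p) * f
            grad_B += (y - p)
        A -= lr * grad_A
        B -= lr * grad_B
    return A, B

def platt_prob(f, A, B):
    """Probability of class 1 for decision value f under fitted (A, B)."""
    return 1.0 / (1.0 + math.exp(A * f + B))
```

On toy data where positive decision values correspond to class 1, the fit drives A negative, so the estimated probability rises smoothly from near 0 (large negative f) to near 1 (large positive f).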
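The simple eps-and-linear-interpolation scheme described above might look like the following sketch (the function name and the particular eps value are my own choices):

```python
def svm_output_to_prob(f, eps=0.05):
    """Map a raw SVM decision value f to an estimated P(class 1).

    Outputs <= -1 get probability eps, outputs >= +1 get 1 - eps,
    and values in between are scaled linearly, as in the simple
    scheme sketched in the post.
    """
    if f <= -1.0:
        return eps
    if f >= 1.0:
        return 1.0 - eps
    # Linear interpolation: f = -1 maps to eps, f = +1 maps to 1 - eps.
    return eps + (f + 1.0) / 2.0 * (1.0 - 2.0 * eps)
```

For example, a decision value of 0 (on the hyperplane) maps to probability 0.5, and the mapping is monotonic in f, which is exactly the "distance is the sole predictor" assumption noted above.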
