Hi,

I noticed that the classifier has three functions you can call to get a score:
classify - returns probabilities
classifyNoLink - returns the raw score (optional)
classifyScalar - returns the binary probability

I'm working on a few classifiers for which it doesn't make sense to return a
probability. For these, the probability would just be the raw score
exponentiated. That distorts the scores a bit compared with letting the user
work with the raw score directly. Also, if users assume that the scores really
are probabilities, they may be tempted to compare them across two
classifiers without first calibrating on a test set.

I wonder if we can add classifyScalarNoLink and make the NoLink functions
non-optional. A classifier that already returns values in the 0-1 range
would simply return its probabilities from them.
This way people can choose either interface as their primary one, rather than
calling classify and assuming that all classifiers support probabilities.
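
To make the proposal concrete, here is a rough sketch of how it might look.
It is only an illustration: the ScoreClassifier interface, the LinearScorer
class, the ScoreDemo driver, and the use of double[] in place of the real
vector type are all assumptions made for the example, not existing code.

```java
// Hypothetical sketch of the proposed interface; not existing project code.
// double[] stands in for the project's own vector type.
interface ScoreClassifier {
    // Mandatory: raw, uncalibrated scores.
    double[] classifyNoLink(double[] features);

    // Mandatory (the proposed addition): raw score for the binary case.
    double classifyScalarNoLink(double[] features);

    // Optional: only meaningful for models that produce real probabilities.
    default double[] classify(double[] features) {
        throw new UnsupportedOperationException("this model has no probabilities");
    }

    // Optional: binary probability, unsupported unless overridden.
    default double classifyScalar(double[] features) {
        throw new UnsupportedOperationException("this model has no probabilities");
    }
}

// Illustrative model: a linear scorer whose output is a margin, not a
// probability, so it implements only the NoLink methods.
final class LinearScorer implements ScoreClassifier {
    private final double[] weights;

    LinearScorer(double[] weights) {
        this.weights = weights.clone();
    }

    @Override
    public double classifyScalarNoLink(double[] features) {
        // Dot product of weights and features.
        double score = 0.0;
        for (int i = 0; i < weights.length; i++) {
            score += weights[i] * features[i];
        }
        return score;
    }

    @Override
    public double[] classifyNoLink(double[] features) {
        return new double[] {classifyScalarNoLink(features)};
    }
}

final class ScoreDemo {
    public static void main(String[] args) {
        ScoreClassifier scorer = new LinearScorer(new double[] {1.0, -2.0});
        // 1.0 * 3.0 + (-2.0) * 1.0
        System.out.println(scorer.classifyScalarNoLink(new double[] {3.0, 1.0})); // prints 1.0
    }
}
```

With this shape, a caller that only needs a score can use the NoLink
functions on any model, and a call to classify on a model without
probabilities fails loudly instead of returning a distorted value.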

Finally, there are some algorithms that can return regression, ranking, or
classification scores depending on the training data. I was planning to
return the same value via classifyScalarNoLink, but that seems a poor name
for the proposed function. I could just name the function 'score', but that
would break the naming convention already in place.

Thoughts?

-- 
Yee Yang Li Hector
http://hectorgon.blogspot.com/ (tech + travel)
http://hectorgon.com (book reviews)
