Hi,

I noticed that the classifier has three functions you can call to get a score:

classify - returns probabilities
classifyNoLink - returns the raw score (optional)
classifyScalar - returns the binary probability
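For illustration, here is a minimal Java sketch of how the linked and unlinked scalar methods relate. It is not the real classifier API. ScoringClassifier, LinearScorer, and link are hypothetical names, a plain double[] stands in for the real vector type, a logistic link is assumed, and classifyScalarNoLink is the method proposed below, not an existing one.

```java
// Hypothetical, simplified stand-in for the classifier interface discussed
// here. double[] replaces the real vector type; this is NOT the actual API.
abstract class ScoringClassifier {
    // Raw score with no link function applied. In this sketch it is
    // required (abstract) rather than optional.
    abstract double classifyScalarNoLink(double[] features);

    // Binary "probability": the raw score passed through the link function.
    double classifyScalar(double[] features) {
        return link(classifyScalarNoLink(features));
    }

    // Assumed logistic link: maps any real-valued score into (0, 1).
    static double link(double score) {
        return 1.0 / (1.0 + Math.exp(-score));
    }
}

// Hypothetical linear model whose raw score is a weighted sum (dot product).
class LinearScorer extends ScoringClassifier {
    private final double[] weights;

    LinearScorer(double[] weights) {
        this.weights = weights.clone();
    }

    @Override
    double classifyScalarNoLink(double[] features) {
        if (features.length != weights.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }
}
```

With weights {1.0, -2.0} and features {3.0, 1.0}, classifyScalarNoLink returns 1.0 and classifyScalar returns about 0.731. The link also compresses large scores: raw scores of 5 and 10 map to roughly 0.9933 and 0.99995, so most of the gap between them disappears.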
I'm working on a few classifiers for which it doesn't make sense to return a probability. In fact, the "probability" is just the raw score exponentiated, which distorts the scores somewhat compared with using the raw score directly. Also, if users assume the scores really are probabilities, they may be tempted to use them to compare two classifiers without first calibrating on a test set.

I wonder if we could add classifyScalarNoLink and make the NoLink methods non-optional. For classifiers that already return scores in the 0-1 range, the NoLink methods would simply return the probabilities. This way people can choose either interface as their primary one, rather than calling classify and assuming that every classifier supports probabilities.

Finally, some algorithms can return regression, ranking, or classification scores depending on the training data. I was planning to return that value via classifyScalarNoLink, but that seems a poor name for the proposed function. I could just name it 'score', but that would break the naming convention already established.

Thoughts?

--
Yee Yang Li Hector
http://hectorgon.blogspot.com/ (tech + travel)
http://hectorgon.com (book reviews)
