On Fri, May 20, 2011 at 11:21 AM, Hector Yee wrote:

> I'm working on a few classifiers for which it doesn't make sense to return
> probabilities. In fact, the "probability" is just the raw score
> exponentiated. This distorts the scores somewhat compared with simply using
> the raw score directly. Also, if users assume that the scores really are
> probabilities, they may be tempted to compare the outputs of two
> classifiers without first calibrating them on a test set.
>

I am not too worried about the calibration issue since it is reasonable to
handle that with documentation.

Returning raw scores without the exponentiation is a natural thing to do
with the noLink form.
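To illustrate the difference, here is a minimal sketch of a hypothetical linear binary classifier (`LinearScorer`, not an actual Mahout class) with a no-link method that returns the raw dot product and a linked method that squashes it into (0, 1). It assumes a logistic link, 1 / (1 + e^-s). The classifiers described above may exponentiate differently. Plain `double[]` arrays stand in for Mahout vectors so that the example is self-contained.

```java
import java.util.Arrays;

/**
 * Hypothetical linear binary classifier, used only to contrast a raw
 * ("no link") score with a link-transformed score. Not a Mahout API.
 */
class LinearScorer {
  private final double[] weights;

  LinearScorer(double[] weights) {
    this.weights = Arrays.copyOf(weights, weights.length);
  }

  /** Raw score: the dot product of weights and features, unbounded. */
  double classifyScalarNoLink(double[] x) {
    if (x.length != weights.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    double s = 0.0;
    for (int i = 0; i < x.length; i++) {
      s += weights[i] * x[i];
    }
    return s;
  }

  /** The same score passed through an assumed logistic link, in (0, 1). */
  double classifyScalar(double[] x) {
    return 1.0 / (1.0 + Math.exp(-classifyScalarNoLink(x)));
  }
}
```

With weights {1, 2}, the inputs {1, 2} and {2, 4} have raw scores of 5 and 10. After the link, both land near 1 (about 0.9933 and 0.99995). The order is preserved, but the gap between them shrinks to less than 0.01. That is the kind of distortion a caller avoids by using the no-link form.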


>
> I wonder if we can add classifyScalarNoLink and make the NoLink methods
> non-optional. For a classifier whose output is already in the 0-1 range,
> they would simply return probabilities.
> That way people can choose either interface as their primary one, rather
> than calling classify and assuming that every classifier supports
> probabilities.
>

I can't tell quite what you are suggesting here. I think you have the tail
of a good idea, but I can't see the spots on it yet.

Can you be more concrete about what you are proposing?
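One possible reading of the proposal is sketched below. It is an interpretation, not a settled design. Every classifier must implement the raw-score method. Probabilities become an optional capability that callers check for, instead of something they assume. All the names here (`ScoringClassifier`, `returnsProbabilities`, `MarginClassifier`, `BaseRateClassifier`) are hypothetical. The sketch uses Java 8 default methods for brevity.

```java
/**
 * Hypothetical interface: raw scores are mandatory, probabilities optional.
 * Names are illustrative only, not an existing Mahout API.
 */
interface ScoringClassifier {
  /** Required: raw score on the model's own scale (no link function). */
  double classifyScalarNoLink(double[] instance);

  /** True only if classifyScalar returns a genuine probability in [0, 1]. */
  default boolean returnsProbabilities() {
    return false;
  }

  /** Optional: callers should check returnsProbabilities() first. */
  default double classifyScalar(double[] instance) {
    throw new UnsupportedOperationException("this classifier does not produce probabilities");
  }
}

/** Hypothetical margin-style model: a bias plus the sum of the features. */
class MarginClassifier implements ScoringClassifier {
  private final double bias;

  MarginClassifier(double bias) {
    this.bias = bias;
  }

  @Override
  public double classifyScalarNoLink(double[] instance) {
    double s = bias;
    for (double v : instance) {
      s += v;
    }
    return s;
  }
}

/** Hypothetical model whose native output is already in the 0-1 range. */
class BaseRateClassifier implements ScoringClassifier {
  private final double rate;

  BaseRateClassifier(double rate) {
    if (rate < 0.0 || rate > 1.0) {
      throw new IllegalArgumentException("rate must be in [0, 1]");
    }
    this.rate = rate;
  }

  @Override
  public double classifyScalarNoLink(double[] instance) {
    return rate; // ignores the instance; a stand-in for any 0-1 model
  }

  @Override
  public boolean returnsProbabilities() {
    return true;
  }

  @Override
  public double classifyScalar(double[] instance) {
    // For a 0-1 model the no-link score already is the probability.
    return classifyScalarNoLink(instance);
  }
}
```

Under this reading, code that only needs a ranking calls the no-link method on any classifier. Code that needs probabilities asks `returnsProbabilities()` first, so it does not silently treat a margin as a probability.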


>
> Finally, there are some algorithms that can return regression, ranking or
> classification scores depending on the training data. I was planning to
> return that value via classifyScalarNoLink, but that seems a poorly chosen
> name for the proposed function. I could just name the function 'score',
> but that would break the naming convention already established.
>

score is a reasonable name. classifyScalarNoLink is fairly descriptive if
you know the jargon, but score may be better. One problem I have is that
people are already using this code in production, so name changes are a bit
painful.

I do think that returning scores without reducing them to the 0..1 range is
an important operation.
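Here is a minimal sketch of the `score` naming. It assumes a hypothetical `Scorer` interface whose `score` method returns an unbounded value, whether that value is a regression estimate, a ranking score or a classifier margin. The older-style name is kept as a deprecated alias, which is one common way to soften a rename for code already in production. `Scorer` and `Ranking` are illustrative names only. The ranking helper shows that ordering items needs only raw scores, with no 0..1 reduction.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical scoring interface; not an actual Mahout API. */
interface Scorer {
  /** Unbounded score: a regression estimate, ranking score or margin. */
  double score(double[] instance);

  /** Older-style name kept as an alias so existing callers still compile. */
  @Deprecated
  default double classifyScalarNoLink(double[] instance) {
    return score(instance);
  }
}

final class Ranking {
  private Ranking() {}

  /** Returns item indices ordered from highest to lowest raw score. */
  static Integer[] rankByScore(Scorer model, double[][] items) {
    // Score every item once up front.
    double[] scores = new double[items.length];
    Integer[] order = new Integer[items.length];
    for (int i = 0; i < items.length; i++) {
      scores[i] = model.score(items[i]);
      order[i] = i;
    }
    // Sort indices by descending score; no link function is applied.
    Arrays.sort(order, Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
    return order;
  }
}
```

For example, a model `x -> x[0] - x[1]` gives the items {1, 0}, {5, 1} and {0, 3} the scores 1, 4 and -3. `rankByScore` therefore returns the indices 1, 0, 2.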
