[
https://issues.apache.org/jira/browse/SPARK-21806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Marc Kaminski updated SPARK-21806:
----------------------------------
Attachment: PRROC_example.jpeg
In another [bugfix|https://github.com/scikit-learn/scikit-learn/pull/7356],
scikit-learn's auPRC calculation was changed to exclude the leftmost point. The
discussion about the behavior of the y-axis intercept is still open, though.
There seems to be agreement that always defining it as (0, 1) is wrong.
{quote}What about defining precision at recall = 0, if it doesn't exist, to be
the precision at the minimum recall value?{quote}
This is the behavior I'd expect, and it is also the behavior of
[PRROC|https://cran.r-project.org/web/packages/PRROC/vignettes/PRROC.pdf], as
you can see in the attached image (made from the data in the example). As I am
just a random Spark user who is struggling to interpret his auPRC results, I'd
suggest working together with the scikit-learn community to implement
consistent behavior across the frameworks. :)
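The quoted proposal can be illustrated with a minimal, self-contained sketch (plain Python, not Spark's or PRROC's actual code; {{au_prc}} is a hypothetical trapezoidal helper). It compares the current fixed (0.0, 1.0) anchor with an anchor at the precision of the minimum-recall point, using the PR points from the edge-case example below:

```python
def au_prc(points):
    # Trapezoidal area under a list of (recall, precision) points,
    # assumed sorted by increasing recall.
    return sum((r1 - r0) * (p0 + p1) / 2.0
               for (r0, p0), (r1, p1) in zip(points, points[1:]))

# PR points of the edge-case classifier (thresholds 0.95 and 0.0):
curve = [(1.0, 0.2), (1.0, 2.0 / 12.0)]

# Current behavior: y-intercept fixed at (0.0, 1.0).
default_auprc = au_prc([(0.0, 1.0)] + curve)
# Proposed behavior: precision at recall = 0 taken from the minimum-recall point.
proposed_auprc = au_prc([(0.0, curve[0][1])] + curve)

print(default_auprc)   # 0.6
print(proposed_auprc)  # 0.2
```

The gap between 0.6 and 0.2 is exactly the inflation the fixed (0.0, 1.0) anchor introduces for this classifier.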
> BinaryClassificationMetrics pr(): first point (0.0, 1.0) is misleading
> ----------------------------------------------------------------------
>
> Key: SPARK-21806
> URL: https://issues.apache.org/jira/browse/SPARK-21806
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 2.2.0
> Reporter: Marc Kaminski
> Priority: Minor
> Attachments: PRROC_example.jpeg
>
>
> I would like to refer to a [discussion in scikit-learn|
> https://github.com/scikit-learn/scikit-learn/issues/4223], as this behavior
> is probably based on the scikit-learn implementation.
> Summary:
> Currently, the y-axis intercept of the precision-recall curve is set to (0.0,
> 1.0). This behavior is not ideal in certain edge cases (see the example below)
> and can also affect cross-validation when the optimization metric is set to
> "areaUnderPR".
> Please consider [blucena's
> post|https://github.com/scikit-learn/scikit-learn/issues/4223#issuecomment-215273613]
> for possible alternatives.
> Edge case example:
> Consider a bad classifier that assigns a high probability to all samples. A
> possible output might look like this:
> ||Real label || Score ||
> |1.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 0.95 |
> |0.0 | 0.95 |
> |1.0 | 1.0 |
> This results in the following PR points (the first line is set by default):
> ||Threshold || Recall ||Precision ||
> |1.0 | 0.0 | 1.0 |
> |0.95| 1.0 | 0.2 |
> |0.0| 1.0 | 0.16 |
> The auPRC would be around 0.6. Classifiers with more differentiated
> probability assignments will then falsely appear to perform worse with
> respect to this auPRC.
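For reference, the PR points in the table above can be reproduced from the raw (label, score) pairs with a small threshold sweep (a plain-Python sketch, not the actual {{BinaryClassificationMetrics}} implementation; {{pr_points}} is a hypothetical helper):

```python
# (label, score) pairs from the edge-case table: 2 positives and 8 negatives
# scored 1.0, plus 2 negatives scored 0.95.
data = [(1.0, 1.0)] + [(0.0, 1.0)] * 8 + [(0.0, 0.95)] * 2 + [(1.0, 1.0)]

def pr_points(data):
    # Sweep each distinct score as a threshold and compute (recall, precision)
    # over the samples predicted positive at that threshold.
    total_pos = sum(1 for label, _ in data if label == 1.0)
    points = []
    for t in sorted({score for _, score in data}, reverse=True):
        predicted = [label for label, score in data if score >= t]
        tp = sum(1 for label in predicted if label == 1.0)
        points.append((tp / total_pos, tp / len(predicted)))
    return points

curve = pr_points(data)              # [(1.0, 0.2), (1.0, 0.1666...)]
with_default = [(0.0, 1.0)] + curve  # pr() prepends (0.0, 1.0) by default
```

The trapezoid from the prepended (0.0, 1.0) point to (1.0, 0.2) alone contributes an area of 0.6, which is where the reported auPRC comes from.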
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)