[ 
https://issues.apache.org/jira/browse/SPARK-21806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc Kaminski updated SPARK-21806:
----------------------------------
    Attachment: PRROC_example.jpeg

In another [bugfix|https://github.com/scikit-learn/scikit-learn/pull/7356], the 
calculation of the auPRC was changed to exclude the leftmost point. The 
discussion about the behavior of the y-axis intercept is still open, though 
there seems to be agreement that always defining it as (0, 1) is wrong. 

{quote}What about defining precision at recall = 0, if it doesn't exist, to be 
the precision at the minimum recall value?{quote}

This is the behavior I'd expect, and it is also the behavior of 
[PRROC|https://cran.r-project.org/web/packages/PRROC/vignettes/PRROC.pdf], as 
you can see in the attached image (generated from the data in the example). As I 
am just a random Spark user struggling to interpret his auPRC results, I'd 
suggest working together with the scikit-learn community to implement 
consistent behavior across the frameworks. :)
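
To make the effect concrete, here is a minimal Python sketch (not Spark's or scikit-learn's actual implementation) that recomputes the PR points from the edge-case table in the description below and compares the auPRC obtained with the default (0, 1) anchor against the alternative of anchoring at the precision of the minimum recall:

{code:python}
# Edge-case data from the issue description: 2 positives, 10 negatives,
# all scores high (a "bad classifier" assigning high probability everywhere).
labels = [1.0] + [0.0] * 8 + [0.0, 0.0] + [1.0]
scores = [1.0] + [1.0] * 8 + [0.95, 0.95] + [1.0]

def pr_points(labels, scores):
    """One (recall, precision) point per distinct threshold, descending."""
    total_pos = sum(labels)
    points = []
    for t in sorted(set(scores), reverse=True):
        predicted = [(l, s) for l, s in zip(labels, scores) if s >= t]
        tp = sum(l for l, _ in predicted)
        points.append((tp / total_pos, tp / len(predicted)))
    return points

def trapezoid_area(points):
    """Area under a piecewise-linear curve of (recall, precision) points."""
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area

pts = pr_points(labels, scores)
# pts == [(1.0, 0.2), (1.0, 0.1666...)]

# Current behavior: prepend the default (0.0, 1.0) point.
aupr_default = trapezoid_area([(0.0, 1.0)] + pts)        # 0.6

# Proposed behavior: precision at recall 0 = precision at minimum recall.
aupr_anchored = trapezoid_area([(0.0, pts[0][1])] + pts)  # 0.2
{code}

The default anchor inflates the auPRC of this degenerate classifier from 0.2 to 0.6, which is the distortion described in the issue.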

> BinaryClassificationMetrics pr(): first point (0.0, 1.0) is misleading
> ----------------------------------------------------------------------
>
>                 Key: SPARK-21806
>                 URL: https://issues.apache.org/jira/browse/SPARK-21806
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 2.2.0
>            Reporter: Marc Kaminski
>            Priority: Minor
>         Attachments: PRROC_example.jpeg
>
>
> I would like to refer to a [discussion in 
> scikit-learn|https://github.com/scikit-learn/scikit-learn/issues/4223], as 
> this behavior is probably based on the scikit-learn implementation. 
> Summary: 
> Currently, the y-axis intercept of the precision-recall curve is set to (0.0, 
> 1.0). This behavior is not ideal in certain edge cases (see the example below) 
> and can also affect cross-validation when the optimization metric is set to 
> "areaUnderPR". 
> Please consider [blucena's 
> post|https://github.com/scikit-learn/scikit-learn/issues/4223#issuecomment-215273613] 
> for possible alternatives. 
> Edge case example: 
> Consider a bad classifier that assigns a high probability to all samples. A 
> possible output might look like this: 
> ||Real label || Score ||
> |1.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 0.95 |
> |0.0 | 0.95 |
> |1.0 | 1.0 |
> This results in the following PR points (first line set by default): 
> ||Threshold || Recall ||Precision ||
> |1.0 | 0.0 | 1.0 | 
> |0.95| 1.0 | 0.2 |
> |0.0| 1.0 | 0.16 |
> The auPRC would be around 0.6. Classifiers with a more differentiated 
> probability assignment will falsely appear to perform worse with regard to 
> this auPRC.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
