[
https://issues.apache.org/jira/browse/SPARK-32472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846644#comment-17846644
]
Gideon P commented on SPARK-32472:
----------------------------------
[~kmoore] Can I raise a PR for this issue?
> Expose confusion matrix elements by threshold in BinaryClassificationMetrics
> ----------------------------------------------------------------------------
>
> Key: SPARK-32472
> URL: https://issues.apache.org/jira/browse/SPARK-32472
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 3.0.0
> Reporter: Kevin Moore
> Priority: Minor
>
> Currently, the only thresholded metrics available from
> BinaryClassificationMetrics are precision, recall, f-measure, and (indirectly
> through roc()) the false positive rate.
> Unfortunately, you can't always compute the individual thresholded confusion
> matrix elements (TP, FP, TN, FN) from these quantities. You can build a system
> of equations from the existing thresholded metrics and the total count, but
> the system becomes underdetermined when there are no true positives.
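To make the underdetermined case concrete, here is a minimal sketch in plain Scala (no Spark dependency). The Confusion case class, its field names, and the convention that a zero denominator yields 0.0 are all assumptions made for illustration; none of them is Spark's API. It shows two different confusion matrices with no true positives that produce identical precision, recall, F-measure, false positive rate, and total count.

```scala
// Hypothetical helper type for illustration only; not part of Spark's API.
final case class Confusion(tp: Double, fp: Double, tn: Double, fn: Double) {
  val total: Double = tp + fp + tn + fn

  // Assumed convention for this sketch: a zero denominator yields 0.0.
  private def ratio(num: Double, den: Double): Double =
    if (den == 0.0) 0.0 else num / den

  val precision: Double = ratio(tp, tp + fp)
  val recall: Double = ratio(tp, tp + fn)
  val falsePositiveRate: Double = ratio(fp, fp + tn)
  val f1: Double = ratio(2 * precision * recall, precision + recall)

  // The quantities the issue lists as currently recoverable, plus the total.
  def observable: (Double, Double, Double, Double, Double) =
    (precision, recall, f1, falsePositiveRate, total)
}

object UnderdeterminedExample {
  // Two different matrices, both with TP = 0 and a total count of 10.
  val a: Confusion = Confusion(tp = 0, fp = 2, tn = 6, fn = 2) // FPR = 2 / 8
  val b: Confusion = Confusion(tp = 0, fp = 1, tn = 3, fn = 6) // FPR = 1 / 4

  def main(args: Array[String]): Unit = {
    println(s"same observable metrics: ${a.observable == b.observable}")
    println(s"same confusion matrix:   ${a == b}")
  }
}
```

With TP = 0, precision, recall, and F-measure are all zero, so only the false positive rate and the total count constrain the three remaining unknowns (FP, TN, FN). Two equations cannot pin down three unknowns, which is why the two matrices above are indistinguishable from the metrics alone.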
> Fortunately, the individual confusion matrix elements by threshold are
> already computed and sitting in the confusions variable. It would be helpful
> to expose these elements directly. The easiest way would probably be by
> adding methods like
> {code:java}
> def truePositivesByThreshold(): RDD[(Double, Double)] =
>   confusions.map { case (t, c) => (t, c.weightedTruePositives) }
> {code}
> An alternative could be to expose the entire RDD[(Double,
> BinaryConfusionMatrix)] in one method, but BinaryConfusionMatrix is also
> currently package private.
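As a rough illustration of what both proposals would return, here is a sketch that uses plain Scala collections instead of an RDD. It is not Spark's implementation: ThresholdConfusion is a hypothetical stand-in for the package-private BinaryConfusionMatrix, every record is given weight 1, there is no binning, and a score greater than or equal to the threshold is assumed to count as a positive prediction.

```scala
// Hypothetical stand-in for the package-private BinaryConfusionMatrix.
final case class ThresholdConfusion(tp: Double, fp: Double, tn: Double, fn: Double)

object ConfusionsByThreshold {
  /** One entry per distinct score t, highest first; score >= t is predicted positive. */
  def confusions(scoreAndLabels: Seq[(Double, Double)]): Seq[(Double, ThresholdConfusion)] = {
    val totalPos = scoreAndLabels.count(_._2 > 0.5).toDouble
    val totalNeg = scoreAndLabels.size - totalPos

    // Positive and negative counts per distinct score, highest score first.
    val perScore = scoreAndLabels
      .groupBy(_._1)
      .toSeq
      .sortBy { case (score, _) => -score }
      .map { case (score, rows) =>
        val pos = rows.count(_._2 > 0.5).toDouble
        (score, pos, rows.size - pos)
      }

    // Running totals give TP and FP at each threshold; TN and FN follow from the totals.
    perScore
      .scanLeft((Double.NaN, 0.0, 0.0)) { case ((_, tp, fp), (score, pos, neg)) =>
        (score, tp + pos, fp + neg)
      }
      .tail // drop the seed element
      .map { case (score, tp, fp) =>
        (score, ThresholdConfusion(tp, fp, totalNeg - fp, totalPos - tp))
      }
  }

  // Analogue of the per-element accessor proposed in the issue.
  def truePositivesByThreshold(scoreAndLabels: Seq[(Double, Double)]): Seq[(Double, Double)] =
    confusions(scoreAndLabels).map { case (t, c) => (t, c.tp) }
}
```

For the input Seq((0.9, 1.0), (0.8, 0.0), (0.6, 1.0), (0.3, 0.0)), truePositivesByThreshold yields (0.9, 1.0), (0.8, 1.0), (0.6, 2.0), (0.3, 2.0). Returning the whole matrix per threshold, as confusions does here, covers all four elements in one call, whereas per-element accessors avoid having to make the matrix type public.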
> The closest existing issue I found is SPARK-18844
> (https://issues.apache.org/jira/browse/SPARK-18844), which proposed adding new
> calculations to BinaryClassificationMetrics and was closed without any changes
> being merged.