TinaLi created SPARK-32904:
------------------------------

             Summary: pyspark.mllib.evaluation.MulticlassMetrics needs to swap the results of precision() and recall()
                 Key: SPARK-32904
                 URL: https://issues.apache.org/jira/browse/SPARK-32904
             Project: Spark
          Issue Type: Bug
          Components: MLlib
    Affects Versions: 3.0.1
            Reporter: TinaLi
[https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/evaluation/MulticlassMetrics.html]

*The values returned by the precision() and recall() methods of this API should be swapped.*

Below is an example of the results I got when I ran this API:

    metrics = MulticlassMetrics(predictionAndLabels)
    print("confusion matrix:", metrics.confusionMatrix().toArray())
    print("precision:", metrics.precision(1))
    print("recall:", metrics.recall(1))

    [[36631.  2845.]
     [ 3839.  1610.]]
    precision: 0.3613916947250281
    recall: 0.2954670581758121

    predictions.select('prediction').agg({'prediction': 'sum'}).show()

    |sum(prediction)|
    |         5449.0|

As you can see, my model predicted 5449 cases with label=1, and 1610 of those 5449 cases are true positives, so precision should be 1610/5449 = 0.2954670581758121. This API instead assigns that value to the recall() method; the two should be swapped.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
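A minimal sketch of the arithmetic behind the report above, in plain Python rather than Spark (the matrix values are copied from the reporter's output; which axis of the confusion matrix holds predictions versus true labels is exactly the point in dispute):

```python
# Confusion matrix as printed by metrics.confusionMatrix().toArray().
matrix = [[36631.0, 2845.0],
          [3839.0, 1610.0]]

tp = matrix[1][1]                        # 1610: cases counted as true positives for class 1
row_total = matrix[1][0] + matrix[1][1]  # 5449: total of row 1
col_total = matrix[0][1] + matrix[1][1]  # 4455: total of column 1

# The two candidate ratios for class 1:
print(tp / row_total)  # 0.2954670581758121 -> the value the API returned as recall(1)
print(tp / col_total)  # 0.3613916947250281 -> the value the API returned as precision(1)
```

So the value the reporter expects for precision(1), 1610/5449, is what the API returned as recall(1), and vice versa; which reading is correct depends on whether the matrix rows represent true labels or predictions.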