[ https://issues.apache.org/jira/browse/SPARK-31218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066474#comment-17066474 ]
CacheCheck commented on SPARK-31218:
------------------------------------

I mean the RDD {{counts}}, which is defined in the lazy val block below {{recallByThreshold}}. It is used first by {{counts.count()}} when generating {{binnedCounts}}, and again by {{collect()}} when {{agg}} is generated from {{binnedCounts}}. So I think {{counts}} should be persisted, and it can be unpersisted after {{cumulativeCounts}} is persisted.

> counts in BinaryClassificationMetrics should be cached
> ------------------------------------------------------
>
>                 Key: SPARK-31218
>                 URL: https://issues.apache.org/jira/browse/SPARK-31218
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 2.4.4, 2.4.5
>            Reporter: CacheCheck
>            Priority: Major
>
> In mllib.evaluation.BinaryClassificationMetrics.recallByThreshold(), the RDD
> _counts_ should be cached, because multiple subsequent actions use it.
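
The persist/unpersist pattern proposed in the comment can be sketched roughly as follows. This is a simplified illustration, not the actual BinaryClassificationMetrics source: the object name {{PersistCountsSketch}}, the helper {{summarize}}, the sample data, and the choice of {{StorageLevel.MEMORY_AND_DISK}} are all assumptions made for the example. It only shows one RDD being reused by two actions ({{count()}} and {{collect()}}) and released afterwards.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistCountsSketch {
  // Hypothetical helper: returns (distinct scores, total positives, total negatives).
  def summarize(sc: SparkContext): (Long, Long, Long) = {
    // Hypothetical (score, label) pairs standing in for the real input.
    val scoreAndLabels = sc.parallelize(
      Seq((0.9, 1.0), (0.8, 0.0), (0.8, 1.0), (0.3, 0.0)))

    // Per-score (positives, negatives) counts, sorted by descending score.
    // persist() only marks the RDD for caching; the first action fills the cache.
    val counts = scoreAndLabels
      .mapValues(label => if (label > 0.5) (1L, 0L) else (0L, 1L))
      .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))
      .sortByKey(ascending = false)
      .persist(StorageLevel.MEMORY_AND_DISK)

    try {
      // First action: computes counts and caches its partitions.
      val countsSize = counts.count()
      // Second action: reads the cached partitions instead of recomputing the lineage.
      val agg = counts.values.collect()
      (countsSize, agg.map(_._1).sum, agg.map(_._2).sum)
    } finally {
      // Release the cache once no further action needs counts.
      counts.unpersist()
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("persist-counts-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(summarize(sc)) finally sc.stop()
  }
}
```

Without the {{persist}} call, the shuffle and sort behind {{counts}} may be re-evaluated for the second action. Note that {{persist}} is lazy, so in the real class {{counts}} would have to stay cached until {{cumulativeCounts}} has actually been materialized, not just marked as persisted.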