CacheCheck created SPARK-31217:
----------------------------------
Summary: Unnecessary persist on cumulativeCounts in
BinaryClassificationMetrics
Key: SPARK-31217
URL: https://issues.apache.org/jira/browse/SPARK-31217
Project: Spark
Issue Type: Improvement
Components: ML, MLlib
Affects Versions: 2.4.5, 2.4.4
Reporter: CacheCheck
In mllib.evaluation.BinaryClassificationMetrics, _cumulativeCounts_ is persisted
during lazy initialization. However, when I run LogisticRegressionSummaryExample
and ModelSelectionViaCrossValidationExample, the cached _cumulativeCounts_ is
used by only one action during execution, so the persist brings no benefit.
I think it should not be persisted during initialization. Instead, we could add
an explicit persist() API to this class, mirroring the existing unpersist() API
in BinaryClassificationMetrics that releases the cached _cumulativeCounts_.
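
A minimal sketch of the usage pattern described above, assuming a local
SparkSession and a small made-up score/label dataset (the object name and the
data are illustrative only). It uses only the existing constructor,
areaUnderROC() and unpersist(); the persist() API proposed here does not exist
yet and is mentioned only in a comment.

```scala
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.sql.SparkSession

// Hypothetical driver, not one of the Spark examples named in this issue.
object SingleActionMetrics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SingleActionMetrics")
      .getOrCreate()

    // Made-up (score, label) pairs, for illustration only.
    val scoreAndLabels = spark.sparkContext.parallelize(Seq(
      (0.9, 1.0), (0.8, 1.0), (0.3, 0.0), (0.1, 0.0)
    ))

    val metrics = new BinaryClassificationMetrics(scoreAndLabels)

    // The only metric requested in this program. Computing it initializes
    // the lazily built cumulativeCounts, which is persisted as a side effect
    // even though nothing else here reuses it.
    val auc = metrics.areaUnderROC()
    println(s"Area under ROC = $auc")

    // Existing API: releases the cached cumulativeCounts.
    // Under the proposal, caching would instead be opt-in through an
    // explicit persist() call (hypothetical, not part of the current API).
    metrics.unpersist()

    spark.stop()
  }
}
```

With the current behavior, a caller that computes a single metric has to
remember to call unpersist() afterwards to free the storage.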