CacheCheck created SPARK-31217:
----------------------------------

             Summary: Unnecessary persist on cumulativeCounts in BinaryClassificationMetrics
                 Key: SPARK-31217
                 URL: https://issues.apache.org/jira/browse/SPARK-31217
             Project: Spark
          Issue Type: Improvement
          Components: ML, MLlib
    Affects Versions: 2.4.5, 2.4.4
            Reporter: CacheCheck


In mllib.evaluation.BinaryClassificationMetrics, _cumulativeCounts_ is 
persisted inside its lazy initialization. However, when I run 
LogisticRegressionSummaryExample and ModelSelectionViaCrossValidationExample, 
the cached _cumulativeCounts_ is used by only one action during execution, 
so the cached data is never reused in those runs. 
I therefore think it should not be persisted during initialization. Instead, 
the class could expose an explicit persist() API, mirroring the existing 
unpersist() API that releases the cached _cumulativeCounts_. 
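
The trade-off described above can be sketched without Spark. The snippet below is a minimal illustration only: ToyDataset and ToyMetrics are hypothetical stand-ins invented for this example, not Spark's RDD or the real BinaryClassificationMetrics, and the persistOnInit flag exists only to place both behaviours side by side. It shows that persisting inside lazy initialization leaves a cache behind after a single action, while an opt-in persist() caches only when the caller expects reuse.

```scala
// Toy stand-in for an RDD (hypothetical, not Spark's API): it records how
// often its lineage is recomputed and whether it currently holds a cache.
final class ToyDataset[A](compute: () => Seq[A]) {
  private var persisted = false
  private var cache: Option[Seq[A]] = None
  private var runs = 0

  def computations: Int = runs
  def isCached: Boolean = cache.isDefined

  def persist(): this.type = { persisted = true; this }
  def unpersist(): this.type = { persisted = false; cache = None; this }

  // An "action": returns cached data if present, otherwise recomputes
  // and stores the result only when the dataset is marked as persisted.
  def collect(): Seq[A] = cache.getOrElse {
    runs += 1
    val data = compute()
    if (persisted) cache = Some(data)
    data
  }
}

// Toy metrics class (hypothetical). persistOnInit = true mimics the reported
// behaviour; persistOnInit = false plus persist() mimics the proposal.
final class ToyMetrics(scoreAndLabels: Seq[(Double, Double)], persistOnInit: Boolean) {
  lazy val cumulativeCounts: ToyDataset[(Double, Long)] = {
    val ds = new ToyDataset[(Double, Long)](() => {
      // Sort by descending score and keep a running count of positive labels.
      val sorted = scoreAndLabels.sortBy(-_._1)
      val running = sorted
        .scanLeft(0L)((acc, sl) => acc + (if (sl._2 > 0.5) 1L else 0L))
        .tail
      sorted.map(_._1).zip(running)
    })
    if (persistOnInit) ds.persist() else ds
  }

  def persist(): this.type = { cumulativeCounts.persist(); this }
  def unpersist(): this.type = { cumulativeCounts.unpersist(); this }

  // A single action over cumulativeCounts.
  def totalPositives: Long =
    cumulativeCounts.collect().lastOption.map(_._2).getOrElse(0L)
}

object PersistDemo {
  def main(args: Array[String]): Unit = {
    val scores = Seq((0.9, 1.0), (0.8, 0.0), (0.7, 1.0), (0.2, 0.0))

    // Persist inside lazy init: one action, yet a cache stays allocated.
    val eager = new ToyMetrics(scores, persistOnInit = true)
    val p1 = eager.totalPositives
    println(s"eager: positives=$p1 cached=${eager.cumulativeCounts.isCached}")
    eager.unpersist()

    // Opt-in persist: nothing is cached unless the caller asks for it.
    val optIn = new ToyMetrics(scores, persistOnInit = false)
    val p2 = optIn.totalPositives
    println(s"opt-in: positives=$p2 cached=${optIn.cumulativeCounts.isCached}")
  }
}
```

In this sketch a caller who plans several actions would call persist() first and unpersist() afterwards; a caller with a single action pays no caching cost.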



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

