[ https://issues.apache.org/jira/browse/SPARK-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron Staple resolved SPARK-3550.
---------------------------------
Resolution: Fixed
> Disable automatic rdd caching in python api for relevant learners
> -----------------------------------------------------------------
>
> Key: SPARK-3550
> URL: https://issues.apache.org/jira/browse/SPARK-3550
> Project: Spark
> Issue Type: Improvement
> Components: MLlib, PySpark
> Reporter: Aaron Staple
>
> The Python MLlib API automatically caches training RDDs. However, the
> NaiveBayes, ALS, and DecisionTree learners do not require external caching to
> prevent repeated RDD re-evaluation during learning. NaiveBayes evaluates
> its input RDD only once, while ALS and DecisionTree internally persist
> transformations of their input RDDs. For these learners, we should disable
> the automatic caching in the Python MLlib API.
> See discussion here:
> https://github.com/apache/spark/pull/2362#issuecomment-55637953
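
As a rough illustration of the change described in the issue, the per-learner caching decision might be sketched as below. This is not the actual PySpark MLlib code: the helper names (should_cache_input, prepare_training_data), the FakeRDD stand-in, and the learner name "SomeIterativeLearner" are all hypothetical, chosen so the snippet runs without a Spark installation. Only the three learner names in the set come from the issue text.

```python
# Hypothetical sketch only; this does not reproduce the real PySpark MLlib code.

# Learners the issue names as not needing external caching of their input.
LEARNERS_WITHOUT_EXTERNAL_CACHING = frozenset({"NaiveBayes", "ALS", "DecisionTree"})


def should_cache_input(learner_name):
    """Return True if the training RDD should be cached before learning."""
    return learner_name not in LEARNERS_WITHOUT_EXTERNAL_CACHING


class FakeRDD:
    """Stand-in for an RDD (assumption) that records whether cache() was called."""

    def __init__(self):
        self.cached = False

    def cache(self):
        self.cached = True
        return self


def prepare_training_data(rdd, learner_name):
    """Cache the input only for learners that would otherwise re-evaluate it."""
    if should_cache_input(learner_name):
        rdd = rdd.cache()
    return rdd
```

With this sketch, prepare_training_data(FakeRDD(), "NaiveBayes") leaves the input uncached, while a learner outside the set, such as the made-up "SomeIterativeLearner", still has its input cached.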
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)