[ https://issues.apache.org/jira/browse/SPARK-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron Staple resolved SPARK-3488.
---------------------------------
Resolution: Won't Fix
> cache deserialized python RDDs before iterative learning
> --------------------------------------------------------
>
> Key: SPARK-3488
> URL: https://issues.apache.org/jira/browse/SPARK-3488
> Project: Spark
> Issue Type: Improvement
> Components: MLlib, PySpark
> Reporter: Aaron Staple
>
> When running an iterative learning algorithm, the input RDD should be cached
> for performance, because it is read on every iteration. When learning is
> applied to a Python RDD, the Python RDD is currently always cached; in Scala
> that cached RDD is then mapped to an uncached deserialized RDD, and the
> uncached RDD is passed to the learning algorithm, so each iteration repeats
> the deserialization. Instead, the deserialized RDD should be cached.
> This was originally discussed here:
> https://github.com/apache/spark/pull/2347#issuecomment-55181535
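
The difference between the two caching points can be illustrated with a toy
model in plain Python. This is only a sketch and does not use any Spark API:
LazyDataset and run are hypothetical names invented for this example, and
pickle stands in for whatever serialization the Python-to-Scala bridge uses.
It assumes that an uncached dataset is recomputed from its parent on every
pass, which is the behaviour the issue describes.

```python
import pickle


class LazyDataset:
    """Hypothetical stand-in for an RDD: recomputed on every pass unless cached."""

    def __init__(self, compute):
        self._compute = compute
        self._use_cache = False
        self._cached = None

    def cache(self):
        # Mark the dataset so the first pass stores its result for later passes.
        self._use_cache = True
        return self

    def map(self, fn):
        # The child is lazy: it pulls from this dataset each time it is collected.
        return LazyDataset(lambda: [fn(x) for x in self.collect()])

    def collect(self):
        if self._use_cache:
            if self._cached is None:
                self._cached = self._compute()
            return self._cached
        return self._compute()


def run(cache_deserialized, iterations=5):
    """Return (number of deserialize calls, accumulated sum) for a toy iterative job."""
    calls = {"deserialize": 0}

    def deserialize(blob):
        calls["deserialize"] += 1
        return pickle.loads(blob)

    # The serialized input is always cached, as in the behaviour described above.
    serialized = LazyDataset(
        lambda: [pickle.dumps(float(i)) for i in range(4)]
    ).cache()
    deserialized = serialized.map(deserialize)
    if cache_deserialized:
        deserialized.cache()  # the change proposed in the issue

    total = 0.0
    for _ in range(iterations):  # each iteration makes one full pass over the input
        total += sum(deserialized.collect())
    return calls["deserialize"], total
```

In this model, run(False) deserializes every record on every iteration, while
run(True) deserializes each record once and reuses the result. Both produce the
same sum, so only the amount of repeated work changes.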