Matei Zaharia created SPARK-2014:
------------------------------------
Summary: Make PySpark store RDDs in MEMORY_ONLY_SER with compression by default
Key: SPARK-2014
URL: https://issues.apache.org/jira/browse/SPARK-2014
Project: Spark
Issue Type: Improvement
Components: PySpark
Reporter: Matei Zaharia
Since the data is already serialized on the Python side, there's not much point in
keeping it as deserialized byte arrays on the Java side, or in skipping compression.
We should make cache() in PySpark use MEMORY_ONLY_SER and turn on
spark.rdd.compress for it.
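
For reference, a minimal sketch of how a user could opt into this behavior explicitly today, before any default change, assuming the standard SparkConf and StorageLevel APIs (the app name and data are just placeholders):

    from pyspark import SparkConf, SparkContext, StorageLevel

    # Enable compression of serialized RDD blocks (spark.rdd.compress
    # defaults to false).
    conf = SparkConf().setAppName("cache-demo").set("spark.rdd.compress", "true")
    sc = SparkContext(conf=conf)

    rdd = sc.parallelize(range(1000000))

    # Instead of rdd.cache() (currently MEMORY_ONLY), store the
    # already-pickled Python data as serialized, compressible blocks.
    rdd.persist(StorageLevel.MEMORY_ONLY_SER)

    print(rdd.count())
    sc.stop()

The proposal is essentially to make plain cache() in PySpark behave like the explicit persist() call above by default.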