Matei Zaharia created SPARK-2014:
------------------------------------

             Summary: Make PySpark store RDDs in MEMORY_ONLY_SER with compression by default
                 Key: SPARK-2014
                 URL: https://issues.apache.org/jira/browse/SPARK-2014
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
            Reporter: Matei Zaharia


Since the data is already serialized on the Python side, there is little
benefit to storing it as plain byte-array objects in the JVM, and no good
reason to skip compression either. We should make cache() in PySpark use
MEMORY_ONLY_SER and turn on spark.rdd.compress for it.
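
For reference, a minimal sketch of what the proposed defaults amount to if a
user sets them up by hand today, assuming a Spark release that exposes
StorageLevel.MEMORY_ONLY_SER to PySpark (the app name and data below are
placeholders):

{code:python}
from pyspark import SparkConf, SparkContext, StorageLevel

# Enable compression of serialized RDD partitions (off by default).
conf = (SparkConf()
        .setAppName("pyspark-serialized-cache-sketch")  # placeholder name
        .set("spark.rdd.compress", "true"))
sc = SparkContext(conf=conf)

rdd = sc.parallelize(range(1000000))  # placeholder data

# Plain cache() currently maps to MEMORY_ONLY; the proposal is for it to
# behave like this serialized, compressed persist instead.
rdd.persist(StorageLevel.MEMORY_ONLY_SER)
print(rdd.count())
{code}

Because the Python payload reaching the JVM is already pickled bytes, the
extra Java-side serialization step is essentially free, and compression can
only shrink what gets cached.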



--
This message was sent by Atlassian JIRA
(v6.2#6252)
