I’m looking at the docs here: http://spark.apache.org/docs/1.6.2/api/python/pyspark.html#pyspark.StorageLevel
A newcomer to Spark won’t understand what the _2 suffix means, what _SER means (or why you’d want it), or how exactly memory and disk interact when something like MEMORY_AND_DISK is selected.

Is there a place in the docs that expands on the storage levels a bit? If not, shall we create a JIRA and expand this documentation? I don’t mind taking on this task, though frankly I’m interested in it because I don’t fully understand the differences myself. :)

Nick
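
P.S. For concreteness, here is a rough sketch of how I currently read the levels in PySpark. The constructor flag order and the comments are my own assumptions (which is exactly the kind of thing I’d like the docs to spell out), so please take it as illustration rather than fact:

    # Rough reading of the StorageLevel flags -- the positional values below are
    # my assumption of the constructor order (useDisk, useMemory, useOffHeap,
    # deserialized, replication), not something confirmed by the linked page.
    from pyspark import SparkContext, StorageLevel

    sc = SparkContext(appName="storage-level-sketch")
    rdd = sc.parallelize(range(1000))

    # MEMORY_AND_DISK: keep partitions in memory, spill the ones that
    # don't fit to local disk instead of recomputing them.
    rdd.persist(StorageLevel.MEMORY_AND_DISK)
    rdd.count()  # first action materializes and caches the RDD

    # As I understand it, the suffixes just toggle constructor flags:
    #   _SER -> store partitions serialized (deserialized=False): less memory, more CPU
    #   _2   -> replication=2: each cached partition lives on two nodes
    # e.g. something like MEMORY_AND_DISK_SER_2 would be roughly:
    custom = StorageLevel(True, True, False, False, 2)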