András Barják created SPARK-2418:
------------------------------------

             Summary: Custom checkpointing with an external function as parameter
                 Key: SPARK-2418
                 URL: https://issues.apache.org/jira/browse/SPARK-2418
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 1.0.0
            Reporter: András Barják


If a job consists of many shuffle-heavy transformations, the current resilience 
model may be unsatisfactory. In our use case we need a persistent checkpoint 
that lets us save our RDDs to disk in a custom location and load them back even 
if the driver dies. (Other possible use cases: storing the checkpointed data in 
various formats, such as SequenceFile, CSV, Parquet, or MySQL.)
After talking to [~pwendell] at Spark Summit 2014, we concluded that a 
checkpoint whose saving and RDD reloading behavior can be customized would be a 
good solution. I am open to further suggestions if you have better ideas about 
how to make checkpointing more flexible.
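
To make the idea concrete, here is a minimal sketch of what a checkpoint with pluggable save and load functions could look like. It is plain Python with no Spark dependency: a list of tuples stands in for an RDD, and the names checkpoint_with, save_csv, load_csv and expensive_job are hypothetical, chosen only for illustration. It is not an existing or proposed Spark API.

```python
import csv
import os
import tempfile


def checkpoint_with(compute, path, save, load):
    """Return the data at `path` if a checkpoint exists, else compute and save it.

    Hypothetical helper, not part of Spark: `save(data, path)` and `load(path)`
    are the user-supplied functions that decide the storage format and location.
    """
    if os.path.exists(path):
        # A restarted driver finds the earlier checkpoint and skips recomputation.
        return load(path)
    data = compute()
    save(data, path)
    return data


def save_csv(rows, path):
    # Write to a temporary file first so a crash never leaves a partial checkpoint.
    tmp = path + ".tmp"
    with open(tmp, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    os.replace(tmp, path)


def load_csv(path):
    # Restore the (word, count) tuples, converting counts back to integers.
    with open(path, newline="") as f:
        return [(word, int(count)) for word, count in csv.reader(f)]


def expensive_job():
    # Stand-in for a chain of shuffle-heavy transformations.
    expensive_job.calls += 1
    return [("a", 1), ("b", 2)]


expensive_job.calls = 0

checkpoint_dir = tempfile.mkdtemp()
checkpoint_path = os.path.join(checkpoint_dir, "counts.csv")

first = checkpoint_with(expensive_job, checkpoint_path, save_csv, load_csv)
# The second call simulates a new driver: the data comes from disk, not from the job.
second = checkpoint_with(expensive_job, checkpoint_path, save_csv, load_csv)
```

Swapping save_csv and load_csv for functions that write Parquet or a database table would change the storage format without touching the checkpoint logic, which is the flexibility this issue asks for.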



--
This message was sent by Atlassian JIRA
(v6.2#6252)