András Barják created SPARK-2418:
------------------------------------
Summary: Custom checkpointing with an external function as parameter
Key: SPARK-2418
URL: https://issues.apache.org/jira/browse/SPARK-2418
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.0.0
Reporter: András Barják
If a job consists of many shuffle-heavy transformations, the current resilience
model might be unsatisfactory. In our current use case we need a persistent
checkpoint that lets us save our RDDs to disk in a custom location and
load them back even if the driver dies. (Other possible use cases: storing the
checkpointed data in various formats, such as SequenceFile, CSV, Parquet, or
MySQL.)
After talking to [~pwendell] at the Spark Summit 2014, we concluded that a
checkpoint whose saving and RDD reloading behavior can be customized would be
a good solution. I am open to further suggestions if you have better ideas
about how to make checkpointing more flexible.
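
To make the idea concrete, here is a minimal sketch of the proposed shape in
plain Python. It does not use Spark, and it is not an existing Spark API: the
names custom_checkpoint, save_json_lines and load_json_lines are hypothetical,
and a list of records stands in for an RDD. The caller supplies the save and
load functions, so the storage format and location are up to them.

```python
import json
import os
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def custom_checkpoint(
    compute: Callable[[], Iterable[T]],
    save: Callable[[List[T], str], None],
    load: Callable[[str], List[T]],
    location: str,
) -> List[T]:
    """Hypothetical helper: reuse the checkpoint at `location` if it exists,
    otherwise run `compute`, persist the result with `save`, and return it."""
    if os.path.exists(location):
        # A previous run (possibly of a driver that has since died) already
        # wrote the data, so skip the expensive computation.
        return load(location)
    data = list(compute())
    save(data, location)
    return data


def save_json_lines(records: List[dict], path: str) -> None:
    """Example user-supplied saver: one JSON document per line."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    # Rename only after a complete write, so a crash cannot leave a
    # half-written file at the final checkpoint location.
    os.replace(tmp_path, path)


def load_json_lines(path: str) -> List[dict]:
    """Example user-supplied loader matching save_json_lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]
```

Calling custom_checkpoint(expensive_job, save_json_lines, load_json_lines,
"/some/path/ckpt.jsonl") a second time, even from a fresh process, would read
the saved file instead of recomputing. Swapping in a different save/load pair
would be enough to target another format.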