[ https://issues.apache.org/jira/browse/SPARK-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160072#comment-14160072 ]
András Barják commented on SPARK-2418:
--------------------------------------
Hi, I would be happy if someone could comment on my pull request. I don't mind
if it gets rejected, but I would like to know the reason so that I can come up
with a better solution that fits the official Spark core vision.
We really need this feature at our company so that we can save and checkpoint
RDDs in a custom way without having to reload them.
I am not sure I understand how this relates to the pluggable interfaces. Please
explain how you imagine solving this issue!
> Custom checkpointing with an external function as parameter
> -----------------------------------------------------------
>
> Key: SPARK-2418
> URL: https://issues.apache.org/jira/browse/SPARK-2418
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 1.0.0
> Reporter: András Barják
>
> If a job consists of many shuffle-heavy transformations, the current
> resilience model might be unsatisfactory. In our current use case we need a
> persistent checkpoint that lets us save our RDDs to disk in a custom
> location and load them back even if the driver dies. (Other possible use
> cases: storing the checkpointed data in various formats such as SequenceFile,
> CSV, Parquet files, MySQL, etc.)
> After talking to [~pwendell] at the Spark Summit 2014 we concluded that a
> checkpoint where one can customize the saving and RDD reloading behavior can
> be a good solution. I am open to further suggestions if you have better ideas
> about how to make checkpointing more flexible.
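
To make the idea in the description concrete, here is a minimal sketch in plain
Python. It does not use Spark and is not the API proposed in the pull request:
checkpoint_with, save_json_lines and load_json_lines are hypothetical names, and
a list of records stands in for an RDD. It only illustrates the control flow:
the caller supplies the save and load functions, a checkpoint found on disk is
loaded instead of recomputed (for example after a driver restart), and a freshly
computed result is saved and returned without being read back.

```python
import json
import os
import tempfile
from typing import Callable, Iterable, List


def checkpoint_with(
    compute: Callable[[], Iterable[dict]],
    path: str,
    save: Callable[[List[dict], str], None],
    load: Callable[[str], List[dict]],
) -> List[dict]:
    """Hypothetical helper: reuse a checkpoint at `path` or create one.

    If `path` exists, the data is restored with the user-supplied `load`.
    Otherwise `compute` runs once, the result is written with the
    user-supplied `save`, and the in-memory result is returned directly
    (it is not reloaded from disk).
    """
    if os.path.exists(path):
        return load(path)
    records = list(compute())
    save(records, path)
    return records


def save_json_lines(records: List[dict], path: str) -> None:
    # One JSON document per line; any other format could be plugged in here.
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def load_json_lines(path: str) -> List[dict]:
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


calls = []  # records how many times the expensive computation actually ran


def expensive() -> List[dict]:
    # Stand-in for a long chain of shuffle-heavy transformations.
    calls.append(1)
    return [{"key": k, "value": k * k} for k in range(3)]


checkpoint_path = os.path.join(tempfile.mkdtemp(), "squares.jsonl")

# First call computes and saves; second call loads the saved checkpoint.
first = checkpoint_with(expensive, checkpoint_path, save_json_lines, load_json_lines)
second = checkpoint_with(expensive, checkpoint_path, save_json_lines, load_json_lines)
```

Passing the save and load functions as parameters keeps the storage format and
location entirely in the caller's hands, which is the flexibility the
description asks for.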