[
https://issues.apache.org/jira/browse/MAPREDUCE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645851#comment-13645851
]
Carlo Curino commented on MAPREDUCE-5197:
-----------------------------------------
The key idea here is to provide the user a simple API to write out the "state"
of its computation to a safe place, where it could be retrieved later on to
restart the computation. The main use we have for this at the moment is in the
context of the preemption work in YARN-45, YARN-567, YARN-568, YARN-569,
MAPREDUCE-5196, MAPREDUCE-5197, and MAPREDUCE-5189.
However checkpointing has many other uses which we can consider in the future,
e.g., sub-task granularity for recovery, and many dynamic optimizations (we
played around with suspending reducers voluntarily if all pending maps are
struggling).
The main design point of the API is to keep the location of the checkpoint
secret until the user is done writing to it, and allow to open existing
checkpoint only for read. This tradeoff appending state to an existing
checkpoint for simplicity in enforcing atomicity, write-once semantics. In
fact, this two strong properties are almost free given the chosen API. This
will benefit alternative implementations of the checkpoint API (you can
envision main-memory versions of this if checkpointing is used only to
"transfer" a computation instead of long-term suspension or failure-tollerance,
we had a map-servlet based version as well, etc...).
> Checkpoint Service: a library component to facilitate checkpoint of task state
> ------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5197
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5197
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv2
> Reporter: Carlo Curino
> Assignee: Carlo Curino
>
> A small library that abstract file API for the purpose of checkpointing.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira