[
https://issues.apache.org/jira/browse/MAPREDUCE-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639864#comment-13639864
]
Carlo Curino commented on MAPREDUCE-5176:
-----------------------------------------
In this patch, we introduce an annotation used to express a property of
user-defined classes (such as Reducer and OutputCommitter). The annotation is
@Preemptable, and its intended semantics is that the tagged class is safe to
preempt between invocations. Using an annotation instead of an interface
lets us avoid automatic (possibly involuntary) inheritance.
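To illustrate the inheritance point, here is a minimal, self-contained sketch. It is not the code from the patch: the retention and target settings, the isPreemptable helper, and the three reducer classes are assumptions made up for the example. It relies only on the standard Java rule that a class-level annotation is not inherited by subclasses unless the annotation type is itself marked @Inherited.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class PreemptableDemo {
  // Hypothetical helper: how a framework could test a user-supplied class.
  static boolean isPreemptable(Class<?> userClass) {
    return userClass.isAnnotationPresent(Preemptable.class);
  }

  public static void main(String[] args) {
    System.out.println(isPreemptable(BaseReducer.class));     // true
    System.out.println(isPreemptable(StatefulReducer.class)); // false
    System.out.println(isPreemptable(PureUserReducer.class)); // true
  }
}

// Assumed definition. It is deliberately NOT marked @Inherited, so a
// subclass of a tagged class does not become preemptable by accident.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Preemptable {}

// Stand-in for the default "pure" reducer, which is tagged.
@Preemptable
class BaseReducer {}

// Extends the tagged base class but is not tagged itself.
class StatefulReducer extends BaseReducer {}

// A user reducer that explicitly opts in.
@Preemptable
class PureUserReducer extends BaseReducer {}
```

With a marker interface, StatefulReducer would have inherited the property from BaseReducer whether or not its author intended it.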
More concretely:
# stateless operators: a simple use case for Reducers is when the user-defined
function is a "pure" reducer, i.e., a Reducer that does not maintain state
across key-groups (or, if it does, only for performance and not for
correctness). Note that the default class Reducer.java is indeed a "pure"
reducer, hence it is tagged with @Preemptable. A user-supplied reducer,
however, must explicitly state this if it wants to be treated as preemptable.
If the @Preemptable annotation is provided, the system can automatically
handle preemption by saving the output produced so far and subsequently
restarting the execution of the task from the next key group (this will be
posted in separate patches/JIRAs).
# stateful operators: advanced users can also tag non-pure reducers (i.e.,
reducers that accumulate non-trivial state across key boundaries) as
@Preemptable. In that case the default preemption mechanism we provide will
not be sufficient, and the user will be required to override the default
checkpoint/restart logic to include operator-specific state saving and
retrieval.
# output committers: for an OutputCommitter, being @Preemptable means that the
committer can be used to commit partial output from a given task. In order to
handle failure scenarios, we also require the OutputCommitter to provide a
cleanupPartialOutput(TaskAttemptId tid) method that the system can invoke to
completely reset the execution of a given task. The simple case we show in
the patch is an extended version of FileOutputCommitter, in which we provide
a simple mechanism to commit partial output for a task (by including the
task_attempt_id in the file name), and an equivalent cleanup functionality.
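The partial-output idea for the committer can be sketched as follows. This is only an illustration under stated assumptions, not the extended FileOutputCommitter from the patch: it works on a local temporary directory with java.nio instead of a Hadoop FileSystem, it passes ids as plain strings rather than TaskAttemptId objects, and the file naming scheme ("<base>.<taskAttemptId>"), the class name, and the example ids are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class PartialOutputSketch {

  // Hypothetical naming scheme: "<base>.<taskAttemptId>".
  static Path partialFile(Path outputDir, String base, String taskAttemptId) {
    return outputDir.resolve(base + "." + taskAttemptId);
  }

  // Removes the partial files of every attempt of the task that
  // taskAttemptId belongs to, and returns how many files were deleted.
  static int cleanupPartialOutput(Path outputDir, String base, String taskAttemptId)
      throws IOException {
    // Assumes ids end in "_<attemptNumber>"; dropping the number leaves
    // a prefix shared by all attempts of the same task.
    String taskPrefix = taskAttemptId.substring(0, taskAttemptId.lastIndexOf('_') + 1);
    List<Path> matches = new ArrayList<>();
    try (DirectoryStream<Path> files =
        Files.newDirectoryStream(outputDir, base + "." + taskPrefix + "*")) {
      for (Path f : files) {
        matches.add(f);
      }
    }
    for (Path f : matches) {
      Files.delete(f);
    }
    return matches.size();
  }

  static int countFiles(Path dir) throws IOException {
    int n = 0;
    try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
      for (Path ignored : files) {
        n++;
      }
    }
    return n;
  }

  // Two attempts of one task and one attempt of another task write partial
  // output; the first task is then reset. Returns {files deleted, files left}.
  static int[] runDemo() {
    try {
      Path dir = Files.createTempDirectory("partial-output");
      byte[] data = "key\tvalue\n".getBytes(StandardCharsets.UTF_8);
      Files.write(partialFile(dir, "part-r-00005", "attempt_200707121733_0003_r_000005_0"), data);
      Files.write(partialFile(dir, "part-r-00005", "attempt_200707121733_0003_r_000005_1"), data);
      Files.write(partialFile(dir, "part-r-00006", "attempt_200707121733_0003_r_000006_0"), data);
      int deleted =
          cleanupPartialOutput(dir, "part-r-00005", "attempt_200707121733_0003_r_000005_1");
      return new int[] {deleted, countFiles(dir)};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    int[] r = runDemo();
    System.out.println("deleted=" + r[0] + " remaining=" + r[1]); // deleted=2 remaining=1
  }
}
```

Because the attempt id is part of each file name, partial output from different attempts never collides, and resetting a task only requires removing the files that carry that task's id prefix.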
Note that this is a first use of annotations to describe properties of
user-provided classes. It is easy to imagine several other such use cases,
e.g., @KeyPreserving, @OrderPreserving, etc., which could be used to pipeline
maps and reduces, or to leverage JVM reuse.
This is part of umbrella JIRA MAPREDUCE-4584, and is related to the preemption
protocol changes discussed in YARN-45, and supported in YARN-567, YARN-568, and
YARN-569.
> Preemptable annotations (to support preemption in MR)
> -----------------------------------------------------
>
> Key: MAPREDUCE-5176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5176
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv2
> Reporter: Carlo Curino
> Assignee: Carlo Curino
>
> Proposing a patch that introduces a new annotation @Preemptable that
> conveys to the framework a property of user-supplied classes (e.g., Reducer,
> OutputCommitter). The intended semantics is that a tagged class is safe to be
> preempted between invocations.
> (This is similar in spirit to the Output Contracts of [Nephele/PACT |
> https://stratosphere.eu/sites/default/files/papers/ComparingMapReduceAndPACTs_11.pdf].)