[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639864#comment-13639864
 ] 

Carlo Curino commented on MAPREDUCE-5176:
-----------------------------------------


In this patch, we introduce an annotation that expresses a property of 
user-defined classes (such as Reducer and OutputCommitter). The annotation is 
@Preemptable, and its intended semantics are that the tagged class is safe to 
preempt between invocations. Using an annotation instead of an interface lets 
us avoid automatic (and possibly unintended) inheritance of the property by 
subclasses.
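
As a rough illustration of the inheritance point, the following self-contained 
Java sketch declares a stand-in annotation and checks for it reflectively. The 
retention and target settings, the class names PureReducer and StatefulReducer, 
and the isPreemptable helper are assumptions made for this example; they are 
not taken from the patch.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class PreemptableSketch {

  // Hypothetical stand-in for the annotation described above. It is
  // deliberately NOT marked @Inherited, so subclasses must opt in explicitly.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  public @interface Preemptable {}

  // A class whose author asserts that it is safe to preempt between calls.
  @Preemptable
  public static class PureReducer {}

  // Extends a tagged class but keeps state across keys; it is not tagged itself.
  public static class StatefulReducer extends PureReducer {
    long runningTotal;
  }

  // One way a framework could test the property on a user-supplied class.
  public static boolean isPreemptable(Class<?> clazz) {
    return clazz.isAnnotationPresent(Preemptable.class);
  }

  public static void main(String[] args) {
    System.out.println(isPreemptable(PureReducer.class));     // true
    System.out.println(isPreemptable(StatefulReducer.class)); // false
  }
}
```

Had the property been a marker interface, StatefulReducer would have acquired 
it simply by extending PureReducer; with a non-inherited annotation the 
subclass author has to restate it.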

More concretely: 

# stateless operators: a simple use case for Reducers is when the user-defined 
function is a "pure" reducer, i.e., a Reducer that does not maintain state 
across key groups (or that maintains it only for performance, not for 
correctness). Note that the default class Reducer.java is indeed a "pure" 
reducer, hence it is tagged with @Preemptable; a user-supplied reducer, 
however, must explicitly state this if it wants to be treated as preemptable. 
If the @Preemptable annotation is present, the system can handle preemption 
automatically by saving the output produced so far and later restarting the 
execution of the task from the next key group. (This mechanism will be posted 
in separate patches/JIRAs.)

# stateful operators: advanced users can also tag non-pure reducers as 
@Preemptable (i.e., reducers that accumulate non-trivial state across key 
boundaries). In that case the default preemption mechanism we provide is not 
sufficient, and the user is required to override the default 
checkpoint/restart logic to include operator-specific state saving and 
retrieval.
 
# OutputCommitter: for an OutputCommitter, being @Preemptable means that it can 
be used to commit partial output from a given task. To handle failure 
scenarios, we also require the OutputCommitter to provide a 
cleanupPartialOutput(TaskAttemptId tid) method that the system can invoke to 
completely reset the execution of a given task. The simple case we show in the 
patch is an extended version of FileOutputCommitter, in which we provide a 
simple mechanism to commit partial output for a task (by including the 
task_attempt_id in the file name) and the corresponding cleanup functionality.
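
The partial-commit idea in the last item can be sketched without any Hadoop 
dependency. The code below is only an illustration of the naming scheme, not 
the committer from the patch: the class PartialOutputSketch, the method 
commitPartial, the "part-<task>-attempt-<n>" file name layout, and the use of 
a plain String task id (instead of TaskAttemptId) on a local directory are all 
assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class PartialOutputSketch {
  private final Path outputDir;

  public PartialOutputSketch(Path outputDir) {
    try {
      this.outputDir = Files.createDirectories(outputDir);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Commits what one attempt has produced so far. The attempt number is part
  // of the file name, so output from different attempts never collides.
  public Path commitPartial(String taskId, int attempt, String data) {
    Path file = outputDir.resolve("part-" + taskId + "-attempt-" + attempt);
    try {
      return Files.write(file, data.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Deletes every partial file written for the task, across all attempts, so
  // the task can be re-executed from scratch. Returns the number removed.
  public int cleanupPartialOutput(String taskId) {
    List<Path> matches = new ArrayList<>();
    String glob = "part-" + taskId + "-attempt-*";
    try (DirectoryStream<Path> files = Files.newDirectoryStream(outputDir, glob)) {
      for (Path f : files) {
        matches.add(f);
      }
      for (Path f : matches) {
        Files.delete(f);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return matches.size();
  }

  public static void main(String[] args) throws IOException {
    PartialOutputSketch committer =
        new PartialOutputSketch(Files.createTempDirectory("partial-output"));
    committer.commitPartial("r_000001", 0, "k1\tv1\n"); // preempted attempt
    committer.commitPartial("r_000001", 1, "k2\tv2\n"); // resumed attempt
    System.out.println(committer.cleanupPartialOutput("r_000001")); // 2
  }
}
```

Keeping the attempt identifier in the file name is what makes the reset cheap 
in this sketch: cleanup is a pattern match on the task id and needs no 
separate record of which attempts committed output.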


Note that this is a first use of annotations to describe properties of 
user-provided classes. It is easy to imagine several other such use cases, 
e.g., @KeyPreserving, @OrderPreserving, etc., which could be used to pipeline 
maps and reduces, or to leverage JVM reuse.


This is part of umbrella JIRA MAPREDUCE-4584, and is related to the preemption 
protocol changes discussed in YARN-45, and supported in YARN-567, YARN-568, and 
YARN-569. 



                
> Preemptable annotations (to support preemption in MR)
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-5176
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5176
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino
>
> Proposing a patch that introduces a new annotation @Preemptable that 
> expresses to the framework a property of user-supplied classes (e.g., Reducer, 
> OutputCommitter). The intended semantics is that a tagged class is safe to be 
> preempted between invocations. 
> (this is in spirit similar to the Output Contracts of [Nephele/PACT | 
> https://stratosphere.eu/sites/default/files/papers/ComparingMapReduceAndPACTs_11.pdf])

