Thanks David,
Unfortunately, that's only called when a task finishes consuming input
successfully. My issue deals with tasks that are killed (job is
killed or task is pre-empted by the scheduler). I tried overriding
run() as follows:
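@Override
public void run(Context context) throws IOException, InterruptedException {
    try {
        // delegate to the default setup/map/cleanup loop
        super.run(context);
    } finally {
        // my very important cleanup stuff that should very rarely get missed.
    }
}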
If the task fails by throwing an unhandled exception, then cleanup()
is skipped (the default run() only calls it after the map loop
completes normally), but the finally block still runs. However, if the
task tracker kills the task, then the finally block is skipped too.
I'm reluctant to add signal handling code to catch the TERM signal, as
that isn't standard Java.
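
One pure-Java option that might cover the TERM case is a JVM shutdown
hook (Runtime.addShutdownHook), which the JVM runs on a normal exit or
on SIGTERM, but never on SIGKILL. Below is a minimal sketch, not
something I've tested on a cluster. CleanupGuard is a hypothetical
helper name, and it assumes the task JVM actually receives TERM and
gets enough time to run the hook before any KILL follows, which I
haven't verified for the task tracker.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical helper: runs a cleanup action at most once, either from
 * a finally block (normal exit or exception) or from a JVM shutdown
 * hook (e.g. SIGTERM). It cannot help if the JVM is killed with SIGKILL.
 */
final class CleanupGuard {
    private final AtomicBoolean done = new AtomicBoolean(false);
    private final Runnable action;
    private final Thread hook;

    CleanupGuard(Runnable action) {
        this.action = action;
        this.hook = new Thread(this::runOnce, "cleanup-guard");
        // Registered up front so a TERM during the task still triggers cleanup.
        Runtime.getRuntime().addShutdownHook(hook);
    }

    /** Runs the action unless it has already run. */
    void runOnce() {
        if (done.compareAndSet(false, true)) {
            action.run();
        }
    }

    /** Call from finally: run the cleanup now and unregister the hook. */
    void close() {
        runOnce();
        try {
            Runtime.getRuntime().removeShutdownHook(hook);
        } catch (IllegalStateException e) {
            // JVM is already shutting down; the hook is running or has run.
        }
    }
}

// Intended use inside the mapper's run(Context context):
//   CleanupGuard guard = new CleanupGuard(myCleanupRunnable);
//   try { super.run(context); } finally { guard.close(); }
```

The AtomicBoolean is there so the cleanup doesn't run twice if the
hook fires while the finally block is also executing.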
Thanks Again,
Chase
On Mon, Sep 13, 2010 at 11:28 AM, David Rosenstrauch wrote:
> On 09/12/2010 02:38 AM, Chase Bradford wrote:
>>
>> I have a mapper class (extended from mapreduce.Mapper), where setup
>> reports to an outside resource. I want to make sure that most of the
>> time when the task fails or is killed, a specific chunk of cleanup
>> code is executed.
>
> Just like there's a "setup(Mapper.Context context)" method, there's also a
> "cleanup(Mapper.Context context)" method for just this purpose.
>
> See:
> http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapreduce/Mapper.html#cleanup(org.apache.hadoop.mapreduce.Mapper.Context)
>
> DR
>
--
Chase Bradford
“If in physics there's something you don't understand, you can always
hide behind the uncharted depths of nature. But if your program
doesn't work, there is no obstinate nature. If it doesn't work, you've
messed up.”
- Edsger Dijkstra