Thanks! The thread is very helpful; this is exactly what I see. Overriding Mapper.run is interesting and looks "cleaner" in terms of software design.
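To make sure I understand the run() override idea, here is a minimal sketch. These classes are toy stand-ins, not the real org.apache.hadoop.mapreduce classes; they only mimic the setup/map/cleanup call order of Mapper.run(), with a subclass that emits one final record from cleanup():

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-in for Hadoop's Mapper.Context: iterates input, collects output.
class Context {
    private final Iterator<String> input;
    final List<String> output = new ArrayList<>();
    Context(List<String> records) { input = records.iterator(); }
    boolean nextKeyValue() { return input.hasNext(); }
    String getCurrentValue() { return input.next(); }
    void write(String value) { output.add(value); }
}

// Stand-in for Hadoop's Mapper: run() is the template method that
// drives setup(), the map loop, and cleanup().
class Mapper {
    protected void setup(Context c) {}
    protected void map(String value, Context c) { c.write(value); }
    protected void cleanup(Context c) {}
    public void run(Context c) {
        setup(c);
        try {
            while (c.nextKeyValue()) {
                map(c.getCurrentValue(), c);
            }
        } finally {
            cleanup(c); // with this shape, cleanup runs even if map() throws
        }
    }
}

// Buffers a count during map() and writes it out from cleanup(), i.e. a
// final context.write() after the last input record.
class CountingMapper extends Mapper {
    private int seen = 0;
    @Override protected void map(String value, Context c) {
        seen++;
        c.write(value);
    }
    @Override protected void cleanup(Context c) {
        c.write("count=" + seen);
    }
}

public class RunOverrideSketch {
    static List<String> demo() {
        Context c = new Context(List.of("a", "b"));
        new CountingMapper().run(c);
        return c.output; // ["a", "b", "count=2"]
    }
    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With the real Hadoop classes the same shape applies: override run() (or just cleanup()) and call context.write() there; the record goes through the normal output path.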
BTW: I wrote "Should I expect cleanup to be killed when a task fail or killed (speculative execution)?", but I meant "Should I expect cleanup to be *called* when a task fails or is killed (speculative execution)?", and you did answer that.

On Tue, Jan 10, 2012 at 4:33 PM, Harsh J <ha...@cloudera.com> wrote:

> Mefa,
>
> On 10-Jan-2012, at 6:38 PM, Mefa Grut wrote:
>
> Two cleanup related questions:
> Can I execute context.write from the reduce/map cleanup phase?
>
> If by cleanup, you mean the mapper/reducer cleanup methods, then the
> answer is yes, and this has been asked previously:
> http://search-hadoop.com/m/jzO0k18XoNW1 if you want some additional
> info on top.
>
> (You probably do not even need the cleanup method; see my last paragraph.)
>
> Should I expect cleanup to be killed when a task fail or
> killed (speculative execution)?
>
> I don't understand this question.
>
> If your task fails, then it fails right there. Your cleanup() method won't
> even be called, since your task would exit with whatever error it ran into.
> And kills (user-killed or speculative-killed) are pure kills, so your task
> may die immediately when such a signal is issued.
>
> The idea is to update HBase counters from within a MapReduce job (a kind
> of alternative to the built-in MapReduce counters, one that can scale to
> millions of counters).
>
> Since a task can fail and run again, or be duplicated and killed, events
> can be incremented too many times. How does Hadoop work around this
> problem with the generic counters?
>
> In Hadoop, the counters are added only from successful tasks (i.e. tasks
> that have been 'committed' by the framework, via the OutputCommitter).
>
> I think, for your case, it'd be better if you did the final committing
> with a custom implementation of OutputCommitter. But unfortunately the
> output stream is not available inside the FOC, so you'd probably have to
> hack around a bit to get your outputs to HBase in the end.
> But there may surely be other,
> possibly better solutions :)
>
> A good idea would be to also ask about this specific issue on the HBase
> user list, so you reach the right audience.