Thanks! The thread is very helpful; this is exactly what I see. Overriding Mapper.run is interesting and looks "cleaner" in terms of software design.
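To make sure I understand the run() override idea, here is a minimal sketch. These classes are toy stand-ins, not the real org.apache.hadoop.mapreduce classes; they only mimic the setup/map/cleanup call order of Mapper.run(), with a subclass that emits one final record from cleanup():

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-in for Hadoop's Mapper.Context: iterates input, collects output.
class Context {
    private final Iterator<String> input;
    final List<String> output = new ArrayList<>();
    Context(List<String> records) { input = records.iterator(); }
    boolean nextKeyValue() { return input.hasNext(); }
    String getCurrentValue() { return input.next(); }
    void write(String value) { output.add(value); }
}

// Stand-in for Hadoop's Mapper: run() is the template method that
// drives setup(), the map loop, and cleanup().
class Mapper {
    protected void setup(Context c) {}
    protected void map(String value, Context c) { c.write(value); }
    protected void cleanup(Context c) {}
    public void run(Context c) {
        setup(c);
        try {
            while (c.nextKeyValue()) {
                map(c.getCurrentValue(), c);
            }
        } finally {
            cleanup(c); // with this shape, cleanup runs even if map() throws
        }
    }
}

// Buffers a count during map() and writes it out from cleanup(), i.e. a
// final context.write() after the last input record.
class CountingMapper extends Mapper {
    private int seen = 0;
    @Override protected void map(String value, Context c) {
        seen++;
        c.write(value);
    }
    @Override protected void cleanup(Context c) {
        c.write("count=" + seen);
    }
}

public class RunOverrideSketch {
    static List<String> demo() {
        Context c = new Context(List.of("a", "b"));
        new CountingMapper().run(c);
        return c.output; // ["a", "b", "count=2"]
    }
    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With the real Hadoop classes the same shape applies: override run() (or just cleanup()) and call context.write() there; the record goes through the normal output path.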
BTW: I wrote "Should I expect cleanup to be killed when a task fail or killed (speculative execution)?", but I meant "Should I expect cleanup to be *called* when a task fails or is killed (speculative execution)?", and you did answer that.

On Tue, Jan 10, 2012 at 4:33 PM, Harsh J <ha...@cloudera.com> wrote:

> Mefa,
>
> On 10-Jan-2012, at 6:38 PM, Mefa Grut wrote:
>
> Two cleanup related questions:
> Can I execute context.write from the reduce/map cleanup phase?
>
> If by cleanup, you mean the mapper/reducer cleanup methods, then the
> answer is yes, and this has been asked previously:
> http://search-hadoop.com/m/jzO0k18XoNW1 if you want some additional
> info on top.
>
> (You probably do not even need the cleanup method; see my last paragraph.)
>
> Should I expect cleanup to be killed when a task fail or
> killed (speculative execution)?
>
> I don't understand this question.
>
> If your task fails, then it fails right there. Your cleanup() method won't
> even be called, since your task would exit with whatever error it ran into.
> And kills (user-killed or speculative-killed) are pure kills, so your task
> may die immediately when such a signal is issued.
>
> The idea is to update HBase counters from within a MapReduce job (a kind
> of alternative to the built-in MapReduce counters, one that can scale to
> millions of counters).
>
> Since a task can fail and run again, or be duplicated and killed, events
> can be incremented too many times. How does Hadoop work around this
> problem with the generic counters?
>
> In Hadoop, the counters are added only from successful tasks (i.e. tasks
> that have been 'committed' by the framework, via the OutputCommitter).
>
> I think, for your case, it'd be better if you did the final committing
> with a custom implementation of OutputCommitter. But unfortunately the
> output stream is not available inside the FOC, so you'd probably have to
> hack around a bit to get your outputs to HBase in the end.
> But there may surely be other,
> possibly better solutions :)
>
> A good idea would be to also ask about this specific issue on the HBase
> user list, so you reach the right audience.