Given that you are sure about it, and you also know why that's the case, I'd definitely write inside the cleanup(…) hook. There's no harm at all in doing that.
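For instance, the pattern you describe would look roughly like the sketch below (a minimal illustration only - the word-count logic and the key/value types here are my assumptions, not your actual job):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch of the cache-then-flush pattern: aggregate locally in map(),
// emit everything once in cleanup().
public class CachingMapper
    extends Mapper<LongWritable, Text, Text, LongWritable> {

  private final Map<String, Long> cache = new HashMap<String, Long>();

  @Override
  protected void map(LongWritable key, Text value, Context context) {
    // Aggregate into the local map instead of emitting per input record.
    for (String token : value.toString().split("\\s+")) {
      Long count = cache.get(token);
      cache.put(token, count == null ? 1L : count + 1L);
    }
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    // Runs once, after the last map() call for this attempt's split;
    // flush the aggregated results in a single pass.
    for (Map.Entry<String, Long> e : cache.entrySet()) {
      context.write(new Text(e.getKey()), new LongWritable(e.getValue()));
    }
  }
}

Just keep an eye on task heap usage, since the whole cache has to fit in the task attempt's memory. And since failed attempts start afresh anyway (see below), buffering output this way doesn't lose you any correctness.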
Take a look at the mapreduce.Mapper#run(…) method in the source and you'll understand what I mean by it not being a stage or even an event, but just a tail call after all the map() invocations - a rough sketch of it is in the P.S. below.

On Fri, Nov 18, 2011 at 8:58 PM, Something Something <mailinglist...@gmail.com> wrote:
> Thanks again for the clarification. I'm not sure what you mean by it's not a
> 'stage'! Okay, maybe not a stage, but I think of it as an 'event', such as
> 'mouseover' or 'mouseout'. The 'cleanup' is really a 'MapperCompleted'
> event, right?
>
> The confusion comes from the name of this method. The name 'cleanup' makes
> me think it should not really be used as 'mapperCompleted', but it appears
> there's no harm in using it that way.
>
> Here's our dilemma: when we use (local) caching in the Mapper and write in
> 'cleanup', our job completes in 18 minutes. When we don't write in
> 'cleanup', it takes 3 hours! Knowing this, if you were to decide, would you
> use 'cleanup' for this purpose?
>
> Thanks once again for your advice.
>
>
> On Thu, Nov 17, 2011 at 9:35 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Hello,
>>
>> On Fri, Nov 18, 2011 at 10:44 AM, Something Something
>> <mailinglist...@gmail.com> wrote:
>> > Thanks for the reply. Here's another concern we have. Let's say a
>> > Mapper has finished processing 1,000 lines from the input file and then
>> > the machine goes down. I believe Hadoop is smart enough to redistribute
>> > the input split that was assigned to this Mapper, correct? After
>> > reassigning, will it reprocess the 1,000 lines that were processed
>> > successfully before and start from line 1,001, or would it reprocess
>> > ALL lines?
>>
>> Attempts of any task start afresh; that's the default nature of Hadoop.
>>
>> So it would begin from the start again and hence reprocess ALL lines.
>> Understand that cleanup is just a fancy API call here that's called
>> after the input reader completes - not a "stage".
>>
>> --
>> Harsh J
>

--
Harsh J
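P.S. For reference, that run(…) method is roughly the following (paraphrased from the source; later releases also wrap the loop in a try/finally so cleanup runs even on error - check your exact version's source for the precise form):

public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  while (context.nextKeyValue()) {
    map(context.getCurrentKey(), context.getCurrentValue(), context);
  }
  // Not a separate stage or event: simply the last call made once the
  // record reader runs out of input for this attempt.
  cleanup(context);
}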