Thanks again for the clarification.  Not sure what you mean by its not being a
'stage'!  Okay, maybe not a stage, but I think of it as an 'Event', such
as 'Mouseover' or 'Mouseout'.  The 'cleanup' is really a 'MapperCompleted'
event, right?

The confusion comes from the name of this method.  The name 'cleanup' makes me
think it shouldn't really be used as a 'mapperCompleted' event, but it appears
there's no harm in using it that way.

Here's our dilemma - when we use (local) caching in the Mapper and write in
'cleanup', our job completes in 18 minutes.  When we don't write in
'cleanup', it takes 3 hours!!!  Knowing this, if you were the one deciding, would
you use 'cleanup' for this purpose?
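For what it's worth, the pattern described above (aggregate into a local cache in map(), flush the totals in cleanup()) is commonly called "in-mapper combining". Here is a minimal plain-Java sketch of that shape; the class, field, and method names are all illustrative, and in a real Hadoop job the same structure would live in a class extending org.apache.hadoop.mapreduce.Mapper, with emit going through context.write():

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the "cache in map(), write in cleanup()" pattern
// (in-mapper combining) for a word-count-style job. Not a real Hadoop
// Mapper; the lifecycle (map called per record, cleanup called once at
// the end) mirrors Hadoop's, and 'output' stands in for Context.
public class InMapperCombiner {
    private final Map<String, Long> cache = new HashMap<>();
    private final List<String> output = new ArrayList<>();

    // Called once per input record, like Mapper.map(). Instead of
    // emitting one (word, 1) pair per token, aggregate locally.
    public void map(String line) {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                cache.merge(word, 1L, Long::sum);
            }
        }
    }

    // Called once after the input reader completes, like Mapper.cleanup():
    // flush the locally cached totals as the mapper's actual output.
    public void cleanup() {
        for (Map.Entry<String, Long> e : cache.entrySet()) {
            output.add(e.getKey() + "\t" + e.getValue());
        }
        cache.clear();
    }

    public List<String> getOutput() {
        return output;
    }

    public static void main(String[] args) {
        InMapperCombiner m = new InMapperCombiner();
        m.map("the quick the");
        m.map("quick");
        m.cleanup();
        // Two aggregated records instead of four raw (word, 1) pairs.
        System.out.println(m.getOutput().size());
    }
}
```

The speedup would come from emitting far fewer records, which means less data spilled, sorted, and shuffled. Note that this stays consistent with the re-execution behavior discussed below: if the task is killed and retried, the attempt starts fresh, rebuilds the cache from scratch, and cleanup() re-emits everything, so nothing is lost.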

Thanks once again for your advice.


On Thu, Nov 17, 2011 at 9:35 PM, Harsh J <ha...@cloudera.com> wrote:

> Hello,
>
> On Fri, Nov 18, 2011 at 10:44 AM, Something Something
> <mailinglist...@gmail.com> wrote:
> > Thanks for the reply.  Here's another concern we have.  Let's say Mapper
> has
> > finished processing 1000 lines from the input file & then the machine
> goes
> > down.  I believe Hadoop is smart enough to re-distribute the input split
> > that was assigned to this Mapper, correct?  After re-assigning will it
> > reprocess the 1000 lines that were processed successfully before & start
> > from line 1001  OR  would it reprocess ALL lines?
>
> Attempts of any task start afresh. That's the default nature of Hadoop.
>
> So, it would begin from start again and hence reprocess ALL lines.
> Understand that cleanup is just a fancy API call here, that's called
> after the input reader completes - not a "stage".
>
> --
> Harsh J
>
