Given that you are sure about the numbers, and you also know why that's
the case, I'd definitely write inside the cleanup(…) hook. There's no
harm at all in doing that.

Take a look at the mapreduce.Mapper#run(…) method in the source and
you'll see what I mean by it not being a stage or even an event, but
just a tail call made after all the map() calls are done.
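
For reference, here is a rough, self-contained sketch of that shape. It
does not use the Hadoop classes: SketchMapper, CachingWordCount and
CleanupSketch are made-up names, and a plain List stands in for the
Context. Only the setup / map loop / cleanup ordering is meant to mirror
Mapper#run.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical driver; runs the sketch over two input lines.
class CleanupSketch {
    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        new CachingWordCount().run(List.of("a b a", "b a").iterator(), out);
        System.out.println(out);
    }
}

// Simplified stand-in for mapreduce.Mapper (an assumption, not the real class).
abstract class SketchMapper {
    protected void setup(List<String> out) {}
    protected abstract void map(String line, List<String> out);
    protected void cleanup(List<String> out) {}

    // Same ordering as Mapper#run: setup, one map() per record, then cleanup.
    public void run(Iterator<String> records, List<String> out) {
        setup(out);
        while (records.hasNext()) {
            map(records.next(), out);
        }
        cleanup(out); // an ordinary method call once the input is exhausted
    }
}

// Caches per-word counts in memory and writes them once, inside cleanup().
class CachingWordCount extends SketchMapper {
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    protected void map(String line, List<String> out) {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum); // no output yet, only cache
            }
        }
    }

    @Override
    protected void cleanup(List<String> out) {
        // Single write per distinct key instead of one write per input record.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.add(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Since every task attempt starts afresh, a failed attempt just rebuilds
the cache from the start of its split, so deferring the writes to
cleanup() loses nothing on a retry.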

On Fri, Nov 18, 2011 at 8:58 PM, Something Something
<mailinglist...@gmail.com> wrote:
> Thanks again for the clarification.  Not sure what you mean by it's not a
> 'stage'!  Okay.. maybe not a stage but I think of it as an 'Event', such as
> 'Mouseover', 'Mouseout'.  The 'cleanup' is really a 'MapperCompleted' event,
> right?
>
> Confusion comes with the name of this method.  The name 'cleanup' makes me
> think it should not be really used as 'mapperCompleted', but it appears
> there's no harm in using it that way.
>
> Here's our dilemma - when we use (local) caching in the Mapper & write in
> the 'cleanup', our job completes in 18 minutes.  When we don't write in
> 'cleanup' it takes 3 hours!!!  Knowing this if you were to decide, would you
> use 'cleanup' for this purpose?
>
> Thanks once again for your advice.
>
>
> On Thu, Nov 17, 2011 at 9:35 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Hello,
>>
>> On Fri, Nov 18, 2011 at 10:44 AM, Something Something
>> <mailinglist...@gmail.com> wrote:
>> > Thanks for the reply.  Here's another concern we have.  Let's say Mapper has
>> > finished processing 1000 lines from the input file & then the machine goes
>> > down.  I believe Hadoop is smart enough to re-distribute the input split
>> > that was assigned to this Mapper, correct?  After re-assigning will it
>> > reprocess the 1000 lines that were processed successfully before & start
>> > from line 1001  OR  would it reprocess ALL lines?
>>
>> Attempts of any task start afresh. That's the default nature of Hadoop.
>>
>> So, it would begin from start again and hence reprocess ALL lines.
>> Understand that cleanup is just a fancy API call here, that's called
>> after the input reader completes - not a "stage".
>>
>> --
>> Harsh J
>
>



-- 
Harsh J
