I believe it's been discussed on the Avro user lists before, but here's what you want: https://issues.apache.org/jira/browse/AVRO-593
If you could follow up on that patch and see it through, it's a wish granted for a lot of us as well, as we move ahead with the newer APIs in future Hadoop releases ;-)

On 18-Nov-2011, at 10:32 PM, Something Something wrote:

> Thanks again. Will look at Mapper.run to understand better. Actually, just
> a few minutes ago I got the AvroMapper to work (which will read from Avro
> files). This will hopefully improve performance even more.
>
> Interestingly, AvroMapper doesn't extend from Mapper, so it doesn't have the
> 'cleanup' method. Instead it provides a 'close' method, which seems to
> behave the same way. Honestly, I like the method name 'close' better than
> 'cleanup'.
>
> Doug - Is there a reason you chose not to extend from
> org/apache/hadoop/mapreduce/Mapper?
>
> Thank you all for your help.
>
>
> On Fri, Nov 18, 2011 at 7:44 AM, Harsh J <ha...@cloudera.com> wrote:
> Given that you are sure about it, and you also know why that's the
> case, I'd definitely write inside the cleanup(…) hook. No harm at all
> in doing that.
>
> Take a look at the mapreduce.Mapper#run(…) method in the source and you'll
> understand what I mean by it not being a stage or even an event, but
> just a tail call after all map()s are called.
>
> On Fri, Nov 18, 2011 at 8:58 PM, Something Something
> <mailinglist...@gmail.com> wrote:
> > Thanks again for the clarification. Not sure what you mean by it's not a
> > 'stage'! Okay, maybe not a stage, but I think of it as an 'event', such
> > as 'mouseover' or 'mouseout'. The 'cleanup' is really a 'mapperCompleted'
> > event, right?
> >
> > The confusion comes from the name of this method. The name 'cleanup'
> > makes me think it should not really be used as 'mapperCompleted', but it
> > appears there's no harm in using it that way.
> >
> > Here's our dilemma - when we use (local) caching in the Mapper and write
> > in the 'cleanup', our job completes in 18 minutes. When we don't write in
> > 'cleanup' it takes 3 hours!!!
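For concreteness, the pattern being described - aggregate into a local map inside map() and write everything out once in cleanup() (often called in-mapper combining) - can be sketched in plain Java. `CachingWordMapper` and its list-backed output are hypothetical stand-ins for a real Hadoop Mapper and its context.write(); this is a sketch of the idea, not the actual job code:

```java
import java.util.*;

// Hypothetical sketch of the pattern discussed above: cache counts locally
// in map(), emit once in cleanup(). A List stands in for context.write().
class CachingWordMapper {
    private final Map<String, Long> cache = new HashMap<>();
    final List<String> output = new ArrayList<>();

    // Called once per input line: aggregate in memory instead of
    // writing one (word, 1) pair per occurrence.
    void map(String line) {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) cache.merge(word, 1L, Long::sum);
        }
    }

    // Called once, after all map() calls: flush the aggregated counts.
    // (TreeMap only for deterministic output order in this sketch.)
    void cleanup() {
        for (Map.Entry<String, Long> e : new TreeMap<>(cache).entrySet()) {
            output.add(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

The speedup reported above is what you'd expect from this pattern: far fewer records are written and shuffled, at the cost of mapper heap for the cache.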
> > Knowing this, if you were the one deciding, would you
> > use 'cleanup' for this purpose?
> >
> > Thanks once again for your advice.
> >
> >
> > On Thu, Nov 17, 2011 at 9:35 PM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> Hello,
> >>
> >> On Fri, Nov 18, 2011 at 10:44 AM, Something Something
> >> <mailinglist...@gmail.com> wrote:
> >> > Thanks for the reply. Here's another concern we have. Let's say a
> >> > Mapper has finished processing 1000 lines from the input file and then
> >> > the machine goes down. I believe Hadoop is smart enough to
> >> > re-distribute the input split that was assigned to this Mapper,
> >> > correct? After re-assigning, will it skip the 1000 lines that were
> >> > processed successfully before and start from line 1001, OR would it
> >> > reprocess ALL lines?
> >>
> >> Attempts of any task start afresh. That's the default nature of Hadoop.
> >>
> >> So, it would begin from the start again and hence reprocess ALL lines.
> >> Understand that cleanup is just a fancy API call here, that's called
> >> after the input reader completes - not a "stage".
> >>
> >> --
> >> Harsh J
>
> --
> Harsh J
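To illustrate the point about cleanup() being just a tail call, here is a simplified, Hadoop-free sketch of the run() loop. `LifecycleMapper` is a made-up stand-in for org.apache.hadoop.mapreduce.Mapper, not the actual Hadoop source:

```java
import java.util.*;

// Simplified, hypothetical stand-in for mapreduce.Mapper#run(): setup() once,
// map() once per record, then cleanup() as a plain tail call - not a stage.
abstract class LifecycleMapper<V> {
    protected void setup() {}
    protected abstract void map(V value);
    protected void cleanup() {}

    public void run(Iterator<V> records) {
        setup();
        while (records.hasNext()) {
            map(records.next());  // one call per input record
        }
        cleanup();  // simply runs after the loop; a fresh task attempt
                    // re-executes everything above from the beginning
    }
}
```

Seen this way, both earlier answers fall out of the same loop: writing output in cleanup() is safe because it always runs after the last map(), and a restarted attempt reprocesses all lines because run() has no notion of resuming mid-iterator.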