On Nov 18, 2011, at 10:44 AM, Harsh J wrote:
>
> If you could follow up on that patch, and see it through, it's a wish granted
> for a lot of us as well, as we move ahead with the newer APIs in the future
> Hadoop releases ;-)
>
The plan is to support both the mapred and mapreduce MR APIs for the foreseeable future.

Arun

> On 18-Nov-2011, at 10:32 PM, Something Something wrote:
>
>> Thanks again. Will look at Mapper.run to understand better. Actually, just
>> a few minutes ago I got the AvroMapper to work (which will read from Avro
>> files). This will hopefully improve performance even more.
>>
>> Interestingly, AvroMapper doesn't extend Mapper, so it doesn't have the
>> 'cleanup' method. Instead it provides a 'close' method, which seems to
>> behave the same way. Honestly, I like the method name 'close' better than
>> 'cleanup'.
>>
>> Doug - Is there a reason you chose not to extend
>> org/apache/hadoop/mapreduce/Mapper?
>>
>> Thank you all for your help.
>>
>>
>> On Fri, Nov 18, 2011 at 7:44 AM, Harsh J <ha...@cloudera.com> wrote:
>> Given that you are sure about it, and you also know why that's the
>> case, I'd definitely write inside the cleanup(…) hook. No harm at all
>> in doing that.
>>
>> Take a look at the mapreduce.Mapper#run(…) method in the source and you'll
>> understand what I mean by it not being a stage or even an event, but
>> just a tail call after all map()s are called.
>>
>> On Fri, Nov 18, 2011 at 8:58 PM, Something Something
>> <mailinglist...@gmail.com> wrote:
>> > Thanks again for the clarification. Not sure what you mean by it's not a
>> > 'stage'! Okay, maybe not a stage, but I think of it as an 'event', such as
>> > 'mouseover' or 'mouseout'. The 'cleanup' is really a 'mapperCompleted'
>> > event, right?
>> >
>> > The confusion comes from the name of this method. The name 'cleanup' makes
>> > me think it should not really be used as 'mapperCompleted', but it appears
>> > there's no harm in using it that way.
>> >
>> > Here's our dilemma - when we use (local) caching in the Mapper and write in
>> > 'cleanup', our job completes in 18 minutes. When we don't write in
>> > 'cleanup' it takes 3 hours!
>> > Knowing this, if it were your decision, would you
>> > use 'cleanup' for this purpose?
>> >
>> > Thanks once again for your advice.
>> >
>> >
>> > On Thu, Nov 17, 2011 at 9:35 PM, Harsh J <ha...@cloudera.com> wrote:
>> >>
>> >> Hello,
>> >>
>> >> On Fri, Nov 18, 2011 at 10:44 AM, Something Something
>> >> <mailinglist...@gmail.com> wrote:
>> >> > Thanks for the reply. Here's another concern we have. Let's say a Mapper
>> >> > has finished processing 1000 lines from the input file and then the
>> >> > machine goes down. I believe Hadoop is smart enough to redistribute the
>> >> > input split that was assigned to this Mapper, correct? After reassigning
>> >> > it, will it skip the 1000 lines that were processed successfully and
>> >> > start from line 1001, or will it reprocess ALL lines?
>> >>
>> >> Attempts of any task start afresh. That's the default nature of Hadoop.
>> >>
>> >> So it would begin from the start again and hence reprocess ALL lines.
>> >> Understand that cleanup is just a fancy API call here, one that's called
>> >> after the input reader completes - not a "stage".
>> >>
>> >> --
>> >> Harsh J
>> >
>> >
>>
>>
>> --
>> Harsh J
>>
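[Editor's illustration] The run(…) method Harsh points at is just setup, a loop of map() calls over the record reader, then cleanup as a tail call. The dependency-free sketch below mimics that shape with simplified types (the real org.apache.hadoop.mapreduce.Mapper#run takes a Mapper.Context, not an Iterator; MiniMapper is a hypothetical stand-in, not a Hadoop class):

```java
import java.util.*;

// Hypothetical stand-in for org.apache.hadoop.mapreduce.Mapper, illustrating
// that cleanup() is not a checkpointed "stage": it is simply the last call
// made after the record reader is exhausted.
class MiniMapper {
    final List<String> calls = new ArrayList<>();

    void setup() { calls.add("setup"); }
    void map(String record) { calls.add("map:" + record); }
    void cleanup() { calls.add("cleanup"); }

    // Mirrors the shape of Mapper#run(): setup, one map() per record, cleanup.
    void run(Iterator<String> reader) {
        setup();
        while (reader.hasNext()) {
            map(reader.next());
        }
        cleanup();  // just a tail call after the input is consumed
    }
}
```

Because an attempt that dies never reaches this tail call, a re-run attempt replays the whole loop from the first record, which matches Harsh's "attempts start afresh" answer.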
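[Editor's illustration] The 18-minutes-vs-3-hours gap described in the thread is consistent with the in-mapper combining pattern: aggregate into a local map during map() and emit one record per distinct key in cleanup(), instead of one record per input occurrence. A minimal, Hadoop-free sketch (class and field names are hypothetical; `output` stands in for `context.write()`):

```java
import java.util.*;

// Hypothetical sketch of local caching with a write in cleanup():
// word counts accumulate in a HashMap during map() and are emitted
// once, at the end, rather than once per input record.
class CachingWordCountMapper {
    private final Map<String, Long> cache = new HashMap<>();
    final List<String> output = new ArrayList<>();  // stands in for context.write()

    // analogous to Mapper.map(): only update the local cache, emit nothing
    void map(String line) {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) cache.merge(word, 1L, Long::sum);
        }
    }

    // analogous to Mapper.cleanup(): one write per distinct key. This is
    // where the large cut in emitted records (and shuffle cost) comes from.
    void cleanup() {
        for (Map.Entry<String, Long> e : cache.entrySet()) {
            output.add(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Since task attempts start afresh and Hadoop only commits the output of successful attempts, a machine failure simply rebuilds the cache on replay; nothing a failed attempt wrote is kept.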