Hello,

On Fri, Nov 18, 2011 at 10:44 AM, Something Something <mailinglist...@gmail.com> wrote:
> Thanks for the reply. Here's another concern we have. Let's say a Mapper has
> finished processing 1000 lines from the input file and then the machine goes
> down. I believe Hadoop is smart enough to re-distribute the input split
> that was assigned to this Mapper, correct? After re-assigning, will it
> skip the 1000 lines that were processed successfully before and start
> from line 1001, OR will it reprocess ALL lines?
Attempts of any task start afresh; that is the default behavior of Hadoop. So the new attempt would begin from the start of the split and hence reprocess ALL lines. Understand that cleanup() is just a regular API call here, invoked after the input reader is exhausted; it is not a separate "stage" of the job.

--
Harsh J
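To make the semantics concrete, here is a minimal toy simulation (plain Java, no Hadoop dependency; the names SplitRetryDemo and processSplit are illustrative, not Hadoop API) of what "attempts start afresh" means: a first attempt dies partway through its split, and the replacement attempt re-reads every record from the beginning rather than resuming at the failure point.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitRetryDemo {
    // Toy stand-in for a map task attempt reading its input split.
    // Throws to simulate the machine dying after `failAfter` records
    // (pass -1 for a healthy run). Returns the line numbers this
    // particular attempt actually processed.
    static List<Integer> processSplit(int totalLines, int failAfter) {
        List<Integer> processed = new ArrayList<>();
        for (int line = 1; line <= totalLines; line++) {
            processed.add(line);            // "map" one record
            if (line == failAfter) {
                throw new RuntimeException("node died at line " + line);
            }
        }
        // In a real Mapper, cleanup() would be invoked here, only
        // after the record reader has been exhausted.
        return processed;
    }

    public static void main(String[] args) {
        int totalLines = 2500;
        List<Integer> retry = null;
        try {
            processSplit(totalLines, 1000); // first attempt dies at line 1000
        } catch (RuntimeException e) {
            // The framework schedules a fresh attempt; no per-record
            // progress is carried over, so it starts at line 1 again.
            retry = processSplit(totalLines, -1);
        }
        System.out.println("retry started at line " + retry.get(0));
        System.out.println("retry processed " + retry.size() + " lines");
    }
}
```

Running it prints that the retry started at line 1 and processed all 2500 lines, i.e. the 1000 lines handled by the failed attempt are processed twice, which is why map logic should be idempotent with respect to re-execution.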