Intermediate results are stored on the local disks and served up via an embedded Jetty HTTP server. If the tasktracker goes down, so does the embedded HTTP server.

-Joey

On Thu, Sep 29, 2011 at 12:59 PM, Leonardo Gamas <[email protected]> wrote:
> No, the reducers are fine, or at least I didn't observe any problem.
>
> The question is: the intermediate (pre-reducer) results of completed
> individual tasks are recorded in HDFS, right? So why are these results
> discarded, since the loss of the tasktracker is not the loss of already
> processed data?
>
> --Leonardo Gamas
>
> 2011/9/29 Robert Evans <[email protected]>
>>
>> If a TaskTracker is lost, then it cannot serve up any Map results to
>> Reducers that will need them, so the Map tasks have to be rerun. I am not
>> sure if this is the behavior you are seeing or not. Are completed Reducers
>> being rerun as well?
>>
>> --Bobby Evans
>>
>> On 9/29/11 11:15 AM, "Leonardo Gamas" <[email protected]> wrote:
>>
>> Hi,
>>
>> I have a very large MapReduce job, and sometimes a TaskTracker doesn't send
>> a heartbeat in the preconfigured amount of time, so it's considered dead.
>> That's fine, but all tasks already finished by this TaskTracker are lost too,
>> or better explained, are rescheduled and re-executed by another TaskTracker.
>>
>> Is this the default behavior, or am I experiencing some bug or
>> misconfiguration?
>>
>> My regards,
>>
>> Leonardo Gamas

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
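For reference, the heartbeat timeout Leonardo describes is configurable in classic MRv1. A sketch of the relevant setting, assuming Hadoop 1.x and its default property name and value (verify against your version's mapred-default.xml):

```xml
<!-- mapred-site.xml: how long the JobTracker waits without a heartbeat
     before declaring a TaskTracker dead (milliseconds; 600000 = 10 min). -->
<property>
  <name>mapred.tasktracker.expiry.interval</name>
  <value>600000</value>
</property>
```

Raising this value only delays the declaration of death; it does not change the fact that a dead tracker's map outputs become unreachable.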
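The mechanics of the thread can be illustrated with a small simulation. This is a hypothetical sketch, not Hadoop code: map outputs live only on the local disk of the worker that produced them and are fetched by reducers over HTTP, so when a worker is declared dead, every completed map task that ran on it must be re-executed. The function and worker names below are invented for illustration.

```python
# Sketch: why losing a TaskTracker forces completed map tasks to be rerun.
# Map outputs are stored on the producing worker's local disk (not HDFS),
# so a dead worker takes its outputs, and the HTTP server that serves
# them, down with it.

def schedule(map_tasks, workers):
    """Assign each map task to a worker, round-robin."""
    return {task: workers[i % len(workers)]
            for i, task in enumerate(map_tasks)}

def recover(assignment, live_workers):
    """Return the map tasks whose output is now unreachable and must be rerun."""
    return [task for task, worker in assignment.items()
            if worker not in live_workers]

tasks = ["m0", "m1", "m2", "m3"]
workers = ["tt-a", "tt-b"]
assignment = schedule(tasks, workers)

# tt-b misses its heartbeats and is declared dead: the map outputs on its
# local disk are unreachable, even though the tasks themselves finished.
rerun = recover(assignment, live_workers={"tt-a"})
print(rerun)  # the map tasks that ran on tt-b
```

Nothing here is lost from HDFS; the point is simply that intermediate map output never reached HDFS in the first place.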
