> The question is: the intermediate (pre-reduce) results of completed
> individual tasks are recorded in HDFS, right? So why are these results
> discarded, since the loss of the TaskTracker is not the loss of already
> processed data?

Intermediate results are not written to HDFS; they are stored on the
TaskTracker's local disks and served up to reducers via an embedded
Jetty HTTP server. If the TaskTracker goes down, so does the embedded
HTTP server, and the map outputs become unreachable.
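To make the failure mode concrete, here is a minimal sketch (a simplified model, not real Hadoop classes; the `TaskTracker` class and its methods are invented for illustration) of why completed map outputs die with their tracker: the outputs live only on that tracker's local disk and are pulled by reducers over the tracker's own HTTP server, so a dead tracker means unreachable outputs even though the map tasks finished successfully.

```python
class TaskTracker:
    """Toy model of a Hadoop 1.x TaskTracker (illustrative only)."""

    def __init__(self):
        self.local_disk = {}   # map-task id -> intermediate output (NOT HDFS)
        self.alive = True      # models the embedded HTTP server being up

    def run_map(self, task_id, words):
        # Intermediate (pre-reduce) results go to the tracker's LOCAL disk.
        self.local_disk[task_id] = [(w, 1) for w in words]

    def serve(self, task_id):
        # Reducers fetch map output over HTTP from the tracker itself.
        if not self.alive:
            raise ConnectionError("TaskTracker (and its HTTP server) is down")
        return self.local_disk[task_id]


tt = TaskTracker()
tt.run_map("attempt_m_000001", ["a", "b", "a"])
# While the tracker is alive, reducers can fetch the completed map output.
assert tt.serve("attempt_m_000001") == [("a", 1), ("b", 1), ("a", 1)]

tt.alive = False  # tracker misses heartbeats and is declared dead
reachable = True
try:
    tt.serve("attempt_m_000001")  # the data still "exists" on local disk...
except ConnectionError:
    reachable = False             # ...but nothing can serve it anymore
# Hence the JobTracker must reschedule and rerun those map tasks.
assert reachable is False
```

The design choice being modeled: map outputs are transient shuffle data, so Hadoop trades re-execution on failure for the cost of replicating every intermediate result into HDFS.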

-Joey

On Thu, Sep 29, 2011 at 12:59 PM, Leonardo Gamas
<[email protected]> wrote:
> No, the reducers are fine, or at least I didn't observe any problem.
>
> The question is: the intermediate (pre-reduce) results of completed
> individual tasks are recorded in HDFS, right? So why are these results
> discarded, since the loss of the TaskTracker is not the loss of already
> processed data?
>
> --Leonardo Gamas
>
> 2011/9/29 Robert Evans <[email protected]>
>>
>> If a TaskTracker is lost, it can no longer serve up any map results to the
>> Reducers that need them, so those map tasks have to be rerun.  I am not
>> sure if this is the behavior you are seeing or not.  Are completed Reducers
>> being rerun as well?
>>
>> --Bobby Evans
>>
>> On 9/29/11 11:15 AM, "Leonardo Gamas" <[email protected]> wrote:
>>
>> Hi,
>>
>> I have a very large MapReduce job, and sometimes a TaskTracker doesn't send
>> a heartbeat within the preconfigured amount of time, so it's considered
>> dead. That's fine, but all tasks already finished by this TaskTracker are
>> lost too, or more precisely, are rescheduled and re-executed on another
>> TaskTracker.
>>
>> Is this the default behavior, or am I experiencing some bug or
>> misconfiguration?
>>
>> My regards,
>>
>> Leonardo Gamas
>>
>>
>
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
