I have had it fail with a single reducer and with 100 reducers.
Ultimately it needs to be funneled to a single reducer though.
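
Back of the envelope, using the map count and the per-segment sizes from
the log below (rough numbers; this assumes segments average around the
~170K shown there):

    # very rough shuffle volume per reduce task
    num_maps = 15795
    avg_segment_bytes = 172488             # "decompressed len" from the log
    per_reducer = num_maps * avg_segment_bytes
    print(per_reducer / float(1024 ** 3))  # ~2.5 GB per reducer with 100 reducers
    # with a single reducer, essentially the whole ~150G of map output
    # has to flow through one task's shuffle

So each reduce task has a few GB flowing through its shuffle even with
100 reducers, and vastly more with one.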

-K

On Wed, Feb 16, 2011 at 9:02 AM, real great..
<[email protected]> wrote:
> Hi,
> How many reducers are you using currently?
> Try increasing the number of reducers.
> Let me know if it helps.
>
> On Wed, Feb 16, 2011 at 8:30 PM, Kelly Burkhart 
> <[email protected]>wrote:
>
>> Hello, I'm seeing frequent failures in reduce jobs, with errors similar
>> to this:
>>
>>
>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask:
>> header: attempt_201102081823_0175_m_002153_0, compressed len: 172492,
>> decompressed len: 172488
>> 2011-02-15 15:21:10,163 FATAL org.apache.hadoop.mapred.TaskRunner:
>> attempt_201102081823_0175_r_000034_0 : Map output copy failure :
>> java.lang.OutOfMemoryError: Java heap space
>>        at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>        at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>        at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>        at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>
>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask:
>> Shuffling 172488 bytes (172492 raw bytes) into RAM from
>> attempt_201102081823_0175_m_002153_0
>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask:
>> header: attempt_201102081823_0175_m_002118_0, compressed len: 161944,
>> decompressed len: 161940
>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask:
>> header: attempt_201102081823_0175_m_001704_0, compressed len: 228365,
>> decompressed len: 228361
>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask: Task
>> attempt_201102081823_0175_r_000034_0: Failed fetch #1 from
>> attempt_201102081823_0175_m_002153_0
>> 2011-02-15 15:21:10,424 FATAL org.apache.hadoop.mapred.TaskRunner:
>> attempt_201102081823_0175_r_000034_0 : Map output copy failure :
>> java.lang.OutOfMemoryError: Java heap space
>>        at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>        at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>        at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>        at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>
>> Some also show this:
>>
>> Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
>>        at sun.net.www.http.ChunkedInputStream.<init>(ChunkedInputStream.java:63)
>>        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:811)
>>        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
>>        at
>> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1072)
>>        at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
>>        at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
>>        at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>        at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>
>> The particular job I'm running is an attempt to merge multiple time
>> series files into a single file.  The job tracker shows the following:
>>
>>
>> Kind    Num Tasks    Complete   Killed    Failed/Killed Task Attempts
>> map     15795        15795      0         0 / 29
>> reduce  100          30         70        17 / 29
>>
>> All of the files I'm reading have records with a timestamp key similar to:
>>
>> 2011-01-03 08:30:00.457000<tab><record>
>>
>> My map job is a simple Python program that ignores rows with times
>> before 08:30:00 or after 15:00:00, determines the type of each input
>> row, and writes it to stdout with very minor modification.  It
>> maintains no state and should not use any significant memory.  My
>> reducer is the IdentityReducer.  The input files are individually
>> gzipped and then put into HDFS.  The total uncompressed size of the
>> output should be around 150G.  Our cluster is 32 nodes, each of which
>> has 16G RAM and most of which have two 2T drives.  We're running
>> Hadoop 0.20.2.
>>
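>> For illustration, the mapper is essentially this shape (simplified;
>> the minor per-row-type modifications are omitted):
>>
>>     #!/usr/bin/env python
>>     import sys
>>
>>     for line in sys.stdin:
>>         key = line.split("\t", 1)[0]   # e.g. "2011-01-03 08:30:00.457000"
>>         t = key[11:19]                 # the HH:MM:SS portion
>>         if t < "08:30:00" or t > "15:00:00":
>>             continue                   # drop rows outside the window
>>         sys.stdout.write(line)         # emit the record (edits omitted here)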
>>
>> Can anyone provide some insight into how we can eliminate this issue?
>> I'm certain this email does not provide enough info; please let me
>> know what further information is needed to troubleshoot.
>>
>> Thanks in advance,
>>
>> -Kelly
>>
>
>
>
> --
> Regards,
> R.V.
>
