Our cluster admin (who's out of town today) has mapred.child.java.opts
set to -Xmx1280 in mapred-site.xml.  However, the job configuration
page for a job I'm running right now says this option is set to
-Xmx200m.  Several other settings on that page also differ from
mapred-site.xml.  Why would map/reduce jobs not respect the
mapred-site.xml file?
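Hadoop's *-site.xml files are lists of `<property>` elements. Each one has a `<name>`, a `<value>`, and an optional `<final>` flag. Once a resource declares a value final, resources loaded after it cannot change that value. When a job's configuration disagrees with a site file, it may help to confirm what a given copy of mapred-site.xml actually contains and whether the property is marked final. The Python sketch below is a hypothetical helper for that check. The function name and the sample values are illustrative and are not taken from this cluster.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a mapred-site.xml file; values are illustrative only.
SAMPLE_SITE_XML = """<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>10</value>
  </property>
</configuration>"""


def read_site_xml(xml_text):
    """Map each property name to a (value, is_final) tuple."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is None:
            continue  # skip malformed entries that lack a <name>
        value = prop.findtext("value")
        final = (prop.findtext("final") or "").strip().lower() == "true"
        props[name.strip()] = (value.strip() if value is not None else None, final)
    return props


if __name__ == "__main__":
    for name, (value, final) in sorted(read_site_xml(SAMPLE_SITE_XML).items()):
        print(f"{name} = {value}" + (" (final)" if final else ""))
```

To check a real file, pass in its contents, for example `read_site_xml(open(path).read())`. A property reported without "(final)" can still be replaced by configuration applied later, such as the job's own settings.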

-K

On Wed, Feb 16, 2011 at 9:43 AM, Jim Falgout <[email protected]> wrote:
> You can set the amount of memory used by the reducer using the
> mapreduce.reduce.java.opts property. Set it in mapred-site.xml or override it
> in your job. You can set it to something like -Xmx512M to increase the amount
> of memory available to the JVM spawned for the reducer task.
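The size given to the JVM's -Xmx option can take an optional k, m, or g suffix in upper or lower case. With no suffix, the number is read as bytes, so -Xmx512m and -Xmx512 mean very different things. Here is a small illustrative Python helper that pulls the maximum heap size out of a java.opts string like the ones discussed in this thread. It is hypothetical and not part of Hadoop.

```python
import re

# Multipliers for the optional size suffix accepted after -Xmx.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
_XMX = re.compile(r"-Xmx(\d+)([kKmMgG]?)")


def max_heap_bytes(java_opts):
    """Return the -Xmx size in java_opts, in bytes, or None if there is none.

    If -Xmx appears more than once, this sketch keeps the last occurrence.
    """
    result = None
    for token in java_opts.split():
        match = _XMX.fullmatch(token)
        if match:
            number, unit = match.groups()
            result = int(number) * _UNITS[unit.lower()]
    return result


if __name__ == "__main__":
    for opts in ["-Xmx200m", "-Xmx512M", "-Xmx1280", "-server -Xmx1g"]:
        print(opts, "->", max_heap_bytes(opts))
```

-Xmx1280 and -Xmx1280m differ by a factor of 1,048,576, so a value written without a suffix is worth double-checking.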
>
> -----Original Message-----
> From: Kelly Burkhart [mailto:[email protected]]
> Sent: Wednesday, February 16, 2011 9:12 AM
> To: [email protected]
> Subject: Re: Reduce java.lang.OutOfMemoryError
>
> I have had it fail with a single reducer and with 100 reducers.
> Ultimately it needs to be funneled to a single reducer though.
>
> -K
>
> On Wed, Feb 16, 2011 at 9:02 AM, real great..
> <[email protected]> wrote:
>> Hi,
>> How many reducers are you using currently?
>> Try increasing the number of reducers.
>> Let me know if it helps.
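Both suggestions in this thread, more reducers and a larger child heap, can be set per job rather than cluster-wide. For a Hadoop Streaming job such as the Python-mapper job described below, per-job settings are passed as generic -D options ahead of the streaming options. The following is a hypothetical Python helper that assembles such a command line. The jar name, HDFS paths, and script name are placeholders. The property names are the 0.20-era mapred.reduce.tasks and mapred.child.java.opts.

```python
import os


def streaming_command(streaming_jar, input_path, output_path,
                      mapper_script, num_reducers, child_java_opts):
    """Build an argv list for a Hadoop Streaming job (hypothetical helper).

    Generic options (-D key=value) must come before the streaming-specific
    options such as -input and -mapper.
    """
    return [
        "hadoop", "jar", streaming_jar,
        "-D", f"mapred.reduce.tasks={num_reducers}",
        "-D", f"mapred.child.java.opts={child_java_opts}",
        "-input", input_path,
        "-output", output_path,
        # -file ships the local script with the job; -mapper names it as it
        # appears in the task's working directory. Assumes the script is
        # directly executable (e.g. it has a #! line).
        "-file", mapper_script,
        "-mapper", os.path.basename(mapper_script),
        "-reducer", "org.apache.hadoop.mapred.lib.IdentityReducer",
    ]


if __name__ == "__main__":
    cmd = streaming_command("hadoop-streaming.jar", "/data/in", "/data/out",
                            "scripts/filter_mapper.py", 100, "-Xmx512m")
    print(" ".join(cmd))
```

On a machine with the Hadoop client installed, `subprocess.run(cmd, check=True)` would launch the job. A -D value cannot replace a property that the cluster's configuration marks final.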
>>
>> On Wed, Feb 16, 2011 at 8:30 PM, Kelly Burkhart
>> <[email protected]> wrote:
>>
>>> Hello, I'm seeing frequent failures in reduce tasks with errors similar
>>> to this:
>>>
>>>
>>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201102081823_0175_m_002153_0, compressed len: 172492, decompressed len: 172488
>>> 2011-02-15 15:21:10,163 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201102081823_0175_r_000034_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>
>>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 172488 bytes (172492 raw bytes) into RAM from attempt_201102081823_0175_m_002153_0
>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201102081823_0175_m_002118_0, compressed len: 161944, decompressed len: 161940
>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201102081823_0175_m_001704_0, compressed len: 228365, decompressed len: 228361
>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201102081823_0175_r_000034_0: Failed fetch #1 from attempt_201102081823_0175_m_002153_0
>>> 2011-02-15 15:21:10,424 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201102081823_0175_r_000034_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>
>>> Some also show this:
>>>
>>> Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>        at sun.net.www.http.ChunkedInputStream.<init>(ChunkedInputStream.java:63)
>>>        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:811)
>>>        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
>>>        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1072)
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>
>>> The particular job I'm running is an attempt to merge multiple time
>>> series files into a single file.  The job tracker shows the following:
>>>
>>>
>>> Kind    Num Tasks    Complete   Killed    Failed/Killed Task Attempts
>>> map     15795        15795      0         0 / 29
>>> reduce  100          30         70        17 / 29
>>>
>>> All of the files I'm reading have records with a timestamp key similar to:
>>>
>>> 2011-01-03 08:30:00.457000<tab><record>
>>>
>>> My map job is a simple Python program that ignores rows with times
>>> before 08:30:00 or after 15:00:00, determines the type of each input
>>> row, and writes it to stdout with very minor modification.  It
>>> maintains no state and should not use any significant memory.  My
>>> reducer is the IdentityReducer.  The input files are individually
>>> gzipped and then put into HDFS.  The total uncompressed size of the
>>> output should be around 150G.  Our cluster is 32 nodes, each of which
>>> has 16G RAM, and most of which have two 2T drives.  We're running
>>> Hadoop 0.20.2.
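Here is a minimal sketch of the filtering step of such a streaming mapper. It assumes the key format shown above: a date, a space, then HH:MM:SS with optional microseconds, followed by a tab. The thread doesn't describe the row classification or the "minor modification" in detail, so this version passes kept rows through unchanged. The function name in_window is hypothetical.

```python
import sys
from datetime import time

# Time window from the description: drop rows before 08:30:00
# or after 15:00:00.
START = time(8, 30, 0)
END = time(15, 0, 0)


def in_window(line):
    """True if the line's timestamp key falls inside [START, END]."""
    key = line.partition("\t")[0]
    try:
        # Key looks like "2011-01-03 08:30:00.457000"; keep the clock part.
        _date, clock = key.split(" ", 1)
        t = time.fromisoformat(clock)
    except ValueError:
        return False  # malformed key: skip the row
    return START <= t <= END


def main():
    for line in sys.stdin:
        if in_window(line):
            # The real mapper also classifies and lightly rewrites each row;
            # that logic isn't in the thread, so rows pass through as-is.
            sys.stdout.write(line)


if __name__ == "__main__":
    main()
```

Because a mapper like this keeps no state and emits one line per kept input line, its own memory use stays flat regardless of input size.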
>>>
>>>
>>> Can anyone provide some insight on how we can eliminate this issue?
>>> I'm certain this email does not provide enough info, please let me
>>> know what further information is needed to troubleshoot.
>>>
>>> Thanks in advance,
>>>
>>> -Kelly
>>>
>>
>>
>>
>> --
>> Regards,
>> R.V.
>>
>
>
>
