Good luck.

Let me know how it goes.

James

Sent from my mobile. Please excuse the typos.

On 2011-02-16, at 11:11 AM, Kelly Burkhart <[email protected]> wrote:

> OK, the job was preferring the config file on my local machine, which
> is not part of the cluster, over the cluster config files.  That seems
> completely broken to me; my config was basically empty other than
> containing the location of the cluster, and my job apparently used the
> defaults rather than the cluster config.  It doesn't make sense to me
> to have to keep configuration files synchronized on every machine that
> may access the cluster.
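>
> As far as I can tell, the job client builds the job configuration from
> whatever config files sit on the submitting machine, and anything not set
> there silently falls back to the compiled-in defaults.  Roughly, my
> client-side mapred-site.xml looked like the sketch below (the jobtracker
> host here is just a placeholder):
>
>   <configuration>
>     <property>
>       <name>mapred.job.tracker</name>
>       <value>jobtracker.example.com:9001</value>
>     </property>
>     <!-- nothing else is set here, so e.g. mapred.child.java.opts falls
>          back to the default of -Xmx200m -->
>   </configuration>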
>
> I'm running again; we'll see if it completes this time.
>
> -K
>
> On Wed, Feb 16, 2011 at 10:30 AM, James Seigel <[email protected]> wrote:
>> Hrmmm. Well, as you've pointed out, 200m is quite small and is probably
>> the cause.
>>
>> Now there might be some overriding settings in whatever you are using
>> to launch the job.
>>
>> You could set those values in the main conf so they can't be
>> overridden, then check the logs to see what tries to override them.
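>>
>> A sketch of what I mean, in the cluster-side mapred-site.xml (the value
>> here is just an example):
>>
>>   <property>
>>     <name>mapred.child.java.opts</name>
>>     <value>-Xmx1280m</value>
>>     <final>true</final>  <!-- refuses job/client-side overrides -->
>>   </property>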
>>
>> Cheers
>> James
>>
>> Sent from my mobile. Please excuse the typos.
>>
>> On 2011-02-16, at 9:21 AM, Kelly Burkhart <[email protected]> wrote:
>>
>>> I should have mentioned this in my last email: I thought of that, so I
>>> logged into every machine in the cluster; each machine's
>>> mapred-site.xml has the same md5sum.
>>>
>>> On Wed, Feb 16, 2011 at 10:15 AM, James Seigel <[email protected]> wrote:
>>>> He might not have that conf distributed out to each machine
>>>>
>>>>
>>>> Sent from my mobile. Please excuse the typos.
>>>>
>>>> On 2011-02-16, at 9:10 AM, Kelly Burkhart <[email protected]> wrote:
>>>>
>>>>> Our cluster admin (who's out of town today) has mapred.child.java.opts
>>>>> set to -Xmx1280 in mapred-site.xml.  However, if I go to the job
>>>>> configuration page for a job I'm running right now, it claims this
>>>>> option is set to -Xmx200m.  There are other settings in
>>>>> mapred-site.xml that are different too.  Why would map/reduce jobs not
>>>>> respect the mapred-site.xml file?
>>>>>
>>>>> -K
>>>>>
>>>>> On Wed, Feb 16, 2011 at 9:43 AM, Jim Falgout <[email protected]> 
>>>>> wrote:
>>>>>> You can set the amount of memory used by the reducer using the
>>>>>> mapreduce.reduce.java.opts property. Set it in mapred-site.xml or
>>>>>> override it in your job. You can set it to something like -Xmx512M to
>>>>>> increase the amount of memory available to the JVM spawned for the
>>>>>> reducer task.
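>>>>>>
>>>>>> For example, in mapred-site.xml (just a sketch; on the 0.20.x releases
>>>>>> the equivalent property is mapred.child.java.opts, which covers both
>>>>>> map and reduce tasks):
>>>>>>
>>>>>>   <property>
>>>>>>     <name>mapreduce.reduce.java.opts</name>
>>>>>>     <value>-Xmx512M</value>
>>>>>>   </property>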
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Kelly Burkhart [mailto:[email protected]]
>>>>>> Sent: Wednesday, February 16, 2011 9:12 AM
>>>>>> To: [email protected]
>>>>>> Subject: Re: Reduce java.lang.OutOfMemoryError
>>>>>>
>>>>>> I have had it fail with a single reducer and with 100 reducers.
>>>>>> Ultimately it needs to be funneled to a single reducer though.
>>>>>>
>>>>>> -K
>>>>>>
>>>>>> On Wed, Feb 16, 2011 at 9:02 AM, real great..
>>>>>> <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>> How many reducers are you using currently?
>>>>>>> Try increasing the number of reducers.
>>>>>>> Let me know if it helps.
>>>>>>>
>>>>>>> On Wed, Feb 16, 2011 at 8:30 PM, Kelly Burkhart 
>>>>>>> <[email protected]>wrote:
>>>>>>>
>>>>>>>> Hello, I'm seeing frequent failures in reduce jobs, with errors
>>>>>>>> similar to this:
>>>>>>>>
>>>>>>>>
>>>>>>>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201102081823_0175_m_002153_0, compressed len: 172492, decompressed len: 172488
>>>>>>>> 2011-02-15 15:21:10,163 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201102081823_0175_r_000034_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>>>>>
>>>>>>>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 172488 bytes (172492 raw bytes) into RAM from attempt_201102081823_0175_m_002153_0
>>>>>>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201102081823_0175_m_002118_0, compressed len: 161944, decompressed len: 161940
>>>>>>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201102081823_0175_m_001704_0, compressed len: 228365, decompressed len: 228361
>>>>>>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201102081823_0175_r_000034_0: Failed fetch #1 from attempt_201102081823_0175_m_002153_0
>>>>>>>> 2011-02-15 15:21:10,424 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201102081823_0175_r_000034_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>>>>>
>>>>>>>> Some also show this:
>>>>>>>>
>>>>>>>> Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>>>>>         at sun.net.www.http.ChunkedInputStream.<init>(ChunkedInputStream.java:63)
>>>>>>>>         at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:811)
>>>>>>>>         at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
>>>>>>>>         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1072)
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>>>>>
>>>>>>>> The particular job I'm running is an attempt to merge multiple time
>>>>>>>> series files into a single file.  The job tracker shows the following:
>>>>>>>>
>>>>>>>>
>>>>>>>> Kind     Num Tasks    Complete    Killed    Failed/Killed Task Attempts
>>>>>>>> map      15795        15795       0         0 / 29
>>>>>>>> reduce   100          30          70        17 / 29
>>>>>>>>
>>>>>>>> All of the files I'm reading have records with a timestamp key similar 
>>>>>>>> to:
>>>>>>>>
>>>>>>>> 2011-01-03 08:30:00.457000<tab><record>
>>>>>>>>
>>>>>>>> My map job is a simple Python program that ignores rows with times
>>>>>>>> outside the 08:30:00-15:00:00 window, determines the type of input row
>>>>>>>> and writes it to stdout with very minor modification.  It maintains no
>>>>>>>> state and should not use any significant memory.  My reducer is the
>>>>>>>> IdentityReducer.  The input files are individually gzipped and then put
>>>>>>>> into HDFS.  The total uncompressed size of the output should be
>>>>>>>> around 150G.  Our cluster is 32 nodes, each with 16G of RAM and
>>>>>>>> most with two 2T drives.  We're running Hadoop 0.20.2.
>>>>>>>>
>>>>>>>>
>>>>>>>> Can anyone provide some insight on how we can eliminate this issue?
>>>>>>>> I'm certain this email does not provide enough info; please let me
>>>>>>>> know what further information is needed to troubleshoot.
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>>
>>>>>>>> -Kelly
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> R.V.
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>
