He might not have that conf distributed out to each machine.
Sent from my mobile. Please excuse the typos.

On 2011-02-16, at 9:10 AM, Kelly Burkhart <[email protected]> wrote:

> Our cluster admin (who's out of town today) has mapred.child.java.opts
> set to -Xmx1280 in mapred-site.xml. However, if I go to the job
> configuration page for a job I'm running right now, it claims this
> option is set to -Xmx200m. There are other settings in mapred-site.xml
> that are different too. Why would map/reduce jobs not respect the
> mapred-site.xml file?
>
> -K
>
> On Wed, Feb 16, 2011 at 9:43 AM, Jim Falgout <[email protected]> wrote:
>> You can set the amount of memory used by the reducer using the
>> mapreduce.reduce.java.opts property. Set it in mapred-site.xml or
>> override it in your job. You can set it to something like -Xmx512M to
>> increase the amount of memory used by the JVM spawned for the reducer
>> task.
>>
>> -----Original Message-----
>> From: Kelly Burkhart [mailto:[email protected]]
>> Sent: Wednesday, February 16, 2011 9:12 AM
>> To: [email protected]
>> Subject: Re: Reduce java.lang.OutOfMemoryError
>>
>> I have had it fail with a single reducer and with 100 reducers.
>> Ultimately it needs to be funneled to a single reducer though.
>>
>> -K
>>
>> On Wed, Feb 16, 2011 at 9:02 AM, real great..
>> <[email protected]> wrote:
>>> Hi,
>>> How many reducers are you using currently?
>>> Try increasing the number of reducers.
>>> Let me know if it helps.
>>>
>>> On Wed, Feb 16, 2011 at 8:30 PM, Kelly Burkhart
>>> <[email protected]> wrote:
>>>
>>>> Hello, I'm seeing frequent failures in reduce jobs with errors
>>>> similar to this:
>>>>
>>>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask:
>>>> header: attempt_201102081823_0175_m_002153_0, compressed len: 172492,
>>>> decompressed len: 172488
>>>> 2011-02-15 15:21:10,163 FATAL org.apache.hadoop.mapred.TaskRunner:
>>>> attempt_201102081823_0175_r_000034_0 : Map output copy failure :
>>>> java.lang.OutOfMemoryError: Java heap space
>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>
>>>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask:
>>>> Shuffling 172488 bytes (172492 raw bytes) into RAM from
>>>> attempt_201102081823_0175_m_002153_0
>>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask:
>>>> header: attempt_201102081823_0175_m_002118_0, compressed len: 161944,
>>>> decompressed len: 161940
>>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask:
>>>> header: attempt_201102081823_0175_m_001704_0, compressed len: 228365,
>>>> decompressed len: 228361
>>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask:
>>>> Task attempt_201102081823_0175_r_000034_0: Failed fetch #1 from
>>>> attempt_201102081823_0175_m_002153_0
>>>> 2011-02-15 15:21:10,424 FATAL org.apache.hadoop.mapred.TaskRunner:
>>>> attempt_201102081823_0175_r_000034_0 : Map output copy failure :
>>>> java.lang.OutOfMemoryError: Java heap space
>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>
>>>> Some also show this:
>>>>
>>>> Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>     at sun.net.www.http.ChunkedInputStream.<init>(ChunkedInputStream.java:63)
>>>>     at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:811)
>>>>     at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
>>>>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1072)
>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>
>>>> The particular job I'm running is an attempt to merge multiple time
>>>> series files into a single file. The job tracker shows the following:
>>>>
>>>> Kind    Num Tasks  Complete  Killed  Failed/Killed Task Attempts
>>>> map     15795      15795     0       0 / 29
>>>> reduce  100        30        70      17 / 29
>>>>
>>>> All of the files I'm reading have records with a timestamp key
>>>> similar to:
>>>>
>>>> 2011-01-03 08:30:00.457000<tab><record>
>>>>
>>>> My map job is a simple python program that ignores rows with times
>>>> < 08:30:00 or > 15:00:00, determines the type of input row and
>>>> writes it to stdout with very minor modification. It maintains no
>>>> state and should not use any significant memory. My reducer is the
>>>> IdentityReducer. The input files are individually gzipped and then
>>>> put into HDFS. The total uncompressed size of the output should be
>>>> around 150G. Our cluster is 32 nodes, each of which has 16G RAM and
>>>> most of which have two 2T drives. We're running Hadoop 0.20.2.
>>>>
>>>> Can anyone provide some insight into how we can eliminate this
>>>> issue? I'm certain this email does not provide enough info; please
>>>> let me know what further information is needed to troubleshoot.
>>>>
>>>> Thanks in advance,
>>>>
>>>> -Kelly
>>>
>>> --
>>> Regards,
>>> R.V.
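[Editor's note] For reference, the setting the thread revolves around lives in mapred-site.xml. A sketch of what was presumably intended is below (1280 MB is the figure the cluster admin used; note that the size suffix matters, since a bare `-Xmx1280` is read by the JVM as bytes, and `-Xmx200m` is simply the 0.20 default that a job falls back to when the site file isn't picked up). The `mapreduce.reduce.java.opts` property Jim mentions is the newer, reduce-specific name; on 0.20.2 the combined `mapred.child.java.opts` applies to both map and reduce child JVMs.

```xml
<!-- mapred-site.xml sketch: heap for each map/reduce child JVM (Hadoop 0.20.x) -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1280m</value>
</property>
```

A per-job override can also be passed on the command line when the job goes through GenericOptionsParser/ToolRunner, e.g. `hadoop jar myjob.jar -D mapred.child.java.opts=-Xmx1280m ...`, which avoids waiting for a cluster-wide config push.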
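[Editor's note] Kelly's mapper script isn't included in the thread. Purely as a sketch of the kind of stateless time-of-day filter he describes (the field layout, the `keep` helper name, and the straight pass-through are assumptions, not his code), a Hadoop Streaming mapper might look like:

```python
import sys

# Hypothetical sketch of the mapper described in the thread: drop any
# record whose time-of-day falls outside 08:30:00-15:00:00 and pass the
# rest through. Input lines are assumed to look like:
#   2011-01-03 08:30:00.457000<tab><record>

def keep(line):
    """True if the line's time-of-day is within [08:30:00, 15:00:00]."""
    try:
        timestamp = line.split("\t", 1)[0]        # "2011-01-03 08:30:00.457000"
        time_of_day = timestamp.split(" ", 1)[1]  # "08:30:00.457000"
    except IndexError:
        return False  # malformed line: drop it
    # Lexicographic comparison is safe for zero-padded HH:MM:SS strings.
    return "08:30:00" <= time_of_day <= "15:00:00"

def main():
    for line in sys.stdin:
        line = line.rstrip("\n")
        if keep(line):
            # The real script reportedly makes "very minor" modifications;
            # this sketch just passes the record through unchanged.
            sys.stdout.write(line + "\n")

if __name__ == "__main__":
    main()
```

A mapper like this handles one line at a time and holds no state, which is consistent with Kelly's point that the OutOfMemoryError is happening in the reduce-side shuffle buffers rather than in his map logic.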
