Our cluster admin (who's out of town today) has mapred.child.java.opts set to -Xmx1280 in mapred-site.xml. However, if I go to the job configuration page for a job I'm running right now, it claims this option is set to -Xmx200m. There are other settings in mapred-site.xml that are different too. Why would map/reduce jobs not respect the mapred-site.xml file?
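To pin down exactly which settings differ, the two configurations can be compared mechanically. The following is only a rough sketch: it assumes both the cluster's mapred-site.xml and the job's configuration are available as Hadoop-style <configuration> XML, and the helper names (load_properties, diff_properties) and the two inline sample documents are made up for illustration, using the values mentioned above.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_properties(site, job):
    """Return {name: (site_value, job_value)} for every property set in
    the site file whose value in the job configuration is different
    (job_value is None when the job configuration lacks the property)."""
    return {
        name: (value, job.get(name))
        for name, value in site.items()
        if job.get(name) != value
    }


# Illustrative inputs only; real files would be read from disk instead.
SITE_XML = """<configuration>
  <property><name>mapred.child.java.opts</name><value>-Xmx1280</value></property>
</configuration>"""

JOB_XML = """<configuration>
  <property><name>mapred.child.java.opts</name><value>-Xmx200m</value></property>
</configuration>"""

for name, (site_value, job_value) in diff_properties(
    load_properties(SITE_XML), load_properties(JOB_XML)
).items():
    print(name, "site:", site_value, "job:", job_value)
```

This only reports the mismatches; it does not explain which file the job actually picked its values up from.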
-K

On Wed, Feb 16, 2011 at 9:43 AM, Jim Falgout <[email protected]> wrote:
> You can set the amount of memory used by the reducer using the
> mapreduce.reduce.java.opts property. Set it in mapred-site.xml or override it
> in your job. You can set it to something like: -Xmx512M to increase the amount
> of memory used by the JVM spawned for the reducer task.
>
> -----Original Message-----
> From: Kelly Burkhart [mailto:[email protected]]
> Sent: Wednesday, February 16, 2011 9:12 AM
> To: [email protected]
> Subject: Re: Reduce java.lang.OutOfMemoryError
>
> I have had it fail with a single reducer and with 100 reducers.
> Ultimately it needs to be funneled to a single reducer though.
>
> -K
>
> On Wed, Feb 16, 2011 at 9:02 AM, real great..
> <[email protected]> wrote:
>> Hi,
>> How many reducers are you using currently?
>> Try increasing the number of reducers.
>> Let me know if it helps.
>>
>> On Wed, Feb 16, 2011 at 8:30 PM, Kelly Burkhart
>> <[email protected]> wrote:
>>
>>> Hello, I'm seeing frequent fails in reduce jobs with errors similar
>>> to this:
>>>
>>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201102081823_0175_m_002153_0, compressed len: 172492, decompressed len: 172488
>>> 2011-02-15 15:21:10,163 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201102081823_0175_r_000034_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>
>>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 172488 bytes (172492 raw bytes) into RAM from attempt_201102081823_0175_m_002153_0
>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201102081823_0175_m_002118_0, compressed len: 161944, decompressed len: 161940
>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201102081823_0175_m_001704_0, compressed len: 228365, decompressed len: 228361
>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201102081823_0175_r_000034_0: Failed fetch #1 from attempt_201102081823_0175_m_002153_0
>>> 2011-02-15 15:21:10,424 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201102081823_0175_r_000034_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>
>>> Some also show this:
>>>
>>> Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>         at sun.net.www.http.ChunkedInputStream.<init>(ChunkedInputStream.java:63)
>>>         at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:811)
>>>         at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
>>>         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1072)
>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>
>>> The particular job I'm running is an attempt to merge multiple time
>>> series files into a single file. The job tracker shows the following:
>>>
>>> Kind    Num Tasks  Complete  Killed  Failed/Killed Task Attempts
>>> map     15795      15795     0       0 / 29
>>> reduce  100        30        70      17 / 29
>>>
>>> All of the files I'm reading have records with a timestamp key similar to:
>>>
>>> 2011-01-03 08:30:00.457000<tab><record>
>>>
>>> My map job is a simple python program that ignores rows with times <
>>> 08:30:00 and > 15:00:00, determines the type of input row and writes
>>> it to stdout with very minor modification. It maintains no state and
>>> should not use any significant memory. My reducer is the
>>> IdentityReducer. The input files are individually gzipped then put
>>> into hdfs. The total uncompressed size of the output should be
>>> around 150G. Our cluster is 32 nodes each of which has 16G RAM and
>>> most of which have two 2T drives. We're running hadoop 0.20.2.
>>>
>>> Can anyone provide some insight on how we can eliminate this issue?
>>> I'm certain this email does not provide enough info, please let me
>>> know what further information is needed to troubleshoot.
>>>
>>> Thanks in advance,
>>>
>>> -Kelly
>>
>> --
>> Regards,
>> R.V.
