Another possibility could be increasing the memory allocated to the JVM; I'm not sure how to do it, though.
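For what it's worth, on Hadoop 0.20 the per-task JVM heap is usually raised through the `mapred.child.java.opts` property, which can be passed per job with `-D`. A sketch of what that might look like for a streaming job such as Kelly's (the jar path, input/output paths, and mapper script name here are placeholders, not taken from the thread):

```shell
# Raise the child-task JVM heap to 768 MB for this job only.
# In 0.20, mapred.child.java.opts applies to both map and reduce tasks.
hadoop jar /path/to/hadoop-streaming.jar \
    -D mapred.child.java.opts=-Xmx768m \
    -D mapred.reduce.tasks=100 \
    -input /data/ticks \
    -output /data/merged \
    -mapper filter_times.py \
    -reducer org.apache.hadoop.mapred.lib.IdentityReducer
```

The same property can also be set cluster-wide in mapred-site.xml, but the per-job `-D` form lets you experiment without touching the cluster configuration.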
On Wed, Feb 16, 2011 at 8:46 PM, James Seigel <[email protected]> wrote:

> Well the first thing I'd ask to see (if we can) is the code or a
> description of what your reducer is doing.
>
> If it is holding on to objects too long or accumulating lists, well,
> then with the right amount of data you will run OOM.
>
> Another thought is that you've just not allocated enough mem for the
> reducer to run properly anyway. Try passing in a setting for the
> reducer that ups the memory for it. 768 perhaps.
>
> James
>
> Sent from my mobile. Please excuse the typos.
>
> On 2011-02-16, at 8:12 AM, Kelly Burkhart <[email protected]> wrote:
>
> > I have had it fail with a single reducer and with 100 reducers.
> > Ultimately it needs to be funneled to a single reducer though.
> >
> > -K
> >
> > On Wed, Feb 16, 2011 at 9:02 AM, real great..
> > <[email protected]> wrote:
> >> Hi,
> >> How many reducers are you using currently?
> >> Try increasing the number of reducers.
> >> Let me know if it helps.
> >>
> >> On Wed, Feb 16, 2011 at 8:30 PM, Kelly Burkhart <[email protected]> wrote:
> >>
> >>> Hello, I'm seeing frequent failures in reduce jobs with errors
> >>> similar to this:
> >>>
> >>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask:
> >>> header: attempt_201102081823_0175_m_002153_0, compressed len: 172492,
> >>> decompressed len: 172488
> >>> 2011-02-15 15:21:10,163 FATAL org.apache.hadoop.mapred.TaskRunner:
> >>> attempt_201102081823_0175_r_000034_0 : Map output copy failure :
> >>> java.lang.OutOfMemoryError: Java heap space
> >>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
> >>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
> >>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
> >>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
> >>>
> >>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask:
> >>> Shuffling 172488 bytes (172492 raw bytes) into RAM from
> >>> attempt_201102081823_0175_m_002153_0
> >>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask:
> >>> header: attempt_201102081823_0175_m_002118_0, compressed len: 161944,
> >>> decompressed len: 161940
> >>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask:
> >>> header: attempt_201102081823_0175_m_001704_0, compressed len: 228365,
> >>> decompressed len: 228361
> >>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask: Task
> >>> attempt_201102081823_0175_r_000034_0: Failed fetch #1 from
> >>> attempt_201102081823_0175_m_002153_0
> >>> 2011-02-15 15:21:10,424 FATAL org.apache.hadoop.mapred.TaskRunner:
> >>> attempt_201102081823_0175_r_000034_0 : Map output copy failure :
> >>> java.lang.OutOfMemoryError: Java heap space
> >>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
> >>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
> >>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
> >>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
> >>>
> >>> Some also show this:
> >>>
> >>> Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
> >>>     at sun.net.www.http.ChunkedInputStream.<init>(ChunkedInputStream.java:63)
> >>>     at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:811)
> >>>     at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
> >>>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1072)
> >>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
> >>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
> >>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
> >>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
> >>>
> >>> The particular job I'm running is an attempt to merge multiple time
> >>> series files into a single file. The job tracker shows the following:
> >>>
> >>> Kind     Num Tasks   Complete   Killed   Failed/Killed Task Attempts
> >>> map      15795       15795      0        0 / 29
> >>> reduce   100         30         70       17 / 29
> >>>
> >>> All of the files I'm reading have records with a timestamp key
> >>> similar to:
> >>>
> >>> 2011-01-03 08:30:00.457000<tab><record>
> >>>
> >>> My map job is a simple Python program that ignores rows with times
> >>> < 08:30:00 or > 15:00:00, determines the type of each input row, and
> >>> writes it to stdout with very minor modification. It maintains no
> >>> state and should not use any significant memory. My reducer is the
> >>> IdentityReducer. The input files are individually gzipped and then
> >>> put into HDFS. The total uncompressed size of the output should be
> >>> around 150G. Our cluster is 32 nodes, each of which has 16G RAM, and
> >>> most of which have two 2T drives. We're running Hadoop 0.20.2.
> >>>
> >>> Can anyone provide some insight on how we can eliminate this issue?
> >>> I'm certain this email does not provide enough info; please let me
> >>> know what further information is needed to troubleshoot.
> >>>
> >>> Thanks in advance,
> >>>
> >>> -Kelly
> >>
> >> --
> >> Regards,
> >> R.V.

--
Regards,
R.V.
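A note for the archives: the failing frames in the traces above are in ReduceCopier.shuffleInMemory, i.e. the reducer is running out of heap while buffering fetched map outputs during the copy phase, before any user reduce code runs. Besides raising the heap, Hadoop 0.20 exposes knobs that shrink the in-memory shuffle footprint. A sketch of what lowering them might look like (the values and job arguments are illustrative assumptions, not tuned recommendations from the thread):

```shell
# Buffer a smaller fraction of the reducer heap with in-flight map
# outputs (0.70 is the 0.20 default for shuffle.input.buffer.percent),
# and fetch fewer map outputs in parallel (default is 5 copier threads).
hadoop jar /path/to/hadoop-streaming.jar \
    -D mapred.job.shuffle.input.buffer.percent=0.50 \
    -D mapred.reduce.parallel.copies=3 \
    -input /data/ticks \
    -output /data/merged \
    -mapper filter_times.py \
    -reducer org.apache.hadoop.mapred.lib.IdentityReducer
```

With ~15,800 map outputs funneling toward few reducers, trading some copy-phase parallelism for headroom may matter more than mapper or reducer logic, since both are essentially stateless here.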
