Which is why setting cluster values to final helps. See http://wiki.apache.org/hadoop/FAQ#How_do_I_get_my_MapReduce_Java_Program_to_read_the_Cluster.27s_set_configuration_and_not_just_defaults.3F
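As a concrete illustration of the FAQ's advice, a property can be locked down in the cluster's mapred-site.xml like this (the heap value shown is only an example; `<final>true</final>` prevents client-side configs and job submissions from overriding the cluster's setting):

```xml
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1280m</value>
  <final>true</final>
</property>
```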
On Wed, Feb 16, 2011 at 11:41 PM, Kelly Burkhart <[email protected]> wrote:
> OK, the job was preferring the config file on my local machine, which
> is not part of the cluster, over the cluster config files. That seems
> completely broken to me: my config was basically empty other than
> containing the location of the cluster, yet my job apparently used
> defaults rather than the cluster config. It doesn't make sense to me
> to keep configuration files synchronized on every machine that may
> access the cluster.
>
> I'm running again; we'll see if it completes this time.
>
> -K
>
> On Wed, Feb 16, 2011 at 10:30 AM, James Seigel <[email protected]> wrote:
>> Hrmmm. Well, as you've pointed out, 200m is quite small and is probably
>> the cause.
>>
>> Now there might be some overriding settings in something you are using
>> to launch, or something.
>>
>> You could set those values in the config to not be overridden in the
>> main conf, then see what tries to override them in the logs.
>>
>> Cheers,
>> James
>>
>> Sent from my mobile. Please excuse the typos.
>>
>> On 2011-02-16, at 9:21 AM, Kelly Burkhart <[email protected]> wrote:
>>
>>> I should have mentioned this in my last email: I thought of that, so I
>>> logged into every machine in the cluster; each machine's
>>> mapred-site.xml has the same md5sum.
>>>
>>> On Wed, Feb 16, 2011 at 10:15 AM, James Seigel <[email protected]> wrote:
>>>> He might not have that conf distributed out to each machine.
>>>>
>>>> Sent from my mobile. Please excuse the typos.
>>>>
>>>> On 2011-02-16, at 9:10 AM, Kelly Burkhart <[email protected]> wrote:
>>>>
>>>>> Our cluster admin (who's out of town today) has mapred.child.java.opts
>>>>> set to -Xmx1280 in mapred-site.xml. However, if I go to the job
>>>>> configuration page for a job I'm running right now, it claims this
>>>>> option is set to -Xmx200m. There are other settings in
>>>>> mapred-site.xml that are different too.
>>>>> Why would map/reduce jobs not
>>>>> respect the mapred-site.xml file?
>>>>>
>>>>> -K
>>>>>
>>>>> On Wed, Feb 16, 2011 at 9:43 AM, Jim Falgout <[email protected]> wrote:
>>>>>> You can set the amount of memory used by the reducer using the
>>>>>> mapreduce.reduce.java.opts property. Set it in mapred-site.xml or
>>>>>> override it in your job. You can set it to something like -Xmx512m to
>>>>>> increase the amount of memory used by the JVM spawned for the reducer
>>>>>> task.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Kelly Burkhart [mailto:[email protected]]
>>>>>> Sent: Wednesday, February 16, 2011 9:12 AM
>>>>>> To: [email protected]
>>>>>> Subject: Re: Reduce java.lang.OutOfMemoryError
>>>>>>
>>>>>> I have had it fail with a single reducer and with 100 reducers.
>>>>>> Ultimately it needs to be funneled to a single reducer, though.
>>>>>>
>>>>>> -K
>>>>>>
>>>>>> On Wed, Feb 16, 2011 at 9:02 AM, real great.. <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>> How many reducers are you using currently?
>>>>>>> Try increasing the number of reducers.
>>>>>>> Let me know if it helps.
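A sketch of such a per-job override for a streaming job, assuming the property is not marked final on the cluster and using Hadoop 0.20's generic -D option; the jar path, input/output paths, and mapper name here are placeholders:

```
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
    -D mapred.child.java.opts=-Xmx512m \
    -input /data/in -output /data/out \
    -mapper my_mapper.py -file my_mapper.py \
    -reducer org.apache.hadoop.mapred.lib.IdentityReducer
```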
>>>>>>>
>>>>>>> On Wed, Feb 16, 2011 at 8:30 PM, Kelly Burkhart <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hello, I'm seeing frequent failures in reduce jobs with errors similar
>>>>>>>> to this:
>>>>>>>>
>>>>>>>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask:
>>>>>>>> header: attempt_201102081823_0175_m_002153_0, compressed len: 172492,
>>>>>>>> decompressed len: 172488
>>>>>>>> 2011-02-15 15:21:10,163 FATAL org.apache.hadoop.mapred.TaskRunner:
>>>>>>>> attempt_201102081823_0175_r_000034_0 : Map output copy failure :
>>>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>>>>>
>>>>>>>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask:
>>>>>>>> Shuffling 172488 bytes (172492 raw bytes) into RAM from
>>>>>>>> attempt_201102081823_0175_m_002153_0
>>>>>>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask:
>>>>>>>> header: attempt_201102081823_0175_m_002118_0, compressed len: 161944,
>>>>>>>> decompressed len: 161940
>>>>>>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask:
>>>>>>>> header: attempt_201102081823_0175_m_001704_0, compressed len: 228365,
>>>>>>>> decompressed len: 228361
>>>>>>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask:
>>>>>>>> Task attempt_201102081823_0175_r_000034_0: Failed fetch #1 from
>>>>>>>> attempt_201102081823_0175_m_002153_0
>>>>>>>> 2011-02-15 15:21:10,424 FATAL org.apache.hadoop.mapred.TaskRunner:
>>>>>>>> attempt_201102081823_0175_r_000034_0 : Map output copy failure :
>>>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>>>>>
>>>>>>>> Some also show this:
>>>>>>>>
>>>>>>>> Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>>>>>         at sun.net.www.http.ChunkedInputStream.<init>(ChunkedInputStream.java:63)
>>>>>>>>         at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:811)
>>>>>>>>         at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
>>>>>>>>         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1072)
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>>>>>
>>>>>>>> The particular job I'm running is an attempt to merge multiple time
>>>>>>>> series files into a single file.
>>>>>>>> The job tracker shows the following:
>>>>>>>>
>>>>>>>> Kind    Num Tasks  Complete  Killed  Failed/Killed Task Attempts
>>>>>>>> map     15795      15795     0       0 / 29
>>>>>>>> reduce  100        30        70      17 / 29
>>>>>>>>
>>>>>>>> All of the files I'm reading have records with a timestamp key similar to:
>>>>>>>>
>>>>>>>> 2011-01-03 08:30:00.457000<tab><record>
>>>>>>>>
>>>>>>>> My map job is a simple Python program that ignores rows with times
>>>>>>>> < 08:30:00 or > 15:00:00, determines the type of each input row, and
>>>>>>>> writes it to stdout with very minor modification. It maintains no state
>>>>>>>> and should not use any significant memory. My reducer is the
>>>>>>>> IdentityReducer. The input files are individually gzipped and then put
>>>>>>>> into HDFS. The total uncompressed size of the output should be
>>>>>>>> around 150G. Our cluster is 32 nodes, each of which has 16G RAM, and
>>>>>>>> most of which have two 2T drives. We're running Hadoop 0.20.2.
>>>>>>>>
>>>>>>>> Can anyone provide some insight on how we can eliminate this issue?
>>>>>>>> I'm certain this email does not provide enough info; please let me
>>>>>>>> know what further information is needed to troubleshoot.
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>>
>>>>>>>> -Kelly
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> R.V.

--
Harsh J
www.harshj.com
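The streaming mapper described in the thread can be sketched roughly as follows. This is a hypothetical reconstruction, not Kelly's actual code: the thread only gives the `timestamp<tab>record` layout and the 08:30:00-15:00:00 window, so the field handling and inclusive bounds are assumptions.

```python
#!/usr/bin/env python
# Hypothetical sketch of a Hadoop Streaming mapper that drops records
# whose time-of-day falls outside [08:30:00, 15:00:00] and passes the
# rest through unchanged. Input lines look like:
#   2011-01-03 08:30:00.457000<tab><record>
import sys

START = "08:30:00"
END = "15:00:00"

def keep(line):
    """Return True if the record's time-of-day falls in [START, END]."""
    try:
        timestamp, _record = line.split("\t", 1)
    except ValueError:
        return False  # malformed line: no tab separator
    # "2011-01-03 08:30:00.457000" -> "08:30:00.457000" -> "08:30:00"
    time_of_day = timestamp.split(" ", 1)[1][:8]
    # Zero-padded HH:MM:SS strings compare correctly as plain strings.
    return START <= time_of_day <= END

def main(stdin=sys.stdin, stdout=sys.stdout):
    for line in stdin:
        line = line.rstrip("\n")
        if keep(line):
            stdout.write(line + "\n")

if __name__ == "__main__":
    main()
```

A mapper like this holds no state, consistent with the observation that the memory pressure is in the reducers' shuffle phase rather than in the maps.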
