Hi,
It seems the error happens not at the reduce output but at the map output.
Most likely the local file system of some node running the map task
doesn't have enough space for the map output spills. If lots of
records share the same key, you can run a combiner during the map phase
to shrink the intermediate result before it is written out.
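A combiner applies reduce-style aggregation to each mapper's local output before it is spilled and shuffled, which can cut the intermediate data dramatically when many records share a key. Here is a minimal Python sketch of the idea only (the function names are illustrative, not the Hadoop API):

```python
from collections import defaultdict

def map_phase(records):
    # Emit a (key, 1) pair per record, word-count style.
    return [(r, 1) for r in records]

def combine(pairs):
    # Local, per-mapper aggregation: the same logic as the reducer,
    # applied before anything is written to disk or shuffled.
    acc = defaultdict(int)
    for k, v in pairs:
        acc[k] += v
    return list(acc.items())

# One mapper's output: many records sharing the same key.
raw = map_phase(["x"] * 1000 + ["y"] * 10)
combined = combine(raw)

print(len(raw))       # 1010 pairs before combining
print(len(combined))  # 2 pairs after combining
```

In Hadoop itself you set the combiner with JobConf.setCombinerClass(...), typically reusing the reducer class when the operation is associative and commutative (e.g. sums or counts).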
-Gang
----- Original Message ----
From: himanshu chandola <[email protected]>
To: [email protected]
Sent: 2009/12/31 5:10:10
Subject: large reducer output with same key
Hi Everyone,
My reducer output results in most of the data having the same key. The
reducer output is close to 16 GB and, though my cluster has a
terabyte of space in HDFS in total, I get errors like the following:
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:719)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:209)
at
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException:
Could not find any valid local directory for
task_200808021906_0002_m_000014_2/spill4.out
After such failures, hadoop tries to start the same reduce job a couple
of times on other nodes before the job fails. From the
exception, it looks to me like this is
probably a disk error (some machines have less than 16 GB of free space
on hdfs).
So my question is: does hadoop put values which share the same key
as a single block on one node? Or could something else
be happening here?
Thanks
H