It's an out-of-memory error, so I think it has to do with RAM rather than disk space. Did you check whether the nodes are swapping (top/htop)? Is your reduce phase very memory-intensive? There could be a memory leak somewhere. What processes are you running on each node? And what does the error report file mentioned in the output (hs_err_pid15974.log) say?
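If it helps with the swapping question, here is a rough sketch in Python for taking a quick snapshot on a node. It assumes Linux nodes where /proc/meminfo is readable, and the function names (parse_meminfo, swap_used_kb, summarize) are made up for illustration. It only shows how much swap is in use at one moment; top/htop or vmstat are still the way to see whether the node is actively swapping.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kb(info):
    """Swap currently in use, in kB (0 if the fields are missing)."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def summarize(info):
    """One-line summary of the memory and swap snapshot."""
    return "MemTotal=%d kB, MemFree=%d kB, swap used=%d kB of %d kB" % (
        info.get("MemTotal", 0),
        info.get("MemFree", 0),
        swap_used_kb(info),
        info.get("SwapTotal", 0),
    )


if __name__ == "__main__":
    # Assumption: a Linux node; on other systems this file does not exist.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print(summarize(parse_meminfo(f.read())))
```

Running it on a node while the reducers are busy, a large "swap used" figure next to a small MemFree would point toward memory pressure rather than disk space.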

On Sat, Nov 6, 2010 at 2:36 PM, Shavit Netzer <[email protected]> wrote:
> 7GB
>
> Sent from my mobile
>
> On 06/11/2010, at 11:00, "Hari Sreekumar" <[email protected]<mailto:[email protected]>> wrote:
>
> What's the RAM on each node?
>
> On Sat, Nov 6, 2010 at 11:03 AM, Shavit Netzer <[email protected]<mailto:[email protected]>> wrote:
>
> Hello,
>
> I have a question regarding MapRed jobs.
>
> I have 24 nodes, each node have 4 disks (mnt – mnt3), 500GB each mnt.
>
> All balanced ( I used the balancer, except mnt, which have 97% used).
>
> My question is:
> I got the following error and I relate it to the disk space (maybe I'm wrong).
>
> Maybe there is a configuration that I can add, change in order to have few more retries on separate disk:
>
> 10/10/27 21:59:01 INFO mapred.JobClient: map 100% reduce 26%
> 10/10/27 21:59:02 INFO mapred.JobClient: Task Id : attempt_201010201240_4059_r_000023_0, Status : FAILED
> java.io.IOException: Task process exit with nonzero status of 134.
>     at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:462)
>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:403)
>
> attempt_201010201240_4059_r_000023_0: #
> attempt_201010201240_4059_r_000023_0: # A fatal error has been detected by the Java Runtime Environment:
> attempt_201010201240_4059_r_000023_0: #
> attempt_201010201240_4059_r_000023_0: # java.lang.OutOfMemoryError: requested 32744 bytes for ChunkPool::allocate. Out of swap space?
> attempt_201010201240_4059_r_000023_0: #
> attempt_201010201240_4059_r_000023_0: # Internal Error (allocation.cpp:117), pid=15974, tid=1089702224
> attempt_201010201240_4059_r_000023_0: # Error: ChunkPool::allocate
> attempt_201010201240_4059_r_000023_0: #
> attempt_201010201240_4059_r_000023_0: # JRE version: 6.0_14-b08
> attempt_201010201240_4059_r_000023_0: # Java VM: Java HotSpot(TM) 64-Bit Server VM (14.0-b16 mixed mode linux-amd64 )
> attempt_201010201240_4059_r_000023_0: # An error report file with more information is saved as:
> attempt_201010201240_4059_r_000023_0: # /mnt2/hadoop/mapred/local/taskTracker/jobcache/job_201010201240_4059/attempt_201010201240_4059_r_000023_0/work/hs_err_pid15974.log
> attempt_201010201240_4059_r_000023_0: #
> attempt_201010201240_4059_r_000023_0: # If you would like to submit a bug report, please visit:
> attempt_201010201240_4059_r_000023_0: # http://java.sun.com/webapps/bugreport/crash.jsp
> attempt_201010201240_4059_r_000023_0: #
>
> Regards,
> Shavit
