> Which version of hadoop are you running?

Hadoop 0.21.0 with some patches.

> Are you running on linux?

Yes.

Linux 2.6.18-238.9.1.el5 #1 SMP x86_64 x86_64 x86_64 GNU/Linux
java version "1.6.0_21"
Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)

I set up 0.21.0 on another Linux box and am not seeing this issue there, as Hadoop is reusing JVMs (as configured). In the production cluster it is not reusing JVMs, and the nodes run out of memory because mapper JVMs stay alive even after they have ended according to Hadoop.

The production node is a 64-bit OS/JVM. Is there any known issue or workaround for enabling JVM reuse in 64-bit environments?

The test node is 32-bit:

Linux 2.6.18-194.32.1.el5.centos.plus #1 SMP i686 i686 i386 GNU/Linux
java version "1.6.0_17"
OpenJDK Runtime Environment (IcedTea6 1.7.5) (rhel-1.16.b17.el5-i386)
OpenJDK Server VM (build 14.0-b16, mixed mode)

Even if I can just get it to reuse JVMs, that will be great.

-Adi

> -Joey
>
> On Thu, May 12, 2011 at 1:39 PM, Adi <adi.pan...@gmail.com> wrote:
> > For one long-running job we are noticing that the mapper JVMs do not exit
> > even after the mapper is done. Any suggestions on why this could be
> > happening?
> >
> > The Java processes get cleaned up if I do a hadoop job -kill <job_id>.
> > They also get cleaned up if I run the job in a smaller batch and it
> > finishes fairly quickly (say, half an hour). For larger inputs the nodes
> > eventually run out of memory because of these Java processes that the
> > cluster thinks are gone but that haven't been cleaned up yet. I suspect
> > the TaskTrackers are failing to kill the JVMs by themselves for some
> > reason.
> >
> > The following exceptions can be seen in the hadoop logs.
> >
> > 2011-05-12 13:52:04,147 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command
> > org.apache.hadoop.util.Shell$ExitCodeException: kill -12545: No such process
> > 2011-05-12 13:52:08,071 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command
> > org.apache.hadoop.util.Shell$ExitCodeException: kill -11061: No such process
> > 2011-05-12 13:52:09,009 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command
> > org.apache.hadoop.util.Shell$ExitCodeException: kill -11151: No such process
> > 2011-05-12 13:52:12,009 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command
> > org.apache.hadoop.util.Shell$ExitCodeException: kill -25057: No such process
> > 2011-05-12 13:52:13,306 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command
> > org.apache.hadoop.util.Shell$ExitCodeException: kill -19805: No such process
> > 2011-05-12 13:52:14,996 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command
> > org.apache.hadoop.util.Shell$ExitCodeException: kill -11103: No such process
> >
> > 2011-05-12 15:51:41,105 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command
> > org.apache.hadoop.util.Shell$ExitCodeException: kill -17202: No such process
> > 2011-05-12 15:51:43,481 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command
> > org.apache.hadoop.util.Shell$ExitCodeException: kill -15981: No such process
> > 2011-05-12 15:51:45,916 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command
> > org.apache.hadoop.util.Shell$ExitCodeException: kill -17931: No such process
> > 2011-05-12 15:52:06,328 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command
> > org.apache.hadoop.util.Shell$ExitCodeException: kill -14867: No such process
> > 2011-05-12 15:52:34,503 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command
> > org.apache.hadoop.util.Shell$ExitCodeException: kill -29376: No such process
> > 2011-05-12 15:52:38,607 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command
> > org.apache.hadoop.util.Shell$ExitCodeException: kill -32491: No such process
> > 2011-05-12 15:52:39,292 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command
> > org.apache.hadoop.util.Shell$ExitCodeException: kill -31529: No such process
> > 2011-05-12 15:52:46,547 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command
> > org.apache.hadoop.util.Shell$ExitCodeException: kill -15140: No such process
> >
> > Some other exceptions also seen in the logs may or may not be related to
> > the above problem.
> >
> > 2011-05-12 16:01:20,534 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 33465 caught: java.nio.channels.ClosedChannelException
> > 2011-05-12 16:01:48,869 INFO org.apache.hadoop.ipc.Server: IPC Server handler 80 on 33465 caught: java.nio.channels.ClosedChannelException
> > 2011-05-12 16:01:53,922 INFO org.apache.hadoop.ipc.Server: IPC Server handler 59 on 33465 caught: java.nio.channels.ClosedChannelException
> > 2011-05-12 16:01:58,977 INFO org.apache.hadoop.ipc.Server: IPC Server handler 28 on 33465 caught: java.nio.channels.ClosedChannelException
> > 2011-05-12 16:02:04,040 INFO org.apache.hadoop.ipc.Server: IPC Server handler 37 on 33465 caught: java.nio.channels.ClosedChannelException
> > 2011-05-12 16:02:09,095 INFO org.apache.hadoop.ipc.Server: IPC Server handler 100 on 33465 caught: java.nio.channels.ClosedChannelException
> >
> > Thanks.
> >
> > -Adi
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
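
To compare the failed kills quoted above against the java processes still alive on a node, it may help to pull the numeric targets out of the TaskTracker log. The following is only a minimal sketch: it assumes the exception text has exactly the form shown in this thread (`kill -<number>: No such process`), and the function names `failed_kill_targets` and `count_failed_kills` are made up for illustration, not part of Hadoop. The number is presumably a process or process-group ID.

```python
import re
from collections import Counter

# Assumed format, copied from the log excerpt in this thread:
#   org.apache.hadoop.util.Shell$ExitCodeException: kill -12545: No such process
KILL_FAILED = re.compile(r"ExitCodeException: kill -(\d+): No such process")


def failed_kill_targets(log_text):
    """Return the numeric kill targets that failed, in the order logged."""
    return [int(match.group(1)) for match in KILL_FAILED.finditer(log_text)]


def count_failed_kills(log_text):
    """Count how often each target appears, to spot repeated kill attempts."""
    return Counter(failed_kill_targets(log_text))


if __name__ == "__main__":
    import sys

    # Usage: python failed_kills.py /path/to/tasktracker.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for target, hits in sorted(count_failed_kills(handle.read()).items()):
            print(target, hits)
```

The resulting list can then be checked by hand against `ps` output on the same node to see whether the lingering mapper JVMs correspond to any of the IDs the TaskTracker tried to kill.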