Youch! Thanks for the followup. Confirmed that the Crasher.java snippet crashes my Sun 1.6.0 update 14 JVM as well. Hopefully they'll get it fixed in update 15.
-Todd

On Tue, Jul 28, 2009 at 7:00 AM, Christian Kirschbaum <[email protected]> wrote:

> Hi Todd et al.,
>
> coming back to this again, I'd like to present the solution we found and
> confirm that a JVM bug was indeed the cause of the exit code 134 we saw on
> the TaskRunner.
>
> First of all, we had to configure the Hadoop subsystem to start with the
> following parameter:
>
> -XX:ErrorFile=/opt/hadoop/hadoop/logs/java/java_error%p.log
>
> This was necessary, because without it, the JVM would -- by default -- put
> this standard logfile into the current working directory, which in this case
> was the Hadoop task working directory. This directory, however, got removed
> upon job failure or completion.
>
> The Java error logfile pointed us to a specific class and method that kept
> crashing the JVM, namely:
> DefaultSDContextGenerator.previousSpaceIndex(CharSequence, int): int
>
> We eventually googled for this specific class and method, and lo and
> behold, found this:
>
> http://sourceforge.net/tracker/?func=detail&aid=2793972&group_id=3368&atid=103368
>
> Apparently, this specific class and method had triggered JVM crashes for
> other users as well. We implemented the workaround code and the trouble with
> exit code 134 was finally gone.
>
> On that webpage, someone posted in the comments a code snippet to reproduce
> the JVM crash. I have not yet confirmed whether it was reported to Sun as
> well.
>
> Cheers,
> Chris
>
>
> Todd Lipcon wrote:
>
>> Hi Christian,
>>
>> Generally along with a nonzero exit code you should see something in the
>> stderr for that attempt. If you look on the TaskTracker inside
>> logs/userlogs/attempt_<the failed attempt>/stderr do you see anything
>> useful?
>>
>> If it's a segfault or a Linux OOM kill, you should also see something in
>> your system's kernel log. Check "dmesg" and/or /var/log/kern.log for
>> anything suspicious looking.
>>
>> Hope that helps
>> -Todd
>>
>> On Tue, Jul 21, 2009 at 2:15 AM, Christian Kirschbaum <[email protected]>
>> wrote:
>>
>>     Hi all,
>>
>>     we're using Hadoop 0.19.1 and have recently encountered the
>>     following erratic problem when running jobs involving UIMA text
>>     annotation chains (which fail frequently because of this):
>>
>>     java.io.IOException: Task process exit with nonzero status of 134.
>>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
>>
>>
>>     As you can see, this is propagated in Hadoop code, without the
>>     actual MapReduce job being able to react to it. Unfortunately,
>>     this exception message isn't very descriptive as to the actual
>>     cause, which I have yet to track down.
>>
>>     All I found out is that this status code apparently is an exit
>>     code of a separate process initiated through
>>     org.apache.hadoop.util.Shell.ShellCommandExecutor in the
>>     runChild(JvmEnv) method of org.apache.hadoop.mapred.JvmManager.
>>     And because it is exit code 134 (128 + 6), signal 6 (SIGABRT)
>>     supposedly terminated the process, which may indicate a
>>     core dump?
>>
>>     How do I find out more about the actual cause? Is there any secret
>>     logfile for the separately spawned JVM process? I've looked
>>     through various logs and userlogs directories but could not find
>>     any mention of this exception there.
>>
>>     Any help is appreciated.
>>
>>     Thanks,
>>     Chris
>>
>
> --
> Christian Kirschbaum
> Software Developer
> --------------------------------------------------------
> vionto GmbH
> Karl-Marx-Allee 90a, D-10243 Berlin
>
> phone +49 30 40 20 329 - 27
> fax   +49 30 40 20 329 - 01
> web   http://www.vionto.com
> --------------------------------------------------------
> Managing Directors: Ralf von Grafenstein, Dr. Martin Hirsch
> Registered office: Berlin
> District Court Berlin Charlottenburg, HRB 108154B
> --------------------------------------------------------
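
A note on the exit-code arithmetic from the original question (134 = 128 + 6): shells conventionally report a process killed by signal N as status 128 + N. The sketch below decodes a status that way. It is only an illustration: the class name `ExitStatusDecoder`, its methods, and the small signal table (Linux numbering) are assumptions made for this example, not part of Hadoop. The convention is also a heuristic, since a process can choose to exit with a code above 128 on its own.

```java
public class ExitStatusDecoder {

    /**
     * Describes a child-process exit status using the shell convention
     * that a status of 128 + N means "terminated by signal N".
     * Hypothetical helper for illustration only.
     */
    public static String describe(int status) {
        if (status == 0) {
            return "success";
        }
        if (status > 128 && status < 256) {
            int signal = status - 128;
            return "terminated by signal " + signal + " (" + signalName(signal) + ")";
        }
        return "exited with code " + status;
    }

    /** Small, incomplete table of common signal numbers (assumes Linux numbering). */
    public static String signalName(int signal) {
        switch (signal) {
            case 6:  return "SIGABRT"; // abort(), e.g. a JVM crash handler
            case 9:  return "SIGKILL"; // e.g. the kernel OOM killer
            case 11: return "SIGSEGV"; // segmentation fault
            case 15: return "SIGTERM";
            default: return "unknown";
        }
    }

    public static void main(String[] args) {
        // The status from the TaskRunner exception in this thread.
        System.out.println(describe(134)); // prints "terminated by signal 6 (SIGABRT)"
    }
}
```

A SIGABRT status like this one is consistent with the JVM aborting itself after a fatal error, which is when it writes the error file configured via -XX:ErrorFile.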
