Re: Task process exit with nonzero status of 134

Christian Kirschbaum Tue, 28 Jul 2009 07:01:02 -0700

Hi Todd et al.,

coming back to this, I'd like to present the solution we found: a JVM bug was indeed the cause of the exit code 134 reported by the TaskRunner.

First of all, we had to configure Hadoop to start its JVMs with the following parameter:


-XX:ErrorFile=/opt/hadoop/hadoop/logs/java/java_error%p.log

This was necessary because, without it, the JVM writes its error logfile into the current working directory by default, which in this case was the Hadoop task working directory. That directory, however, was removed when the job failed or completed.
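For illustration, a minimal Python sketch of what the `%p` in that path does: HotSpot replaces `%p` with the process ID of the crashing JVM (and `%%` with a literal `%`), so every crash gets its own file. The helper names below are hypothetical, and appending the flag to the `mapred.child.java.opts` property is an assumption about how one would typically pass it to task JVMs in Hadoop 0.19, not necessarily how we configured it.

```python
# Hypothetical helpers; they only mimic HotSpot's expansion of the ErrorFile path.
ERROR_FILE_TEMPLATE = "/opt/hadoop/hadoop/logs/java/java_error%p.log"


def expand_error_file(template: str, pid: int) -> str:
    """Expand %p (process ID) and %% (literal percent), as HotSpot does."""
    out = []
    i = 0
    while i < len(template):
        if template.startswith("%p", i):
            out.append(str(pid))
            i += 2
        elif template.startswith("%%", i):
            out.append("%")
            i += 2
        else:
            out.append(template[i])
            i += 1
    return "".join(out)


def child_java_opts(existing: str, error_dir: str) -> str:
    """Append -XX:ErrorFile to an existing option string (assumed mapred.child.java.opts value)."""
    flag = "-XX:ErrorFile=" + error_dir.rstrip("/") + "/java_error%p.log"
    return (existing + " " + flag).strip()
```

The target directory should exist and be writable by the user running the task JVMs; it must also live outside the task working directory, otherwise the log is deleted along with it.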

The Java error logfile pointed us to a specific class and method that kept crashing the JVM, namely: DefaultSDContextGenerator.previousSpaceIndex(CharSequence, int): int
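To pull the crashing method out of such a logfile without reading all of it, something like the sketch below could help. It assumes the usual HotSpot error-log layout, in which a `# Problematic frame:` header is followed by one comment line naming the frame; that layout is not a stable interface and may differ between JVM versions. The sample text is made up for illustration (package name omitted), not copied from our actual log.

```python
from typing import Optional


def problematic_frame(log_text: str) -> Optional[str]:
    """Return the frame listed under '# Problematic frame:' in a HotSpot error log, or None."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "# Problematic frame:" and i + 1 < len(lines):
            # The following line looks like '# J  some.Class.method(...)'; drop the '#' prefix.
            return lines[i + 1].lstrip("#").strip()
    return None


# Hypothetical excerpt of an error log, for illustration only.
SAMPLE = """#
# A fatal error has been detected by the Java Runtime Environment:
#
# Problematic frame:
# J  DefaultSDContextGenerator.previousSpaceIndex(Ljava/lang/CharSequence;I)I
#
"""
```

A leading `J` in the frame line usually marks JIT-compiled Java code, which fits a JVM bug being triggered by one particular hot method.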

We eventually googled this specific class and method and, lo and behold, found this:

http://sourceforge.net/tracker/?func=detail&aid=2793972&group_id=3368&atid=103368

Apparently, this specific class and method had triggered JVM crashes for other users as well. We implemented the workaround code, and the trouble with exit code 134 was finally gone.

In the comments on that page, someone posted a code snippet that reproduces the JVM crash. I have not yet confirmed whether it was also reported to Sun.


Cheers,
Chris


Todd Lipcon wrote:

Hi Christian,

Generally, along with a nonzero exit code you should see something in the stderr for that attempt. If you look on the TaskTracker inside logs/userlogs/attempt_<the failed attempt>/stderr, do you see anything useful?
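As a small sketch of that check, the snippet below builds the `userlogs/<attempt>/stderr` path described above and returns its last lines. The function names are hypothetical, and the Hadoop log directory is passed in as an assumption (it depends on the installation, e.g. on `HADOOP_LOG_DIR`).

```python
from collections import deque
from pathlib import Path


def attempt_stderr_path(log_dir: str, attempt_id: str) -> Path:
    """Build <log_dir>/userlogs/<attempt_id>/stderr (layout as described above)."""
    return Path(log_dir) / "userlogs" / attempt_id / "stderr"


def tail(path: Path, n: int = 20) -> list:
    """Return the last n lines of a file, or an empty list if the file does not exist."""
    if not path.exists():
        return []
    with path.open(errors="replace") as f:
        # A bounded deque keeps only the final n lines while streaming the file.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```

If the JVM died from a native crash, the stderr file is often short or empty, which is itself a hint to look at the kernel log or the JVM's own error log instead.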

If it's a segfault or a Linux OOM kill, you should also see something in your system's kernel log. Check "dmesg" and/or /var/log/kern.log for anything suspicious looking.
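A heuristic sketch of that scan, for text captured from `dmesg` or kern.log. The regular expressions are assumptions based on commonly seen kernel message shapes; the exact wording differs between kernel versions, so a miss here does not prove nothing happened.

```python
import re

# Assumed message shapes (wording varies between kernels):
#   "Out of memory: Killed process 1234 (java) ..."  (some kernels say "Kill process")
#   "java[1234]: segfault at 0 ip ... error 4 in libjvm.so[...]"
OOM_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \((\S+)\)", re.IGNORECASE)
SEGV_RE = re.compile(r"(\S+)\[(\d+)\]: segfault at")


def suspicious_kernel_events(log_text: str) -> list:
    """Return (kind, pid, command) tuples for OOM kills and segfaults found in kernel log text."""
    events = []
    for line in log_text.splitlines():
        m = OOM_RE.search(line)
        if m:
            events.append(("oom-kill", int(m.group(1)), m.group(2)))
            continue
        m = SEGV_RE.search(line)
        if m:
            events.append(("segfault", int(m.group(2)), m.group(1)))
    return events
```

Matching the reported PIDs against the task JVM's PID (or the `%p` in a JVM error logfile name) helps tie a kernel event to a specific failed attempt.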


Hope that helps
-Todd

On Tue, Jul 21, 2009 at 2:15 AM, Christian Kirschbaum wrote:


    Hi all,

    we're using Hadoop 0.19.1 and have recently encountered the
    following erratic problem when running jobs involving UIMA text
    annotation chains (which fail frequently because of this):

    java.io.IOException: Task process exit with nonzero status of 134.
           at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)


    As you can see, this is raised within Hadoop's own code, without
    the actual MapReduce job being able to react to it. Unfortunately,
    this exception message says little about the actual cause, which
    I have yet to track down.

    All I found out is that this status code is apparently the exit
    code of a separate process started through
    org.apache.hadoop.util.Shell.ShellCommandExecutor in the
    runChild(JvmEnv) method of org.apache.hadoop.mapred.JvmManager.
    And since it is exit code 134 (128 + 6), presumably signal 6
    (SIGABRT) caused the process to terminate, which may indicate a
    core dump?

    How do I find out more about the actual cause? Is there any secret
    logfile for the separately spawned JVM process? I've looked
    through various logs and userlogs directories but could not find
    any mention of this exception there.

    Any help is appreciated.

    Thanks,
    Chris



--
Christian Kirschbaum
Software Developer
--------------------------------------------------------
vionto GmbH
Karl-Marx-Allee 90a, D-10243 Berlin

fon   +49 30 40 20 329 - 27
fax   +49 30 40 20 329 - 01
web   http://www.vionto.com
--------------------------------------------------------
Geschäftsführer: Ralf von Grafenstein, Dr. Martin Hirsch
Sitz der Gesellschaft: Berlin
Amtsgericht Berlin Charlottenburg, HRB 108154B
--------------------------------------------------------
