Re: Task process exit with nonzero status of 134

Todd Lipcon Tue, 28 Jul 2009 10:30:05 -0700

Youch! Thanks for the follow-up. I confirmed that the Crasher.java snippet
crashes my Sun 1.6.0 update 14 as well. Hopefully they'll get it fixed in
update 15.


-Todd

On Tue, Jul 28, 2009 at 7:00 AM, Christian Kirschbaum <
[email protected]> wrote:

> Hi Todd et al.,
>
> coming back to this again, I'd like to present the solution we found and
> confirm that a JVM bug was indeed the cause of the exit code 134 seen in
> the TaskRunner.
>
> First of all, we had to configure the Hadoop subsystem to start with the
> following parameter:
>
> -XX:ErrorFile=/opt/hadoop/hadoop/logs/java/java_error%p.log
>
> This was necessary because, without it, the JVM by default writes its
> error logfile into the current working directory, which in this case was
> the Hadoop task working directory. That directory, however, is removed
> when the job fails or completes.
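
A minimal sketch of how such an option string could be assembled before it is handed to the task JVMs. The helper name `with_error_file` is made up for illustration, and `mapred.child.java.opts` is only assumed to be the property that carries the flag; the message above does not say where the parameter was set. The `%p` placeholder is left untouched because the JVM itself replaces it with the process id.

```python
def with_error_file(java_opts, log_dir):
    """Append -XX:ErrorFile to a JVM option string unless one is already set."""
    if "-XX:ErrorFile=" in java_opts:
        # Respect an existing setting instead of adding a second, conflicting flag.
        return java_opts
    flag = "-XX:ErrorFile=" + log_dir.rstrip("/") + "/java_error%p.log"
    return (java_opts + " " + flag).strip()


# Assumption: the resulting string would be used as the value of
# mapred.child.java.opts; "-Xmx200m" stands in for whatever is already set.
child_opts = with_error_file("-Xmx200m", "/opt/hadoop/hadoop/logs/java")
print(child_opts)
```

The log directory has to exist and be writable on every node, since the JVM writes the file locally on the machine where the task crashed.
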
>
> The java error logfile pointed us to a specific class and method that kept
> crashing the JVM, namely:
> DefaultSDContextGenerator.previousSpaceIndex(CharSequence, int): int
>
> We eventually googled for this specific class and method, and lo and
> behold, found this:
>
> http://sourceforge.net/tracker/?func=detail&aid=2793972&group_id=3368&atid=103368
>
> Apparently, this specific class and method had triggered JVM crashes for
> other users as well. We implemented the workaround code and the trouble with
> exit code 134 was finally gone.
>
> On that webpage, someone posted a code snippet in the comments that
> reproduces the JVM crash. I have not yet confirmed whether it was reported
> to Sun as well.
>
> Cheers,
> Chris
>
>
> Todd Lipcon wrote:
>
>> Hi Christian,
>>
>> Generally along with a nonzero exit code you should see something in the
>> stderr for that attempt. If you look on the TaskTracker inside
>> logs/userlogs/attempt_<the failed attempt>/stderr do you see anything
>> useful?
>>
>> If it's a segfault or a Linux OOM kill, you should also see something in
>> your system's kernel log. Check "dmesg" and/or /var/log/kern.log for
>> anything suspicious-looking.
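
A rough sketch of what filtering a kernel log for such entries might look like. The search patterns are assumptions chosen for illustration (the exact kernel wording varies between versions), the function name `suspicious_lines` is made up, and the sample lines are hypothetical rather than taken from a real log.

```python
import re

# Assumed patterns; adjust them to the wording your kernel actually uses.
SUSPICIOUS = re.compile(r"segfault|out of memory|oom-killer", re.IGNORECASE)


def suspicious_lines(lines):
    """Return the log lines that mention a segfault or an OOM kill."""
    return [line.rstrip("\n") for line in lines if SUSPICIOUS.search(line)]


# Hypothetical sample input, standing in for dmesg output or kern.log.
sample = [
    "java[4242]: segfault at 0 ip 00007f0000000000 sp 00007fff00000000 error 4\n",
    "eth0: link up\n",
    "Out of memory: Kill process 4242 (java) score 900 or sacrifice child\n",
]
hits = suspicious_lines(sample)
for hit in hits:
    print(hit)
```

The same function accepts an open file object, so on a real node it could be fed `open("/var/log/kern.log")` instead of the sample list.
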
>>
>> Hope that helps
>> -Todd
>>
>> On Tue, Jul 21, 2009 at 2:15 AM, Christian Kirschbaum <
>> [email protected]> wrote:
>>
>>    Hi all,
>>
>>    we're using Hadoop 0.19.1 and have recently encountered the
>>    following erratic problem when running jobs involving UIMA text
>>    annotation chains (which fail frequently because of this):
>>
>>    java.io.IOException: Task process exit with nonzero status of 134.
>>           at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
>>
>>
>>    As you can see, this is propagated in Hadoop code, without the
>>    actual MapReduce job being able to react to it. Unfortunately,
>>    this exception message isn't very descriptive as to the actual
>>    cause which I have yet to track down.
>>
>>    All I found out is that this status code apparently is an exit
>>    code of a separate process initiated through
>>    org.apache.hadoop.util.Shell.ShellCommandExecutor in the
>>    runChild(JvmEnv) method of org.apache.hadoop.mapred.JvmManager.
>>    And because it is exit code 134 (128 + 6), signal 6 (SIGABRT)
>>    has supposedly terminated the process, which may indicate a
>>    core dump?
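
The arithmetic behind that guess can be checked with a short sketch. It assumes the shell-style convention that an exit status above 128 encodes 128 plus the number of the terminating signal, and that this convention applies to the status the TaskRunner reports. The helper name `signal_from_exit_status` is made up for illustration.

```python
import signal


def signal_from_exit_status(status):
    """Return the signal name encoded in a shell-style exit status, or None."""
    if status <= 128:
        # Ordinary exit code, no signal involved under this convention.
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # The remainder is not a signal number known on this platform.
        return None


# Signal numbers are platform-specific; on Linux SIGABRT is 6, so 134 decodes to it.
print(signal_from_exit_status(128 + int(signal.SIGABRT)))
```

SIGABRT is what a process receives when it calls abort(), so a status of 134 on Linux suggests the child aborted itself rather than being killed from outside.
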
>>
>>    How do I find out more about the actual cause? Is there any secret
>>    logfile for the separately spawned Jvm process? I've looked
>>    through various logs and userlogs directories but could not find
>>    any mention of this exception there.
>>
>>    Any help is appreciated.
>>
>>    Thanks,
>>    Chris
>>
>>
>>
>>
>
> --
> Christian Kirschbaum
> Software Developer
> --------------------------------------------------------
> vionto GmbH
> Karl-Marx-Allee 90a, D-10243 Berlin
>
> phone +49 30 40 20 329 - 27
> fax   +49 30 40 20 329 - 01
> web   http://www.vionto.com
> --------------------------------------------------------
> Managing Directors: Ralf von Grafenstein, Dr. Martin Hirsch
> Registered office: Berlin
> District Court Berlin-Charlottenburg, HRB 108154B
> --------------------------------------------------------
>
>
