Hi,

We have had a lot of these crashes in the past: random jobs crashing with error code 134 (which is 128 + SIGABRT, i.e. the JVM aborted). Our environment is also linux-amd64. We tried all sorts of Hadoop versions and JVM builds, but none of it had any positive effect.

We finally figured out it was a deep-rooted hardware problem: communication between different cores of the CPU could get corrupted every once in a while, due to a bad combination of mainboard, CPU, and/or memory. In our case the problem was solved by replacing all the mainboards.

We could pinpoint and reproduce the problem using the following bash command (run as root):

while /bin/true; do taskset -c 0 echo -ne '\02...@\0306\0256yy\0210\0304\0004\0327a\0024\0343\0034\0252\0016v\r\0232\0024\0334\0233\0333\0356\0311a\0367\0375ewgkk\0253\0373\0351\007%' | taskset -c 2 hexdump -b; done | grep 0000020 | grep -v 351

If you see any output on the console, it means your hardware is affected. If you see no output for several minutes (or, to be thorough, an hour), your machine is unlikely to be broken.
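For readers who want to see the shape of the test without the full byte blob, here is a hedged sketch of what the loop above is doing, assuming util-linux `taskset` and POSIX `od` (the pattern is a stand-in, the loop is bounded so it terminates, and `pin` falls back to unpinned execution where `taskset` is unavailable — the real probe runs indefinitely as root):

```shell
# Sketch of the cross-core corruption probe: a producer is pinned to one
# core and a consumer to another; if the consumer's octal dump of the
# same bytes ever differs from the reference dump, data was corrupted in
# transit between the cores. od -b is the portable cousin of hexdump -b.
pin() {
  cpu="$1"; shift
  if command -v taskset >/dev/null 2>&1; then
    taskset -c "$cpu" "$@"
  else
    "$@"    # no taskset on this system; run unpinned
  fi
}

pattern='example-test-pattern'   # stand-in for the byte blob above
expected="$(printf '%s' "$pattern" | pin 1 od -b)" || true

i=0
while [ "$i" -lt 100 ]; do       # bounded here; loop for hours in practice
  got="$(pin 0 printf '%s' "$pattern" | pin 1 od -b)" || true
  [ "$got" = "$expected" ] || echo "MISMATCH at iteration $i"
  i=$((i + 1))
done
```

Any `MISMATCH` line corresponds to the grep hits in the original one-liner.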

Hope this is of some help to you.

Ferdy

zward3x wrote:
Thanks for all the help.

Will install u17; hope that this will resolve the issue.



Jean-Daniel Cryans-2 wrote:
As I feared, you're using the unholy u18... please revert to u17.

See this thread for more information:
http://www.mail-archive.com/common-u...@hadoop.apache.org/msg04633.html

J-D

On Sun, Mar 7, 2010 at 1:32 PM, zward3x <pasalic.zahar...@gmail.com>
wrote:
$ java -version
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)

there is nothing in stderr, but here is part from stdout

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00002b19ef8cc34e, pid=12633, tid=1104492864
#
# JRE version: 6.0_18-b07
# Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode
linux-amd64 )
# Problematic frame:
# V  [libjvm.so+0x2de34e]
#
# An error report file with more information is saved as:
#
/hadoop/mapred/local/taskTracker/jobcache/job_201003072002_0002/attempt_201003072002_0002_r_000019_0/work/hs_err_pid12633.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

Also, the file mentioned above (hs_err_pid12633.log) does not exist.
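One plausible explanation for the missing file: HotSpot writes hs_err logs to the JVM's working directory by default, and the TaskTracker cleans up an attempt's work directory after the task exits, so the crash log may be deleted along with it. A hedged sketch of searching longer-lived locations for it (the directories are examples; substitute your own):

```shell
# Look for stray HotSpot crash logs outside the (now deleted) attempt
# work directory. search_dirs can be overridden from the environment;
# the defaults below are illustrative paths, not canonical ones.
search_dirs="${search_dirs:-/hadoop/mapred/local /opt/hadoop/hadoop/logs /tmp}"
# shellcheck disable=SC2086  # intentional word splitting of the dir list
find $search_dirs -name 'hs_err_pid*.log' 2>/dev/null || true
```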



Jean-Daniel Cryans-2 wrote:
I'm using Hadoop 0.20.1 and HBase 0.20.3
Sorry, I meant the Java version.

I already tried putting

-XX:ErrorFile=/opt/hadoop/hadoop/logs/java/java_error%p.log

in hadoop-env.sh as HADOOP_OPTS, but after the reduce crash I did not
find any file at that path.
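One possible reason the setting never took effect: HADOOP_OPTS applies to the Hadoop daemons, while map/reduce task JVMs of this Hadoop vintage are launched with the options in mapred.child.java.opts, so an ErrorFile flag set only in hadoop-env.sh may never reach the crashing reduce task. A hedged sketch of a mapred-site.xml entry (the -Xmx value and path are examples; the directory must already exist and be writable by the user running the tasks, or HotSpot may fall back to the working directory):

```xml
<!-- Illustrative fragment for mapred-site.xml, not a verified fix -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m -XX:ErrorFile=/opt/hadoop/hadoop/logs/java/java_error%p.log</value>
</property>
```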
Todd doesn't talk about that; he said:

Generally along with a nonzero exit code you should see something in
the stderr for that attempt. If you look on the TaskTracker inside
logs/userlogs/attempt_<the failed attempt>/stderr do you see anything
useful?
--
View this message in context:
http://old.nabble.com/Task-process-exit-with-nonzero-status-of-134...-tp27814144p27814802.html
Sent from the HBase User mailing list archive at Nabble.com.


