Rolling back to u17 fixed our problem. Thanks for the information, though.
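
In case it helps anyone checking their own nodes, here is a minimal shell sketch for spotting the u18 JVM. The function name `classify_java_banner` is hypothetical, and the pattern assumes the `java version "1.6.0_18"` banner format quoted further down in this thread; other vendors or builds may print something different.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): classify a `java -version` banner
# read from stdin. Prints "u18" for the 1.6.0_18 build discussed here,
# otherwise "other".
classify_java_banner() {
    if grep -q 'java version "1\.6\.0_18"'; then
        echo "u18"
    else
        echo "other"
    fi
}

# `java -version` writes its banner to stderr, hence the 2>&1.
# The check only runs when a java binary is actually on the PATH.
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | classify_java_banner
fi
```

This only inspects whichever `java` is first on the PATH of the current shell; the TaskTracker may be started with a different JAVA_HOME, so it is worth running the check as the same user and environment that launches Hadoop.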

Ferdy-2 wrote:
>
> Hi,
>
> We have had a lot of these crashes in the past. Random jobs were
> crashing with error code 134. Our environment is also linux-amd64. We
> tried all sorts of Hadoop versions and JVM deployments, but it did not
> have any positive effect.
>
> We finally figured out it was a deep-rooted hardware problem.
> Communication between different cores of the CPU could get corrupted
> every once in a while. This was due to a bad combination of the
> mainboard, CPU and/or memory. In our case the problem was solved by
> replacing all mainboards.
>
> We could pinpoint and reproduce the problem using the following bash
> command (run as root):
>
> while /bin/true; do taskset -c 0 echo -ne '\02...@\0306\0256yy\0210\0304\0004\0327a\0024\0343\0034\0252\0016v\r\0232\0024\0334\0233\0333\0356\0311a\0367\0375ewgkk\0253\0373\0351\007%' | taskset -c 2 hexdump -b; done | grep 0000020 | grep -v 351
>
> If you see any output on the console, it means your hardware is
> affected. If you see no output for several minutes (or perhaps one
> hour), your machine is unlikely to be broken.
>
> Hope this is of some help to you.
>
> Ferdy
>
> zward3x wrote:
>> Thanks for all the help.
>>
>> Will install u17, hope that this will resolve the issue.
>>
>> Jean-Daniel Cryans-2 wrote:
>>
>>> As I feared, you use the unholy u18... please revert to u17.
>>>
>>> See this thread for more information:
>>> http://www.mail-archive.com/common-u...@hadoop.apache.org/msg04633.html
>>>
>>> J-D
>>>
>>> On Sun, Mar 7, 2010 at 1:32 PM, zward3x <pasalic.zahar...@gmail.com>
>>> wrote:
>>>
>>>> $ java -version
>>>> java version "1.6.0_18"
>>>> Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
>>>>
>>>> There is nothing in stderr, but here is part of stdout:
>>>>
>>>> #
>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>> #
>>>> # SIGSEGV (0xb) at pc=0x00002b19ef8cc34e, pid=12633, tid=1104492864
>>>> #
>>>> # JRE version: 6.0_18-b07
>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode linux-amd64 )
>>>> # Problematic frame:
>>>> # V [libjvm.so+0x2de34e]
>>>> #
>>>> # An error report file with more information is saved as:
>>>> # /hadoop/mapred/local/taskTracker/jobcache/job_201003072002_0002/attempt_201003072002_0002_r_000019_0/work/hs_err_pid12633.log
>>>> #
>>>> # If you would like to submit a bug report, please visit:
>>>> # http://java.sun.com/webapps/bugreport/crash.jsp
>>>> #
>>>>
>>>> Also, the file mentioned above (hs_err_pid12633.log) does not
>>>> exist.
>>>>
>>>> Jean-Daniel Cryans-2 wrote:
>>>>
>>>>>> I'm using hadoop 0.20.1 and hbase 0.20.3
>>>>>>
>>>>> Sorry, I meant the Java version.
>>>>>
>>>>>> I already tried to put
>>>>>>
>>>>>> -XX:ErrorFile=/opt/hadoop/hadoop/logs/java/java_error%p.log
>>>>>>
>>>>>> in hadoop-env.sh as HADOOP_OPTS, but after the reduce crash I did not
>>>>>> find any file on that path.
>>>>>>
>>>>> Todd doesn't talk about that, he said:
>>>>>
>>>>>> Generally along with a nonzero exit code you should see something in
>>>>>> the stderr for that attempt. If you look on the TaskTracker inside
>>>>>> logs/userlogs/attempt_<the failed attempt>/stderr do you see anything
>>>>>> useful?
>>>>>>
>>>> --
>>>> View this message in context:
>>>> http://old.nabble.com/Task-process-exit-with-nonzero-status-of-134...-tp27814144p27814802.html
>>>> Sent from the HBase User mailing list archive at Nabble.com.

--
View this message in context: http://old.nabble.com/Task-process-exit-with-nonzero-status-of-134...-tp27814144p27827879.html
Sent from the HBase User mailing list archive at Nabble.com.