Rolling back to u17 fixed our problem.

But thanks for the information.
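
For anyone who wants to check which update a node is running, here is a rough shell sketch. The function name java_update_number is made up for illustration, and it assumes the Sun-style banner of the form 'java version "1.6.0_NN"' that appears in the quoted output further down.

```shell
# Hypothetical helper (not part of Hadoop or the JDK).
# Reads `java -version` output on stdin and prints the update number,
# e.g. 18 for: java version "1.6.0_18"
# Prints nothing if no line matches the assumed banner format.
java_update_number() {
  sed -n 's/^java version "[0-9.]*_\([0-9][0-9]*\)".*/\1/p' | head -n 1
}
```

Since java -version writes its banner to stderr, it would be called as: java -version 2>&1 | java_update_number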


Ferdy-2 wrote:
> 
> Hi,
> 
> We have had a lot of these crashes in the past. Random jobs were 
> crashing with error code 134. Our environment is also linux-amd64. We 
> tried all sorts of Hadoop versions and JVM deployments, but none of 
> them had any positive effect.
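
A side note on the number 134: by the usual shell convention, an exit status above 128 means the process was killed by a signal, and the signal number is the status minus 128. On Linux, signal 6 is SIGABRT, which would be consistent with a JVM aborting after a fatal error. A minimal sketch of that arithmetic, assuming the convention applies to the status that Hadoop reports (the function name is hypothetical):

```shell
# Hypothetical helper: prints the signal number implied by an exit status,
# assuming the "128 + signal number" shell convention.
# Prints nothing for statuses of 128 or below.
signal_from_exit() {
  local status=$1
  if [ "$status" -gt 128 ]; then
    echo $((status - 128))
  fi
}
```

For example, signal_from_exit 134 prints 6, and in bash the name can then be looked up with kill -l 6.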
> 
> We finally figured out it was a deep-rooted hardware problem. 
> Communication between different cores of the CPU could get corrupted 
> every once in a while. This was due to a bad combination of 
> mainboard, CPU and/or memory. In our case the problem was solved by 
> replacing all mainboards.
> 
> We could pinpoint and reproduce the problem using the following bash 
> command (run as root):
> 
> while /bin/true; do taskset -c 0 echo -ne \
> '\02...@\0306\0256yy\0210\0304\0004\0327a\0024\0343\0034\0252\0016v\r\0232\0024\0334\0233\0333\0356\0311a\0367\0375ewgkk\0253\0373\0351\007%' \
> | taskset -c 2 hexdump -b; done | grep 0000020 | grep -v 351
> 
> If you see any output on the console, it means your hardware is 
> affected. If you see no output for several minutes (or perhaps an 
> hour), your machine is unlikely to be broken.
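
The idea of the test above (write bytes on one core, read them on another, and flag any difference) can be sketched in a more general form. This is only an illustration under assumptions, not the original test: the payload in the archived command is partly elided, so the sketch takes an arbitrary payload and compares cksum output instead of grepping hexdump lines. An arbitrary payload may well fail to trigger the fault that the original byte sequence did. The function names and the unpinned fallback are made up for this example.

```shell
# Hypothetical helpers for a cross-core pipe check.

# run_on_core CORE CMD...: pin CMD to CORE with taskset when that is possible,
# otherwise run CMD unpinned (assumption: useful on hosts without taskset).
run_on_core() {
  local core=$1
  shift
  if command -v taskset >/dev/null 2>&1 && taskset -c "$core" true 2>/dev/null; then
    taskset -c "$core" "$@"
  else
    "$@"
  fi
}

# check_pipe_once SRC_CORE DST_CORE PAYLOAD:
# returns 0 if the checksum computed on DST_CORE from data written on
# SRC_CORE matches a checksum computed without pinning, non-zero otherwise.
check_pipe_once() {
  local src_core=$1 dst_core=$2 payload=$3
  local expected actual
  expected=$(printf '%s' "$payload" | cksum)
  actual=$(run_on_core "$src_core" printf '%s' "$payload" |
    run_on_core "$dst_core" cksum)
  [ "$expected" = "$actual" ]
}
```

A loop such as: while check_pipe_once 0 2 "$payload"; do :; done; echo "mismatch seen" would keep running until the first mismatch, mirroring the cores 0 and 2 used above.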
> 
> Hope this is of some help to you.
> 
> Ferdy
> 
> zward3x wrote:
>> Thanks for all the help.
>>
>> I will install u17 and hope that this resolves the issue.
>>
>>
>>
>> Jean-Daniel Cryans-2 wrote:
>>   
>>> As I feared, you use the unholy u18... please revert to u17.
>>>
>>> See this thread for more information:
>>> http://www.mail-archive.com/common-u...@hadoop.apache.org/msg04633.html
>>>
>>> J-D
>>>
>>> On Sun, Mar 7, 2010 at 1:32 PM, zward3x <pasalic.zahar...@gmail.com>
>>> wrote:
>>>     
>>>> $ java -version
>>>> java version "1.6.0_18"
>>>> Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
>>>>
>>>> There is nothing in stderr, but here is part of stdout:
>>>>
>>>> #
>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>> #
>>>> #  SIGSEGV (0xb) at pc=0x00002b19ef8cc34e, pid=12633, tid=1104492864
>>>> #
>>>> # JRE version: 6.0_18-b07
>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode linux-amd64 )
>>>> # Problematic frame:
>>>> # V  [libjvm.so+0x2de34e]
>>>> #
>>>> # An error report file with more information is saved as:
>>>> # /hadoop/mapred/local/taskTracker/jobcache/job_201003072002_0002/attempt_201003072002_0002_r_000019_0/work/hs_err_pid12633.log
>>>> #
>>>> # If you would like to submit a bug report, please visit:
>>>> #   http://java.sun.com/webapps/bugreport/crash.jsp
>>>> #
>>>>
>>>> Also, the file mentioned above (hs_err_pid12633.log) does not
>>>> exist.
>>>>
>>>>
>>>>
>>>> Jean-Daniel Cryans-2 wrote:
>>>>       
>>>>>> I'm using Hadoop 0.20.1 and HBase 0.20.3
>>>>>>
>>>>> Sorry, I meant the Java version.
>>>>>
>>>>>         
>>>>>> I already tried putting
>>>>>>
>>>>>> -XX:ErrorFile=/opt/hadoop/hadoop/logs/java/java_error%p.log
>>>>>>
>>>>>> in hadoop-env.sh as HADOOP_OPTS, but after the reduce crashed I did not
>>>>>> find any file at that path.
>>>>>>           
>>>>> Todd wasn't talking about that. He said:
>>>>>
>>>>>         
>>>>>> Generally along with a nonzero exit code you should see something in
>>>>>> the stderr for that attempt. If you look on the TaskTracker inside
>>>>>> logs/userlogs/attempt_<the failed attempt>/stderr do you see anything
>>>>>> useful?
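
To make that lookup quicker across many attempts, something along these lines could list only the stderr files that actually contain output. It is a sketch that assumes the logs/userlogs/attempt_.../stderr layout described above; the function name is hypothetical and the directory is passed in by the caller.

```shell
# Hypothetical helper: list non-empty files named "stderr" under a given
# userlogs directory. "-size +0" matches files larger than zero blocks,
# which means any file with at least one byte.
nonempty_stderr_files() {
  find "$1" -type f -name stderr -size +0
}
```

It would be run on the TaskTracker node with the userlogs directory as its argument; no output means every stderr file found was empty.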
>>>>>>           
>>>>>         
>>>> --
>>>> View this message in context:
>>>> http://old.nabble.com/Task-process-exit-with-nonzero-status-of-134...-tp27814144p27814802.html
>>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>>
>>>>
>>>>       
>>>     
>>
>>   
> 
> 

