I'd run memcheck overnight on the nodes that caused the problem, just to be sure.

Another (unlikely) possibility is that the JNI callouts for the native libraries Hadoop uses (for the compression codecs, I believe) crashed or were set up wrong, and died fatally enough to take out the JVM. Are you using any compression? Does your job complete successfully in "local" mode, if the crash correlates well with a job running?
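One way to test the native-codec theory is a sketch along these lines, assuming the 0.18/0.19-era configuration property names (check your release's hadoop-default.xml): disable the native libraries so Hadoop falls back to the pure-Java codecs, and point the job at the local runner.

```xml
<!-- Hypothetical overrides for hadoop-site.xml (or per-job via -D);
     property names assumed from the 0.18/0.19 configuration defaults. -->
<property>
  <name>hadoop.native.lib</name>
  <value>false</value>   <!-- skip JNI, use pure-Java compression codecs -->
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>local</value>   <!-- run the job in-process, in "local" mode -->
</property>
```

If the crashes stop with the native libs disabled but compression still on, that points fairly strongly at the JNI codec path rather than hardware.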

Brian

On Dec 1, 2008, at 3:32 PM, Sagar Naik wrote:



Brian Bockelman wrote:
Hardware/memory problems?
I'm not sure.

SIGBUS is relatively rare; it sometimes indicates a hardware error in the memory system, depending on your arch.

*uname -a : *
Linux hdimg53 2.6.15-1.2054_FC5smp #1 SMP Tue Mar 14 16:05:46 EST 2006 i686 i686 i386 GNU/Linux
*top's top*
Cpu(s):  0.1% us,  1.1% sy,  0.0% ni, 98.0% id,  0.8% wa,  0.0% hi,  0.0% si
Mem:   8288280k total,  1575680k used,  6712600k free,     5392k buffers
Swap: 16386292k total,       68k used, 16386224k free,   522408k cached

8-core Xeon, 2 GHz

Brian

On Dec 1, 2008, at 3:00 PM, Sagar Naik wrote:

A couple of the datanodes crashed with the following error.
/tmp is only 15% occupied.

#
# An unexpected error has been detected by Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0xb4edcb6a, pid=10111, tid=1212181408
#
[Too many errors, abort]

Please suggest how I should go about debugging this particular problem.


-Sagar
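The truncated report above can usually be recovered in full: on a fatal error, HotSpot writes a crash log named hs_err_pid<pid>.log to the process's working directory (for a datanode, often the Hadoop install dir or /tmp). A minimal sketch for finding it and reading the key header line; the paths here are assumptions, not known locations on Sagar's nodes:

```shell
# Look for HotSpot crash logs in the likely working directories.
find /tmp "${HADOOP_HOME:-.}" -maxdepth 1 -name 'hs_err_pid*.log' 2>/dev/null

# The "Problematic frame" header names the crashing code: a native .so
# (e.g. a compression codec) points at JNI, libjvm.so at the JVM itself,
# and scattered/random frames across crashes would suggest bad memory.
grep -h 'Problematic frame' /tmp/hs_err_pid*.log 2>/dev/null
```

The rest of the log (registers, stack, loaded libraries, GC state) is also worth attaching to any follow-up, since SIGBUS at a stable pc looks quite different from SIGBUS at a random one.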


Thanks to Brian

-Sagar
