I'd run memcheck overnight on the nodes that caused the problem, just
to be sure.
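A concrete way to do that overnight burn-in (a sketch: memtester is one userspace option, and the size/pass values below are tiny placeholders — for a real soak you would test most of the node's 8 GB over many passes, or boot memtest86 from a disk for a more thorough run):

```shell
# Userspace memory soak on a suspect node (run as root so the test
# region can be locked in RAM).  Values here are demo-sized; scale the
# size and pass count up for an overnight run.
if command -v memtester >/dev/null 2>&1; then
    memtester 4M 1    # placeholder: e.g. "memtester 7G 20" overnight
else
    echo "memtester not installed; boot memtest86 instead"
fi
```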
Another (unlikely) possibility is that the JNI callouts for the native
libraries Hadoop uses (for the compression codecs, I believe) crashed
or were set up incorrectly, and died fatally enough to take out the
JVM. Are you using any compression? If the crash correlates well with
a running job, does that job complete successfully in "local" mode?
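For reference, "local" mode here means overriding two properties so the job runs in a single JVM against the local filesystem, bypassing the datanodes and most of the native-code path. A sketch of the override (property names are from 0.18/0.19-era configs; verify against your version's hadoop-default.xml):

```xml
<!-- hadoop-site.xml override (a sketch): run the job in one local JVM.
     If the SIGBUS disappears here, the datanode/JNI path is suspect. -->
<property>
  <name>mapred.job.tracker</name>
  <value>local</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>file:///</value>
</property>
```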
Brian
On Dec 1, 2008, at 3:32 PM, Sagar Naik wrote:
Brian Bockelman wrote:
Hardware/memory problems?
I'm not sure.
SIGBUS is relatively rare; it sometimes indicates a hardware error
in the memory system, depending on your arch.
*uname -a:*
Linux hdimg53 2.6.15-1.2054_FC5smp #1 SMP Tue Mar 14 16:05:46 EST
2006 i686 i686 i386 GNU/Linux
*top's top*
Cpu(s):  0.1% us,  1.1% sy,  0.0% ni, 98.0% id,  0.8% wa,  0.0% hi,  0.0% si
Mem:   8288280k total,  1575680k used,  6712600k free,     5392k buffers
Swap: 16386292k total,       68k used, 16386224k free,   522408k cached
8-core Xeon, 2 GHz
Brian
On Dec 1, 2008, at 3:00 PM, Sagar Naik wrote:
A couple of the datanodes crashed with the following error.
/tmp is 15% occupied.
#
# An unexpected error has been detected by Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0xb4edcb6a, pid=10111, tid=1212181408
#
[Too many errors, abort]
Please suggest how I should go about debugging this particular problem.
-Sagar
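A note on that excerpt: it is only the header of the JVM's fatal error report. HotSpot normally writes a full hs_err_pid<pid>.log (stack trace, siginfo, memory map) in the crashed process's working directory unless -XX:ErrorFile redirects it, and that file is the first thing worth reading. A sketch for hunting it down (the search paths are guesses):

```shell
# Look for the full fatal-error log left behind by the crashed JVM;
# a datanode's working directory is typically under the Hadoop install
# (HADOOP_HOME is an assumption) or /tmp.
find /tmp "${HADOOP_HOME:-.}" -maxdepth 3 -name 'hs_err_pid*.log' 2>/dev/null || true
echo "search complete"
```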
Thanks to Brian
-Sagar