Hi all,
I'm using HBase 0.1.3, and I have a fairly simple TableMap that
intermittently hangs in OutputCollector.collect. Eventually the task gets
killed because it stops reporting back. There are no error messages in the
log, and the task's CPU usage sits at 100%. I've included a thread dump
below. Any ideas?
Full thread dump Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode):

"SortSpillThread" daemon prio=10 tid=0x00002aaad40d4c00 nid=0x6729 runnable [0x000000004173f000..0x000000004173fd80]
   java.lang.Thread.State: RUNNABLE
        at com.rexee.bandito.hadoop.logprocessing.CountFaillures$Reduce.reduce(CountFaillures.java:106)
        at com.rexee.bandito.hadoop.logprocessing.CountFaillures$Reduce.reduce(CountFaillures.java:102)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:522)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpillToDisk(MapTask.java:493)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$200(MapTask.java:264)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$1.run(MapTask.java:439)
        - locked <0x00002aaab3a874e8> (a java.lang.Object)

"org.apache.hadoop.hbase.io.HbaseObjectWritable Connection Culler" daemon prio=10 tid=0x00002aaad40bf400 nid=0x64a0 waiting on condition [0x000000004143c000..0x000000004143ca80]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.ipc.Client$ConnectionCuller.run(Client.java:423)

"Comm thread for task_200807162013_0006_m_000001_0" daemon prio=10 tid=0x00002aaad4107400 nid=0x649f waiting on condition [0x000000004133b000..0x000000004133ba00]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.mapred.Task$1.run(Task.java:282)
        at java.lang.Thread.run(Thread.java:619)

"[EMAIL PROTECTED]" daemon prio=10 tid=0x00002aaad4107000 nid=0x649e waiting on condition [0x000000004123a000..0x000000004123ad80]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:605)
        at java.lang.Thread.run(Thread.java:619)

"IPC Client connection to domU-12-31-38-00-D4-21/10.252.219.207:9000" daemon prio=10 tid=0x00002aaad4129800 nid=0x649d in Object.wait() [0x0000000041139000..0x0000000041139d00]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aaab3acad18> (a org.apache.hadoop.ipc.Client$Connection)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:234)
        - locked <0x00002aaab3acad18> (a org.apache.hadoop.ipc.Client$Connection)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:273)

"IPC Client connection to /127.0.0.1:55679" daemon prio=10 tid=0x00002aaad40aa000 nid=0x6499 in Object.wait() [0x0000000041038000..0x0000000041038c80]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aaab3aa0780> (a org.apache.hadoop.ipc.Client$Connection)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:234)
        - locked <0x00002aaab3aa0780> (a org.apache.hadoop.ipc.Client$Connection)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:273)

"org.apache.hadoop.io.ObjectWritable Connection Culler" daemon prio=10 tid=0x00002aaad40f2c00 nid=0x6498 waiting on condition [0x0000000040f37000..0x0000000040f37c00]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.ipc.Client$ConnectionCuller.run(Client.java:423)

"Low Memory Detector" daemon prio=10 tid=0x00002aaad330ec00 nid=0x6494 runnable [0x0000000000000000..0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0x00002aaad330c400 nid=0x6493 waiting on condition [0x0000000000000000..0x0000000040c33320]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x00002aaad3308c00 nid=0x6492 waiting on condition [0x0000000000000000..0x0000000040b322b0]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00002aaad3307800 nid=0x6491 runnable [0x0000000000000000..0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00002aaad32dd000 nid=0x6490 in Object.wait() [0x0000000040931000..0x0000000040931d00]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aaab3ad4ad8> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
        - locked <0x00002aaab3ad4ad8> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x00002aaad312cc00 nid=0x648f in Object.wait() [0x0000000040830000..0x0000000040830c80]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aaab3af78c8> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0x00002aaab3af78c8> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x0000000040113400 nid=0x6489 waiting for monitor entry [0x000000004022a000..0x000000004022aec0]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:407)
        - waiting to lock <0x00002aaab3a874e8> (a java.lang.Object)
        - locked <0x00002aaab3af7928> (a org.apache.hadoop.mapred.MapTask$MapOutputBuffer)
        at com.rexee.bandito.hadoop.logprocessing.CountFaillures$Map.map(CountFaillures.java:84)
        at com.rexee.bandito.hadoop.logprocessing.CountFaillures$Map.map(CountFaillures.java:58)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)

"VM Thread" prio=10 tid=0x00002aaad312a800 nid=0x648e runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x000000004011e000 nid=0x648a runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x000000004011f400 nid=0x648b runnable

"GC task thread#2 (ParallelGC)" prio=10 tid=0x0000000040120800 nid=0x648c runnable

"GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000040121800 nid=0x648d runnable

"VM Periodic Task Thread" prio=10 tid=0x00002aaad3310800 nid=0x6495 waiting on condition

JNI global references: 762
Heap
 PSYoungGen total 170560K, used 68698K [0x00002aaac8770000, 0x00002aaad2e10000, 0x00002aaad2e10000)
  eden space 170496K, 40% used [0x00002aaac8770000,0x00002aaacca7e9b8,0x00002aaad2df0000)
  from space 64K, 50% used [0x00002aaad2df0000,0x00002aaad2df8000,0x00002aaad2e00000)
  to space 64K, 0% used [0x00002aaad2e00000,0x00002aaad2e00000,0x00002aaad2e10000)
 PSOldGen total 236416K, used 195186K [0x00002aaab3a10000, 0x00002aaac20f0000, 0x00002aaac8770000)
  object space 236416K, 82% used [0x00002aaab3a10000,0x00002aaabf8ac9a0,0x00002aaac20f0000)
 PSPermGen total 21248K, used 9686K [0x00002aaaae610000, 0x00002aaaafad0000, 0x00002aaab3a10000)
  object space 21248K, 45% used [0x00002aaaae610000,0x00002aaaaef85b08,0x00002aaaafad0000)
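
For anyone reading the dump: "main" is BLOCKED in MapOutputBuffer.collect waiting to lock <0x00002aaab3a874e8>, and "SortSpillThread" holds that same monitor while it is RUNNABLE inside CountFaillures$Reduce.reduce (called from combineAndSpill). The same relationship can be pulled out of a live JVM with the standard java.lang.management API. What follows is only a minimal sketch: the class BlockedThreadReport, its methods, and the thread names "demo-spill" and "demo-collect" are made up for illustration and are not part of Hadoop or HBase. The demo merely imitates the shape of the dump (a busy lock holder and a blocked caller); it says nothing about why the reduce call never returns.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper for illustration; not part of Hadoop or HBase. */
class BlockedThreadReport {

    /** Returns one line per BLOCKED thread, naming the lock and the thread that owns it. */
    static List<String> describeBlocked() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                lines.add(info.getThreadName() + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return lines;
    }

    /** Imitates the dump: one thread spins while holding a lock, another blocks on it. */
    static List<String> demo() {
        final Object spillLock = new Object();
        final CountDownLatch held = new CountDownLatch(1);
        final AtomicBoolean stop = new AtomicBoolean(false);

        Thread spill = new Thread(() -> {
            synchronized (spillLock) {
                held.countDown();
                while (!stop.get()) {
                    // busy loop while holding the lock, like a combiner that never returns
                }
            }
        }, "demo-spill");

        Thread collect = new Thread(() -> {
            try {
                held.await(); // make sure the spill thread already owns the lock
            } catch (InterruptedException e) {
                return;
            }
            synchronized (spillLock) {
                // entered only after the spill thread releases the lock
            }
        }, "demo-collect");

        spill.setDaemon(true);
        collect.setDaemon(true);
        spill.start();
        collect.start();
        try {
            // Wait (up to 5 s) until the collector is actually blocked on the monitor.
            long deadline = System.currentTimeMillis() + 5000;
            while (collect.getState() != Thread.State.BLOCKED
                    && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);
            }
            return describeBlocked();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            stop.set(true); // let both demo threads finish
        }
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

Running main should print a line for "demo-collect" that names "demo-spill" as the lock owner, which is the same pairing as "main" and "SortSpillThread" in the dump above.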