I have run two exercises with TeraDataGen and TeraDataSort: (1) 1GB data -> table t1 (2) 10GB data -> table t2
Then I wrote a mapred job to do the RowCounter: each mapper counts the rows of one region, then a combiner and a reducer sum the counts. The RowCounter job for table t1 (1GB) works fine and finishes in 3 minutes, but the RowCounter job for table t2 (10GB) cannot complete.

I checked each map task's status and found it stuck in the spill step: each map has spilled only 2 times, but there are 3 spills for each map task. I think the map task (Child) is deadlocked between spilling the map output (SpillThread) and openScanner...

Schubert

On Sun, Mar 1, 2009 at 6:50 AM, stack <[email protected]> wrote:
> Client is trying to open scanner on 10.24.1.14 (or .12). Can you look in
> regionserver logs on that machine and see if you can see what's holding it
> up? It never moves on from here?
> St.Ack
>
> On Sat, Feb 28, 2009 at 3:07 AM, schubert zhang <[email protected]> wrote:
>
> > And another problem.
> >
> > When I ran the RowCounter job to count the rows of the sort10g table, the
> > job's map child process is locked and cannot complete.
> >
> > [schub...@nd1-rack0-cloud bin]$ jps
> > 14069 Child
> > 13124 Child
> > 7081 HRegionServer
> > 14190 Child
> > 6841 DataNode
> > 14158 Child
> > 12827 TaskTracker
> > 14266 Child
> > 14333 Jps
> > [schub...@nd1-rack0-cloud bin]$ jstack -l 14266
> > 2009-02-28 18:01:09
> > Full thread dump Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode):
> >
> > "Attach Listener" daemon prio=10 tid=0x0000000049801c00 nid=0x382e waiting on condition [0x0000000000000000..0x0000000000000000]
> >    java.lang.Thread.State: RUNNABLE
> >
> >    Locked ownable synchronizers:
> >         - None
> >
> > "IPC Client (47) connection to /10.24.1.14:60020 from an unknown user" daemon prio=10 tid=0x00002aaaf844f800 nid=0x381a runnable [0x000000004151c000..0x000000004151cb80]
> >    java.lang.Thread.State: RUNNABLE
> >         at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> >         at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
> >         at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> >         at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> >         - locked <0x00002aaabe0f6110> (a sun.nio.ch.Util$1)
> >         - locked <0x00002aaabe0f60f8> (a java.util.Collections$UnmodifiableSet)
> >         - locked <0x00002aaabe0f5d68> (a sun.nio.ch.EPollSelectorImpl)
> >         at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> >         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:260)
> >         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
> >         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
> >         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
> >         at java.io.FilterInputStream.read(FilterInputStream.java:116)
> >         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:276)
> >         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> >         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> >         - locked <0x00002aaadf9168f8> (a java.io.BufferedInputStream)
> >         at java.io.DataInputStream.readInt(DataInputStream.java:370)
> >         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:498)
> >         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:443)
> >
> >    Locked ownable synchronizers:
> >         - None
> >
> > "IPC Client (47) connection to /10.24.1.12:60020 from an unknown user" daemon prio=10 tid=0x00002aaaf82bf000 nid=0x37d0 in Object.wait() [0x000000004161d000..0x000000004161dd00]
> >    java.lang.Thread.State: TIMED_WAITING (on object monitor)
> >         at java.lang.Object.wait(Native Method)
> >         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.waitForWork(HBaseClient.java:400)
> >         - locked <0x00002aaabe13ea18> (a org.apache.hadoop.hbase.ipc.HBaseClient$Connection)
> >         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:442)
> >
> >    Locked ownable synchronizers:
> >         - None
> >
> > "SpillThread" daemon prio=10 tid=0x00002aaaf827fc00 nid=0x37cd waiting on condition [0x000000004131a000..0x000000004131ac80]
> >    java.lang.Thread.State: WAITING (parking)
> >         at sun.misc.Unsafe.park(Native Method)
> >         - parking to wait for <0x00002aaabe0ebc80> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> >         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> >         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:882)
> >
> >    Locked ownable synchronizers:
> >         - None
> >
> > "Comm thread for attempt_200902271728_0012_m_000017_1" daemon prio=10 tid=0x00002aaaf82d7c00 nid=0x37cc waiting on condition [0x0000000041219000..0x0000000041219b00]
> >    java.lang.Thread.State: TIMED_WAITING (sleeping)
> >         at java.lang.Thread.sleep(Native Method)
> >         at org.apache.hadoop.mapred.Task$1.run(Task.java:403)
> >         at java.lang.Thread.run(Thread.java:619)
> >
> >    Locked ownable synchronizers:
> >         - None
> >
> > "Thread for syncLogs" daemon prio=10 tid=0x00002aaaf82e9800 nid=0x37ca waiting on condition [0x0000000041017000..0x0000000041017a00]
> >    java.lang.Thread.State: TIMED_WAITING (sleeping)
> >         at java.lang.Thread.sleep(Native Method)
> >         at org.apache.hadoop.mapred.Child$1.run(Child.java:77)
> >
> >    Locked ownable synchronizers:
> >         - None
> >
> > "IPC Client (47) connection to /127.0.0.1:33444 from an unknown user" daemon prio=10 tid=0x00002aaaf81efc00 nid=0x37c9 in Object.wait() [0x0000000040f16000..0x0000000040f16a80]
> >    java.lang.Thread.State: TIMED_WAITING (on object monitor)
> >         at java.lang.Object.wait(Native Method)
> >         at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:396)
> >         - locked <0x00002aaabe0ebf48> (a org.apache.hadoop.ipc.Client$Connection)
> >         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:438)
> >
> >    Locked ownable synchronizers:
> >         - None
> >
> > "Low Memory Detector" daemon prio=10 tid=0x000000004979cc00 nid=0x37c7 runnable [0x0000000000000000..0x0000000000000000]
> >    java.lang.Thread.State: RUNNABLE
> >
> >    Locked ownable synchronizers:
> >         - None
> >
> > "CompilerThread1" daemon prio=10 tid=0x000000004979a800 nid=0x37c6 waiting on condition [0x0000000000000000..0x0000000040c12450]
> >    java.lang.Thread.State: RUNNABLE
> >
> >    Locked ownable synchronizers:
> >         - None
> >
> > "CompilerThread0" daemon prio=10 tid=0x0000000049797000 nid=0x37c5 waiting on condition [0x0000000000000000..0x0000000040b11520]
> >    java.lang.Thread.State: RUNNABLE
> >
> >    Locked ownable synchronizers:
> >         - None
> >
> > "Signal Dispatcher" daemon prio=10 tid=0x0000000049795800 nid=0x37c4 runnable [0x0000000000000000..0x0000000040a11790]
> >    java.lang.Thread.State: RUNNABLE
> >
> >    Locked ownable synchronizers:
> >         - None
> >
> > "Finalizer" daemon prio=10 tid=0x000000004976ac00 nid=0x37c3 in Object.wait() [0x0000000040910000..0x0000000040910b80]
> >    java.lang.Thread.State: WAITING (on object monitor)
> >         at java.lang.Object.wait(Native Method)
> >         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
> >         - locked <0x00002aaabe0db6c8> (a java.lang.ref.ReferenceQueue$Lock)
> >         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
> >         at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
> >
> >    Locked ownable synchronizers:
> >         - None
> >
> > "Reference Handler" daemon prio=10 tid=0x0000000049769400 nid=0x37c2 in Object.wait() [0x000000004080f000..0x000000004080fa00]
> >    java.lang.Thread.State: WAITING (on object monitor)
> >         at java.lang.Object.wait(Native Method)
> >         at java.lang.Object.wait(Object.java:485)
> >         at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
> >         - locked <0x00002aaabe0ec428> (a java.lang.ref.Reference$Lock)
> >
> >    Locked ownable synchronizers:
> >         - None
> >
> > "main" prio=10 tid=0x00000000496e2000 nid=0x37bc in Object.wait() [0x0000000040209000..0x0000000040209ec0]
> >    java.lang.Thread.State: WAITING (on object monitor)
> >         at java.lang.Object.wait(Native Method)
> >         at java.lang.Object.wait(Object.java:485)
> >         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:695)
> >         - locked <0x00002aaadf91b250> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
> >         at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
> >         at $Proxy3.openScanner(Unknown Source)
> >         at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:86)
> >         at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:77)
> >         at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:34)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:828)
> >         at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1582)
> >         at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1645)
> >         at net.sandmill.examples.mapred.hbase.TableRowRecordReader.next(Unknown Source)
> >         at net.sandmill.examples.mapred.hbase.TableRowRecordReader.next(Unknown Source)
> >         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
> >         - locked <0x00002aaabe169bf0> (a org.apache.hadoop.mapred.MapTask$TrackedRecordReader)
> >         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
> >         - locked <0x00002aaabe169bf0> (a org.apache.hadoop.mapred.MapTask$TrackedRecordReader)
> >         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
> >         at org.apache.hadoop.mapred.Child.main(Child.java:158)
> >
> >    Locked ownable synchronizers:
> >         - None
> >
> > "VM Thread" prio=10 tid=0x0000000049764000 nid=0x37c1 runnable
> >
> > "GC task thread#0 (ParallelGC)" prio=10 tid=0x00000000496ec000 nid=0x37bd runnable
> >
> > "GC task thread#1 (ParallelGC)" prio=10 tid=0x00000000496ed400 nid=0x37be runnable
> >
> > "GC task thread#2 (ParallelGC)" prio=10 tid=0x00000000496ee800 nid=0x37bf runnable
> >
> > "GC task thread#3 (ParallelGC)" prio=10 tid=0x00000000496efc00 nid=0x37c0 runnable
> >
> > "VM Periodic Task Thread" prio=10 tid=0x000000004979e800 nid=0x37c8 waiting on condition
> >
> > JNI global references: 847
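
A note on reading a dump like the one quoted above: when the JVM detects a real lock cycle, jstack adds a "Found one Java-level deadlock" section, and no such section appears in this output. The same check can be made from inside a JVM with the standard ThreadMXBean API. The sketch below is only an illustration, not part of the RowCounter job; the class name DeadlockCheck and the "waiter" thread are made up for the example. It shows that a thread parked in Object.wait(), like "main" in the dump, is reported as WAITING but is not counted as deadlocked.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class DeadlockCheck {

    /** Returns how many threads the JVM reports as deadlocked (0 if none). */
    static int countDeadlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() returns null when no lock cycle is detected.
        long[] ids = mx.findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) throws InterruptedException {
        final Object lock = new Object();

        // Hypothetical thread that waits forever for a notify that never comes,
        // similar to a client waiting for an RPC response.
        Thread waiter = new Thread(new Runnable() {
            public void run() {
                synchronized (lock) {
                    try {
                        lock.wait();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
        }, "waiter");
        waiter.setDaemon(true); // let the JVM exit even though the thread never wakes
        waiter.start();

        // Poll until the thread has actually reached Object.wait().
        while (waiter.getState() != Thread.State.WAITING) {
            Thread.sleep(10);
        }

        System.out.println("waiter state: " + waiter.getState());
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

If this reports zero deadlocked threads while a task still makes no progress, the thread is more likely waiting on something external, for example a response from the regionserver, than on another thread in the same process.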
