regionserver wedged: thread holds lock during blocking read (?) ---------------------------------------------------------------
Key: HADOOP-2682 URL: https://issues.apache.org/jira/browse/HADOOP-2682 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.15.0 Reporter: Michael Bieniosek Using a patched version of 0.15.0, my regionserver freezes up, saying repeatedly: 2008-01-22 23:10:12,617 WARN org.apache.hadoop.ipc.Server: IPC Server handler 1 on 60020, call getProtocolVersion(org. apache.hadoop.hbase.HRegionInterface, 1) from xxx.xxx.xxx.178:55116: discarded for being too old (46464) Then I get: 2008-01-22 23:11:12,644 WARN org.apache.hadoop.ipc.Server: IPC Server handler 7 on 60020, call openScanner(-ROOT-,,0, [Lorg.apache.hadoop.io.Text;@19958bf9, .META., 1201043326058, null) from xxx.xxx.45.192:44993: output error java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source) at sun.nio.ch.SocketChannelImpl.write(Unknown Source) at org.apache.hadoop.ipc.SocketChannelOutputStream.flushBuffer(SocketChannelOutputStream.java:108) at org.apache.hadoop.ipc.SocketChannelOutputStream.write(SocketChannelOutputStream.java:89) at java.io.BufferedOutputStream.flushBuffer(Unknown Source) at java.io.BufferedOutputStream.flush(Unknown Source) at java.io.DataOutputStream.flush(Unknown Source) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:615) 2008-01-22 23:11:12,660 INFO org.apache.hadoop.ipc.Server: Process Thread Dump: Discarding call getProtocolVersion(org.apache.hadoop.hbase.HRegionInterface, 1) from xxx.xxx.45.224:34506 29 active threads Thread 4329 (IPC Client connection to xxx.xxx.44.135:10000): State: WAITING Blocked count: 1 Waited count: 4 Waiting on [EMAIL PROTECTED] Stack: java.lang.Object.wait(Native Method) java.lang.Object.wait(Unknown Source) org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:216) org.apache.hadoop.ipc.Client$Connection.run(Client.java:255) Thread 37 (IPC Server handler 9 on 60020): State: BLOCKED Blocked count: 1 Waited count: 202 Blocked on [EMAIL PROTECTED] Blocked by 29 (IPC Server handler 1 on 60020) Stack: org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:936) org.apache.hadoop.hbase.HStore$StoreFileScanner.<init>(HStore.java:2057) org.apache.hadoop.hbase.HStore$HStoreScanner.<init>(HStore.java:2203) org.apache.hadoop.hbase.HStore.getScanner(HStore.java:2025) org.apache.hadoop.hbase.HRegion$HScanner.<init>(HRegion.java:1699) org.apache.hadoop.hbase.HRegion.getScanner(HRegion.java:1271) org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1499) sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) java.lang.reflect.Method.invoke(Unknown Source) org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) Thread 36 (IPC Server handler 8 on 60020): State: BLOCKED Blocked count: 1 Waited count: 201 Blocked on [EMAIL PROTECTED] Blocked by 29 (IPC Server handler 1 on 60020) Stack: org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:936) org.apache.hadoop.hbase.HStore$StoreFileScanner.<init>(HStore.java:2057) org.apache.hadoop.hbase.HStore$HStoreScanner.<init>(HStore.java:2203) org.apache.hadoop.hbase.HStore.getScanner(HStore.java:2025) org.apache.hadoop.hbase.HRegion$HScanner.<init>(HRegion.java:1699) org.apache.hadoop.hbase.HRegion.getScanner(HRegion.java:1271) org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1499) sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) java.lang.reflect.Method.invoke(Unknown Source) org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) Thread 35 (IPC Server handler 7 on 60020): State: RUNNABLE Blocked count: 4 Waited count: 240 Stack: sun.management.ThreadImpl.getThreadInfo0(Native Method) sun.management.ThreadImpl.getThreadInfo(Unknown Source) sun.management.ThreadImpl.getThreadInfo(Unknown Source) org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:114) org.apache.hadoop.util.ReflectionUtils.logThreadInfo(ReflectionUtils.java:162) org.apache.hadoop.ipc.Server$Handler.run(Server.java:579) Thread 34 (IPC Server handler 6 on 60020): State: BLOCKED Blocked count: 1 Waited count: 203 Blocked on [EMAIL PROTECTED] Blocked by 29 (IPC Server handler 1 on 60020) Stack: org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:936) org.apache.hadoop.hbase.HStore$StoreFileScanner.<init>(HStore.java:2057) org.apache.hadoop.hbase.HStore$HStoreScanner.<init>(HStore.java:2203) org.apache.hadoop.hbase.HStore.getScanner(HStore.java:2025) org.apache.hadoop.hbase.HRegion$HScanner.<init>(HRegion.java:1699) org.apache.hadoop.hbase.HRegion.getScanner(HRegion.java:1271) org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1499) sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) java.lang.reflect.Method.invoke(Unknown Source) org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) Thread 33 (IPC Server handler 5 on 60020): State: BLOCKED Blocked count: 1 Waited count: 212 Blocked on [EMAIL PROTECTED] Blocked by 29 (IPC Server handler 1 on 60020) Stack: org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:936) org.apache.hadoop.hbase.HStore$StoreFileScanner.<init>(HStore.java:2057) org.apache.hadoop.hbase.HStore$HStoreScanner.<init>(HStore.java:2203) org.apache.hadoop.hbase.HStore.getScanner(HStore.java:2025) org.apache.hadoop.hbase.HRegion$HScanner.<init>(HRegion.java:1699) org.apache.hadoop.hbase.HRegion.getScanner(HRegion.java:1271) org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1499) sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) java.lang.reflect.Method.invoke(Unknown Source) org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) Thread 32 (IPC Server handler 4 on 60020): State: BLOCKED Blocked count: 1 Waited count: 202 Blocked on [EMAIL PROTECTED] Blocked by 29 (IPC Server handler 1 on 60020) Stack: org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:936) org.apache.hadoop.hbase.HStore$StoreFileScanner.<init>(HStore.java:2057) org.apache.hadoop.hbase.HStore$HStoreScanner.<init>(HStore.java:2203) org.apache.hadoop.hbase.HStore.getScanner(HStore.java:2025) org.apache.hadoop.hbase.HRegion$HScanner.<init>(HRegion.java:1699) org.apache.hadoop.hbase.HRegion.getScanner(HRegion.java:1271) org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1499) sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) java.lang.reflect.Method.invoke(Unknown Source) org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) Thread 31 (IPC Server handler 3 on 60020): State: BLOCKED Blocked count: 1 Waited count: 210 Blocked on [EMAIL PROTECTED] Blocked by 29 (IPC Server handler 1 on 60020) Stack: org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:936) org.apache.hadoop.hbase.HStore$StoreFileScanner.<init>(HStore.java:2057) org.apache.hadoop.hbase.HStore$HStoreScanner.<init>(HStore.java:2203) org.apache.hadoop.hbase.HStore.getScanner(HStore.java:2025) org.apache.hadoop.hbase.HRegion$HScanner.<init>(HRegion.java:1699) org.apache.hadoop.hbase.HRegion.getScanner(HRegion.java:1271) org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1499) sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) java.lang.reflect.Method.invoke(Unknown Source) org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) Thread 30 (IPC Server handler 2 on 60020): State: BLOCKED Blocked count: 1 Waited count: 216 Blocked on [EMAIL PROTECTED] Blocked by 29 (IPC Server handler 1 on 60020) Stack: org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:936) org.apache.hadoop.hbase.HStore$StoreFileScanner.<init>(HStore.java:2057) org.apache.hadoop.hbase.HStore$HStoreScanner.<init>(HStore.java:2203) org.apache.hadoop.hbase.HStore.getScanner(HStore.java:2025) org.apache.hadoop.hbase.HRegion$HScanner.<init>(HRegion.java:1699) org.apache.hadoop.hbase.HRegion.getScanner(HRegion.java:1271) org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1499) sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) java.lang.reflect.Method.invoke(Unknown Source) org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) Thread 29 (IPC Server handler 1 on 60020): State: RUNNABLE Blocked count: 3 Waited count: 229 Stack: java.net.SocketInputStream.socketRead0(Native Method) java.net.SocketInputStream.read(Unknown Source) java.io.BufferedInputStream.fill(Unknown Source) java.io.BufferedInputStream.read(Unknown Source) java.io.DataInputStream.readShort(Unknown Source) org.apache.hadoop.dfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:773) org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1000) org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1096) java.io.DataInputStream.readFully(Unknown Source) java.io.DataInputStream.readFully(Unknown Source) org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1383) org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1360) org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1349) org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1344) org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:263) org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:242) org.apache.hadoop.hbase.HStoreFile$BloomFilterMapFile$Reader.<init>(HStoreFile.java:816) org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:936) org.apache.hadoop.hbase.HStore$StoreFileScanner.<init>(HStore.java:2057) org.apache.hadoop.hbase.HStore$HStoreScanner.<init>(HStore.java:2203) Thread 28 (IPC Server handler 0 on 60020): State: BLOCKED Blocked count: 1 Waited count: 200 Blocked on [EMAIL PROTECTED] Blocked by 29 (IPC Server handler 1 on 60020) Stack: org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:936) org.apache.hadoop.hbase.HStore$StoreFileScanner.<init>(HStore.java:2057) org.apache.hadoop.hbase.HStore$HStoreScanner.<init>(HStore.java:2203) org.apache.hadoop.hbase.HStore.getScanner(HStore.java:2025) org.apache.hadoop.hbase.HRegion$HScanner.<init>(HRegion.java:1699) org.apache.hadoop.hbase.HRegion.getScanner(HRegion.java:1271) org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1499) sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) java.lang.reflect.Method.invoke(Unknown Source) org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) Thread 14 (IPC Server listener on 60020): State: RUNNABLE Blocked count: 0 Waited count: 0 Stack: sun.nio.ch.PollArrayWrapper.poll0(Native Method) sun.nio.ch.PollArrayWrapper.poll(Unknown Source) sun.nio.ch.PollSelectorImpl.doSelect(Unknown Source) sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source) sun.nio.ch.SelectorImpl.select(Unknown Source) sun.nio.ch.SelectorImpl.select(Unknown Source) org.apache.hadoop.ipc.Server$Listener.run(Server.java:285) ... etc. Note that most of the threads are blocked on thread 29, which is in turn blocked on a socket read. This node was hosting the ROOT region, so when it got wedged it made hbase unusable (since the master won't reassign ROOT until the server actually goes down). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.