[ https://issues.apache.org/jira/browse/HBASE-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13745145#comment-13745145 ]
Nicolas Liochon commented on HBASE-9268:
----------------------------------------
I've played with a pseudo-distributed cluster and YCSB, and got this when I sent
kill -STOP to the region server:
{noformat}
   java.lang.Thread.State: BLOCKED (on object monitor)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
	- waiting to lock <0x00000007de6af410> (a java.io.BufferedOutputStream)
	at java.io.DataOutputStream.flush(DataOutputStream.java:106)
	at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
	at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:232)
	at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:248)
	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.close(RpcClient.java:963)
	- locked <0x00000007de6ab808> (a org.apache.hadoop.hbase.ipc.RpcClient$Connection)
	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:718)

"hbase-table-pool-1-thread-6" daemon prio=10 tid=0x00007f93000ce800 nid=0x649c runnable [0x00007f932aa85000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
	- locked <0x00000007deb3ff60> (a sun.nio.ch.Util$2)
	- locked <0x00000007deb3ff50> (a java.util.Collections$UnmodifiableSet)
	- locked <0x00000007deb3fd48> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
	at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
	- locked <0x00000007de6af410> (a java.io.BufferedOutputStream)
	at java.io.DataOutputStream.write(DataOutputStream.java:90)
	- locked <0x00000007de6af3f0> (a java.io.DataOutputStream)
	at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:230)
	at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:220)
	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.writeRequest(RpcClient.java:1039)
	- locked <0x00000007de6af3f0> (a java.io.DataOutputStream)
	at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1407)
	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1635)
{noformat}
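Reading the two threads in the trace together: hbase-table-pool-1-thread-6 holds the BufferedOutputStream monitor <0x00000007de6af410> while its write sits in select, and the RpcClient$Connection thread's close() needs that same monitor to flush before closing, so close() cannot finish until the write returns. Below is a minimal, self-contained sketch of that pattern in plain Java. It is hypothetical code, not HBase's: StalledOutputStream and closeIsStuck are made-up names, and StalledOutputStream only stands in for a socket stream whose peer has been stopped.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.OutputStream;
import java.util.concurrent.CountDownLatch;

public class StuckCloseDemo {

    // Hypothetical stand-in for a socket stream whose peer was kill -STOPped:
    // write() blocks until 'release' is counted down, like a write with no timeout.
    static final class StalledOutputStream extends OutputStream {
        final CountDownLatch writeStarted = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);

        @Override
        public void write(int b) throws IOException {
            write(new byte[] {(byte) b}, 0, 1);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            writeStarted.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("stalled write interrupted");
            }
        }
    }

    // Returns true if close() is still stuck after waitMillis while another thread's write is stalled.
    static boolean closeIsStuck(long waitMillis) {
        StalledOutputStream socketOut = new StalledOutputStream();
        // 16-byte buffer: a 64-byte write bypasses the buffer and goes straight to
        // socketOut while the writer still holds the BufferedOutputStream lock.
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(socketOut, 16));

        Thread writer = new Thread(() -> {
            try {
                out.write(new byte[64]); // plays the role of writeRequest in thread-6
            } catch (IOException ignored) {
            }
        }, "writer");
        Thread closer = new Thread(() -> {
            try {
                out.close(); // plays the role of Connection.close: flush() needs the same lock
            } catch (IOException ignored) {
            }
        }, "closer");
        writer.setDaemon(true);
        closer.setDaemon(true);
        try {
            writer.start();
            socketOut.writeStarted.await(); // writer now holds the lock and is stalled
            closer.start();
            closer.join(waitMillis);
            boolean stuck = closer.isAlive();
            socketOut.release.countDown(); // un-stall the write so both threads can finish
            writer.join();
            closer.join();
            return stuck;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            socketOut.release.countDown(); // idempotent; never leave the writer parked
        }
    }

    public static void main(String[] args) {
        System.out.println("close() stuck while write is stalled: " + closeIsStuck(500));
    }
}
```

On recent JDKs BufferedOutputStream may use an internal lock instead of its object monitor, so the closing thread can show up as WAITING rather than BLOCKED in jstack; it is stuck either way, which is why the sketch only checks isAlive().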
It's exactly as if the timeout on the socket were not set. Strange.
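The write side is where a timeout would have to kick in. SocketIOWithTimeout waits in a selector, and java.nio's Selector.select(0) blocks indefinitely, so a stream created with a write timeout of 0 would look exactly like this trace (Hadoop's NetUtils.getOutputStream(Socket), for instance, passes 0, while NetUtils.getOutputStream(Socket, long) takes an explicit timeout). Whether that is what happens here is only a guess. A rough sketch of a selector-based write timeout in plain java.nio, simplified compared to Hadoop's implementation; writeFully and timesOutAgainstStalledPeer are made-up names, and a local server socket that accepts but never reads stands in for the stopped region server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

public class TimedWrite {

    // Writes all of buf, waiting at most timeoutMillis for the channel to become
    // writable whenever the send buffer is full. timeoutMillis == 0 means
    // Selector.select(0), i.e. wait forever, which is the behaviour in the trace.
    static void writeFully(SocketChannel ch, ByteBuffer buf, long timeoutMillis) throws IOException {
        ch.configureBlocking(false);
        try (Selector selector = Selector.open()) {
            ch.register(selector, SelectionKey.OP_WRITE);
            while (buf.hasRemaining()) {
                if (ch.write(buf) > 0) {
                    continue;
                }
                // Send buffer full: the peer is not reading (e.g. it was kill -STOPped).
                if (selector.select(timeoutMillis) == 0) {
                    throw new SocketTimeoutException(
                        timeoutMillis + " millis timeout while waiting for channel to be ready for write");
                }
                selector.selectedKeys().clear();
            }
        }
    }

    // Connects to a local server that accepts but never reads, then writes until the timeout fires.
    static boolean timesOutAgainstStalledPeer(long timeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             SocketChannel ch = SocketChannel.open(new InetSocketAddress(loopback, server.getLocalPort()));
             Socket peer = server.accept()) { // accepted but never read from
            ByteBuffer chunk = ByteBuffer.allocate(1 << 20);
            for (int i = 0; i < 1024; i++) { // bounded so the demo cannot run forever
                chunk.clear();
                writeFully(ch, chunk, timeoutMillis);
            }
            return false;
        } catch (SocketTimeoutException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + timesOutAgainstStalledPeer(200));
    }
}
```

With a nonzero timeout the write fails with SocketTimeoutException once the kernel buffers fill, which would also release the stream lock that close() is waiting for; with 0 it would sit in select like thread-6 above.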
> Client doesn't recover from a stalled region server
> ---------------------------------------------------
>
> Key: HBASE-9268
> URL: https://issues.apache.org/jira/browse/HBASE-9268
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.95.2
> Reporter: Jean-Daniel Cryans
> Fix For: 0.98.0, 0.95.3
>
>
> Got this testing the 0.95.2 RC.
> I sent kill -STOP to a region server and left it stopped while running PE
> (PerformanceEvaluation). The clients didn't find the new region locations, and
> in the jstack they were stuck doing RPC. Eventually I sent kill -CONT and the
> client printed these:
> bq. Exception in thread "TestClient-6" java.lang.RuntimeException:
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed
> 128 actions: IOException: 90 times, SocketTimeoutException: 38 times,