[
https://issues.apache.org/jira/browse/HBASE-7989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596393#comment-13596393
]
Jean-Daniel Cryans commented on HBASE-7989:
-------------------------------------------
This is something we saw yesterday, I think.
First, we saw tons of these starting a minute after the server died:
{noformat}
2013-03-07 01:27:57,065 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Failed all from region=someregion, hostname=sv4r20s13, port=10304
java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Call to sv4r20s13/10.4.20.13:10304 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.4.17.37:46591 remote=sv4r20s13/10.4.20.13:10304]
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1525)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1377)
    at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:702)
    at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.parallelGet(ThriftServerRunner.java:1410)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.thrift.HbaseHandlerMetricsProxy.invoke(HbaseHandlerMetricsProxy.java:65)
    at $Proxy5.parallelGet(Unknown Source)
    at org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$parallelGet.getResult(Hbase.java:4930)
    at org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$parallelGet.getResult(Hbase.java:4918)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
    at org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:287)
    at org.apache.hadoop.hbase.thrift.CallQueue$Call.run(CallQueue.java:62)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketTimeoutException: Call to sv4r20s13/10.4.20.13:10304 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.4.17.37:46591 remote=sv4r20s13/10.4.20.13:10304]
    at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1052)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1025)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
    at $Proxy6.multi(Unknown Source)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1354)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1352)
    at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1361)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1349)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    ... 3 more
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.4.17.37:46591 remote=sv4r20s13/10.4.20.13:10304]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:399)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:672)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:606)
{noformat}
This was followed by regular appearances of that 20-second connect timeout:
{noformat}
2013-03-07 01:32:17,205 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Failed all from region=someregion, hostname=sv4r20s13, port=10304
java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=sv4r20s13/10.4.20.13:10304]
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1525)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1377)
    at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:702)
    at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.parallelGet(ThriftServerRunner.java:1410)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.thrift.HbaseHandlerMetricsProxy.invoke(HbaseHandlerMetricsProxy.java:65)
    at $Proxy5.parallelGet(Unknown Source)
    at org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$parallelGet.getResult(Hbase.java:4930)
    at org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$parallelGet.getResult(Hbase.java:4918)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
    at org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:287)
    at org.apache.hadoop.hbase.thrift.CallQueue$Call.run(CallQueue.java:62)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=sv4r20s13/10.4.20.13:10304]
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:519)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:484)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:416)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:462)
    at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1150)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1000)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
    at $Proxy6.multi(Unknown Source)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1354)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1352)
    at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1361)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1349)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
{noformat}
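The 20000 ms in that second trace matches the HBase client's default connect timeout, which HBaseClient reads from the {{ipc.socket.timeout}} configuration key (default 20000). As a stopgap while the client still has a stale cached location, and assuming the client build in question honors that key, the timeout could be lowered in hbase-site.xml so a dead server is abandoned sooner; the 5000 ms value here is only an illustration:

{noformat}
<!-- Hypothetical stopgap: lower the client connect timeout from its
     20000 ms default so a connect to a dead cached location fails sooner.
     Trade-off: too low a value can cause spurious failures on a slow network. -->
<property>
  <name>ipc.socket.timeout</name>
  <value>5000</value>
</property>
{noformat}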
> Client with cached info on a dead server will wait for 20s before trying
> another one.
> --------------------------------------------------------------------------------------
>
> Key: HBASE-7989
> URL: https://issues.apache.org/jira/browse/HBASE-7989
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 0.95.0, 0.98.0
> Reporter: nkeywal
>
> The scenario is:
> - fetch the region location cache in the client
> - a server dies
> - try to use a region that was on the dead server
> This leads to a 20-second connect timeout. We don't see this in unit tests
> because it only happens when the remote box does not answer at all; in the
> unit tests the OS immediately refuses the connection.
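That last point can be demonstrated locally. This is a hypothetical stand-alone sketch (not HBase code, class and method names invented for illustration): a connect to a closed local port fails immediately with ConnectException, which is what the unit tests see, whereas a box that silently drops packets would only fail with SocketTimeoutException after the full connect timeout elapses.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ConnectTimeoutDemo {

    /** Attempts a TCP connect and reports how it ended:
     *  "connected", "refused", or the exception's simple name. */
    static String tryConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return "connected";
        } catch (ConnectException e) {
            // The OS actively refused the connection: failure is immediate.
            // This is what a unit test hitting a dead local port observes.
            return "refused";
        } catch (IOException e) {
            // A remote box that silently drops packets instead surfaces as
            // SocketTimeoutException, but only after the full connect timeout.
            return e.getClass().getSimpleName();
        }
    }

    /** Connects to a port that was just released, so the OS refuses instantly
     *  even though a generous 20s timeout is allowed. */
    static String demoRefusedLocally() {
        try {
            int closedPort;
            try (ServerSocket ss = new ServerSocket(0)) {
                closedPort = ss.getLocalPort(); // bound, noted, then released
            }
            return tryConnect("127.0.0.1", closedPort, 20000);
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(demoRefusedLocally()); // prints "refused" almost instantly
    }
}
```

The demo never waits anywhere near 20 s, which is exactly why a unit test against a locally killed server cannot reproduce the bug described above.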