[
https://issues.apache.org/jira/browse/HBASE-10432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912104#comment-13912104
]
stack commented on HBASE-10432:
-------------------------------
I like your argument @nkeywal.
My take was that we are flipping to the other extreme here: rather than
retrying everything unless it is explicitly called out as not retryable, we
now retry only the known retryables. We want a reactive, fail-fast system.
Too long we've been off in the murky world of retries and timeouts that came
from the mapreduce/batch domain rather than from live serving; you've been
doing a bunch of work elsewhere to help fix this. I was thinking this flip
would bubble up new types of failures that we could then add to the retry
set, or else we end up with a system that fails fast.
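Roughly, the two extremes look like this. This is only a sketch: the
RetryPolicies class and its method names are made up for illustration and are
not the actual RpcRetryingCaller logic, though DoNotRetryIOException and
NotServingRegionException are the existing HBase exception classes.
{code:java}
import java.net.ConnectException;
import java.net.SocketTimeoutException;

import org.apache.hadoop.hbase.DoNotRetryIOException;
import org.apache.hadoop.hbase.NotServingRegionException;

// Hypothetical sketch of the two retry policies discussed above;
// not the actual RpcRetryingCaller code.
public final class RetryPolicies {

  // One extreme: retry everything unless it is explicitly marked non-retryable.
  static boolean retryUnlessFatal(Throwable t) {
    return !(t instanceof DoNotRetryIOException)
        && !(t instanceof Error); // e.g. NoSuchMethodError can never succeed on retry
  }

  // Other extreme: retry only failures known to be transient.
  static boolean retryOnlyKnownRetryable(Throwable t) {
    return t instanceof ConnectException
        || t instanceof SocketTimeoutException
        || t instanceof NotServingRegionException;
  }
}
{code}
The second policy fails fast on anything we have not yet classified, which is
what would surface the new failure types mentioned above.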
> Rpc retries non-recoverable error
> ---------------------------------
>
> Key: HBASE-10432
> URL: https://issues.apache.org/jira/browse/HBASE-10432
> Project: HBase
> Issue Type: Bug
> Components: IPC/RPC
> Affects Versions: 0.98.0, 0.96.2, 0.99.0
> Reporter: Nick Dimiduk
> Assignee: Nick Dimiduk
> Priority: Minor
> Attachments: HBASE-10432.00.patch, HBASE-10432.01.patch,
> HBASE-10432.02.patch, HBASE-10432.02.patch, exception.txt
>
>
> I'm recently working with hbase/trunk + hive/trunk. I had a hive command
> eventually timeout with the following exception (stacktrace truncated).
> {noformat}
> Caused by: java.io.IOException: Could not set up IO Streams
> at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:922)
> at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1536)
> at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1425)
> at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654)
> at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:28857)
> at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:302)
> at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:157)
> at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
> at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:120)
> ... 43 more
> Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.net.NetUtils.getInputStream(Ljava/net/Socket;)Lorg/apache/hadoop/net/SocketInputWrapper;
> at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:861)
> ... 52 more
> {noformat}
> The root cause looks like a dependency version mismatch (Hive compiled
> against hadoop1, HBase against hadoop2). However, we still retry this
> exception, even though it'll never actually complete. We should be more
> careful where we blindly catch Throwables.
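> A possible guard, sketched below with hypothetical names (this is not the
> actual RpcClient/RpcRetryingCaller code): rethrow Errors and
> DoNotRetryIOExceptions instead of treating every Throwable as transient.
> {code:java}
> import java.util.concurrent.Callable;
>
> import org.apache.hadoop.hbase.DoNotRetryIOException;
>
> // Hypothetical retry loop for illustration only; not the actual HBase client code.
> public final class RetrySketch {
>   static <T> T callWithRetries(Callable<T> callable, int maxAttempts) throws Exception {
>     Exception lastFailure = null;
>     for (int attempt = 0; attempt < maxAttempts; attempt++) {
>       try {
>         return callable.call();
>       } catch (Error | DoNotRetryIOException e) {
>         // Fail fast: a NoSuchMethodError (or any Error) will never succeed on retry.
>         throw e;
>       } catch (Exception e) {
>         lastFailure = e; // assumed transient; back off and retry
>       }
>     }
>     throw lastFailure;
>   }
> }
> {code}
> With a catch of bare Throwable, the NoSuchMethodError above is retried until
> the caller's timeout expires, even though the classpath cannot change between
> attempts.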
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)