[
https://issues.apache.org/jira/browse/HBASE-10432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911532#comment-13911532
]
Nicolas Liochon commented on HBASE-10432:
-----------------------------------------
I'm not sure I'm a big fan of excluding Errors from the retries. It seems like
a corner case to me. I'm not sure about stuff like java.io.IOError,
org.apache.hadoop.fs.FSError, OutOfMemoryError, or Error subclasses that may be
created in the future. They can be transient (for example, the server is about
to die, or an admin is currently changing something on the server). We also
don't know whether third-party code will be quick to throw Errors instead of
Exceptions. The definition of java.lang.VirtualMachineError ("the JVM has run
out of resources necessary to continue") is in the same area: if a server
reports this, we should retry on the client; with some luck the server will die
and we will use another one.
Now, all this is corner cases on corner cases (between wrapped remote
exceptions, wrapped services, wrapped executions, and so on), but my feeling
is that when it's a real error, retrying is useless but does no harm, while
there are some cases in which we should retry. I think our contract is "retry
unless you're absolutely sure it's useless"; managing Errors this way violates
that contract and puts us in a world of questions about "what is the meaning of
a Java Error in a distributed system?". I don't have the answer, but I think we
could avoid the question ;-)
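To make the two policies concrete, here is a minimal, hypothetical sketch
(class and method names invented for illustration; this is not the actual
RpcRetryingCaller code) of the difference between excluding all Errors and
retrying unless the failure is surely useless:
{noformat}
// Hypothetical illustration only -- not the actual HBase retry code.
public final class RetryPolicySketch {

  // Policy A: treat every java.lang.Error as non-retriable (the approach
  // questioned above). An IOError or FSError from a server that is about
  // to die would never be retried, even though another server might work.
  static boolean retriableExcludingErrors(Throwable t) {
    return !(t instanceof Error);
  }

  // Policy B: "retry unless you're absolutely sure it's useless".
  // Only failures that can never succeed on a retry, such as a
  // LinkageError (e.g. the NoSuchMethodError in the report below),
  // stop the retry loop; everything else is retried.
  static boolean retriableUnlessSurelyUseless(Throwable t) {
    return !(t instanceof LinkageError);
  }
}
{noformat}
Under policy A the transient cases above are never retried; under policy B a
real, permanent Error costs a few useless retries but nothing worse.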
> Rpc retries non-recoverable error
> ---------------------------------
>
> Key: HBASE-10432
> URL: https://issues.apache.org/jira/browse/HBASE-10432
> Project: HBase
> Issue Type: Bug
> Components: IPC/RPC
> Affects Versions: 0.98.0, 0.96.2, 0.99.0
> Reporter: Nick Dimiduk
> Assignee: Nick Dimiduk
> Priority: Minor
> Attachments: HBASE-10432.00.patch, HBASE-10432.01.patch,
> HBASE-10432.02.patch, HBASE-10432.02.patch, exception.txt
>
>
> I've recently been working with hbase/trunk + hive/trunk. I had a Hive command
> eventually time out with the following exception (stack trace truncated).
> {noformat}
> Caused by: java.io.IOException: Could not set up IO Streams
>   at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:922)
>   at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1536)
>   at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1425)
>   at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654)
>   at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712)
>   at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:28857)
>   at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:302)
>   at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:157)
>   at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
>   at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:120)
>   ... 43 more
> Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.net.NetUtils.getInputStream(Ljava/net/Socket;)Lorg/apache/hadoop/net/SocketInputWrapper;
>   at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:861)
>   ... 52 more
> {noformat}
> The root cause looks like a dependency version mismatch (Hive compiled against
> hadoop1, HBase against hadoop2). However, we still retry this exception, even
> though the call will never actually complete. We should be more careful about
> where we blindly catch Throwables.
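> As a hypothetical illustration (not the attached patch; setupIOstreams here
> stands in for the call site from the trace above), the connection setup could
> fail fast on a linkage problem instead of retrying, using HBase's existing
> DoNotRetryIOException:
> {noformat}
> // Hypothetical sketch: a NoSuchMethodError is a LinkageError, so no
> // retry can ever succeed; surface it as a non-retriable failure
> // instead of looping until the client times out.
> try {
>   setupIOstreams();  // the call site from the stack trace above
> } catch (LinkageError e) {
>   // e.g. Hive compiled against hadoop1 while HBase runs against hadoop2
>   throw new DoNotRetryIOException("Non-recoverable linkage error", e);
> }
> {noformat}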
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)