[ 
https://issues.apache.org/jira/browse/HBASE-10432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911532#comment-13911532
 ] 

Nicolas Liochon commented on HBASE-10432:
-----------------------------------------

I'm not sure I'm a big fan of excluding Errors from the retries. It seems like 
a corner case to me. I'm not sure about things like java.io.IOError, 
org.apache.hadoop.fs.FSError, OutOfMemoryError, or Errors that may be created 
in the future. They can be transient (for example, the server is about to die, 
or an admin is currently changing something on the server). We also don't know 
whether third parties will be quick to throw Errors instead of Exceptions. 
The definition of java.lang.VirtualMachineError ("the JVM has run out of 
resources necessary to continue") is in the same area: if a server reports 
this, we should retry on the client; with some luck the server will die and we 
will use another one. 

Now, all of this is corner cases on top of corner cases (between wrapped remote 
exceptions, wrapped services, wrapped executions, and so on), but my feeling 
is that when it's a real Error, retrying is useless but does no harm, while 
there are some cases where we should retry. I think our contract is "retry 
unless you're absolutely sure it's useless", and handling Errors this way 
violates that contract and puts us in a world of questions about "what is 
the meaning of a Java Error in a distributed system?". I don't have the answer, 
but I think we could avoid the question ;-)
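The contract described above ("retry unless you're absolutely sure it's 
useless") can be sketched as a retry loop that catches Throwable, Errors 
included. This is a hypothetical illustration, not HBase's actual 
RpcRetryingCaller; the class and interface names are made up for the sketch.

```java
// Hypothetical sketch of the "retry unless you're absolutely sure it's
// useless" contract: the loop catches Throwable, so even an Error is
// retried, on the theory that it may be transient (e.g. the server is
// about to die and the next attempt will land on another one).
public class RetryOnThrowable {
    interface Call<T> { T run() throws Throwable; }

    static <T> T callWithRetries(Call<T> call, int maxAttempts) throws Throwable {
        Throwable last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.run();
            } catch (Throwable t) {
                // No attempt to distinguish Error from Exception here:
                // when the failure really is permanent, retrying is
                // useless but harmless; when it is transient, it helps.
                last = t;
            }
        }
        throw last; // attempts exhausted: surface the last failure
    }

    public static void main(String[] args) throws Throwable {
        int[] failures = {2}; // fail twice with an Error, then succeed
        String result = callWithRetries(() -> {
            if (failures[0]-- > 0) {
                throw new Error("transient server-side Error");
            }
            return "ok";
        }, 5);
        System.out.println(result); // prints "ok"
    }
}
```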




> Rpc retries non-recoverable error
> ---------------------------------
>
>                 Key: HBASE-10432
>                 URL: https://issues.apache.org/jira/browse/HBASE-10432
>             Project: HBase
>          Issue Type: Bug
>          Components: IPC/RPC
>    Affects Versions: 0.98.0, 0.96.2, 0.99.0
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>            Priority: Minor
>         Attachments: HBASE-10432.00.patch, HBASE-10432.01.patch, 
> HBASE-10432.02.patch, HBASE-10432.02.patch, exception.txt
>
>
> I've recently been working with hbase/trunk + hive/trunk. A hive command of 
> mine eventually timed out with the following exception (stack trace truncated).
> {noformat}
> Caused by: java.io.IOException: Could not set up IO Streams
>         at 
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:922)
>         at 
> org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1536)
>         at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1425)
>         at 
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654)
>         at 
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:28857)
>         at 
> org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:302)
>         at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:157)
>         at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
>         at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:120)
>         ... 43 more
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.hadoop.net.NetUtils.getInputStream(Ljava/net/Socket;)Lorg/apache/hadoop/net/SocketInputWrapper;
>         at 
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:861)
>         ... 52 more
> {noformat}
> The root cause looks like a dependency version mismatch (Hive compiled 
> against hadoop1, HBase against hadoop2). However, we still retry this 
> exception, even though the call will never complete. We should be more 
> careful about where we blindly catch Throwables.
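The fix the issue description asks for could take the shape of a retry guard 
that recognizes linkage problems as non-recoverable. This is a hypothetical 
sketch, not the actual HBASE-10432 patch; the class name is made up.

```java
// Hypothetical sketch: treat LinkageError (which covers NoSuchMethodError,
// NoClassDefFoundError, etc.) as non-retryable, since a classpath/version
// mismatch is permanent for the lifetime of the JVM and no retry can fix it.
public class RetryGuard {
    static boolean isRetryable(Throwable t) {
        // LinkageError subclasses signal a permanent classpath problem;
        // everything else is given the benefit of the doubt and retried.
        return !(t instanceof LinkageError);
    }

    public static void main(String[] args) {
        System.out.println(isRetryable(new NoSuchMethodError("NetUtils.getInputStream"))); // false
        System.out.println(isRetryable(new java.io.IOException("connection reset")));      // true
    }
}
```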



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
