[ https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicolas Liochon updated HBASE-10566:
------------------------------------
Attachment: 10566.sample.patch
> cleanup rpcTimeout in the client
> --------------------------------
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 0.99.0
> Reporter: Nicolas Liochon
> Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout.
> Socket timeouts should be short: a default of around 20 seconds, which
> some applications could lower to single-digit values. If we cannot write to
> the socket within 10 seconds, we have a problem. This is different from the
> total duration of the call (send request + execute request + receive
> response), which can be longer, as it can include remote calls on the server
> and so on. Today we have a single value, so we cannot have low socket read
> timeouts.
> 2) The timeout can differ between calls. Typically, if the total time,
> retries included, is 60 seconds and the first attempt failed after 2
> seconds, then the remaining time is 58s. HBase does this today, but by
> hacking it with a thread-local storage variable. It's a hack (it should have
> been a parameter of the methods; the TLS was used to bypass all the layers.
> Maybe protobuf makes this complicated, to be confirmed), and it does not
> really work either, because we can have multithreading issues (we use the
> updated rpc timeout of someone else, or we create a new
> BlockingRpcChannelImplementation with a random default timeout).
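The two points above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not HBase code: the class `Timeouts`, its `Attempt` interface and all method names are hypothetical. The socket timeout caps how long any single read may block, the call timeout caps the whole operation including retries, and the remaining budget is handed to each attempt as a method parameter rather than read from a thread-local. The clock is passed in so the arithmetic can be checked without sleeping.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not an HBase class): keeps the socket timeout and the
 * call timeout as two separate values, and passes the remaining call budget
 * explicitly to each attempt instead of storing it in a thread-local.
 */
final class Timeouts {
  /** One attempt of the call; receives its budgets as parameters. */
  interface Attempt {
    boolean run(int readTimeoutMs, long remainingMs);
  }

  private final int socketTimeoutMs; // max time a single read/write may block
  private final long callTimeoutMs;  // max total time, retries included

  Timeouts(int socketTimeoutMs, long callTimeoutMs) {
    this.socketTimeoutMs = socketTimeoutMs;
    this.callTimeoutMs = callTimeoutMs;
  }

  /** Call budget left, given the operation's start time and the current time. */
  long remainingMs(long startMs, long nowMs) {
    return Math.max(0L, callTimeoutMs - (nowMs - startMs));
  }

  /** A socket read never waits longer than the socket timeout or the time left. */
  int readTimeoutMs(long remainingMs) {
    return (int) Math.min(socketTimeoutMs, remainingMs);
  }

  /** Retries until an attempt succeeds or the total call budget is exhausted. */
  boolean callWithRetries(Attempt attempt, LongSupplier clockMs) {
    long startMs = clockMs.getAsLong();
    while (true) {
      long remaining = remainingMs(startMs, clockMs.getAsLong());
      if (remaining == 0) {
        return false; // call timeout reached, give up
      }
      // The budget travels as an argument, so no other thread can alter it.
      if (attempt.run(readTimeoutMs(remaining), remaining)) {
        return true;
      }
    }
  }

  public static void main(String[] args) {
    Timeouts t = new Timeouts(20_000, 60_000);
    long remaining = t.remainingMs(0, 2_000);       // first attempt failed after 2 s
    System.out.println(remaining);                  // 58000
    System.out.println(t.readTimeoutMs(remaining)); // 20000
  }
}
```

With the figures from the description (20 s socket timeout, 60 s call timeout, first attempt failing after 2 s), `main` prints 58000 and then 20000: the retry still has 58 s overall, but a single stalled read is abandoned after 20 s.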
> Ideally, we could send the call timeout to the server as well: it would
> then be able to discard on its own the calls it received that got stuck in
> the request queue or in internal retries (on HDFS, for example).
> This would make the system more reactive to failures.
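A sketch of the server-side check, assuming the client sends its remaining call timeout as a duration with each request (how that would be encoded in the protobuf request is left open, as in the description). `DeadlineQueue` and `Call` are hypothetical names, not HBase's RPC server classes; a handler thread would call `next` to take work, and calls whose time ran out while queued are skipped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/**
 * Hypothetical sketch, not HBase's RPC server: each request carries the
 * client's remaining call timeout, and the server drops requests whose time
 * ran out while they were waiting in the queue.
 */
final class DeadlineQueue {
  static final class Call {
    final long id;
    final long receivedAtMs; // server clock when the request arrived
    final long timeoutMs;    // remaining call timeout sent by the client

    Call(long id, long receivedAtMs, long timeoutMs) {
      this.id = id;
      this.receivedAtMs = receivedAtMs;
      this.timeoutMs = timeoutMs;
    }

    /** True once the client has stopped waiting for this call. */
    boolean expired(long nowMs) {
      return nowMs - receivedAtMs >= timeoutMs;
    }
  }

  private final Queue<Call> queue = new ArrayDeque<>();
  private final List<Long> dropped = new ArrayList<>();

  void add(Call call) {
    queue.add(call);
  }

  /** Returns the next call still worth executing, or null if none is left. */
  Call next(long nowMs) {
    Call c;
    while ((c = queue.poll()) != null) {
      if (!c.expired(nowMs)) {
        return c;
      }
      dropped.add(c.id); // discarded without doing the work
    }
    return null;
  }

  List<Long> dropped() {
    return dropped;
  }
}
```

Timing the duration from the request's arrival on the server avoids relying on synchronized clocks between client and server; the trade-off is that time already spent on the network is not counted, so the server may keep a call slightly longer than the client waits for it. The same `expired` check could be repeated before each internal retry.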
> I think we can solve this now, especially after 10525. The main issue is to
> find something that fits well with protobuf...
> Then it should be easy to have a pool of threads for writers and readers,
> instead of a single thread per region server as today.
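The last point, sharing threads rather than dedicating one to each region server, could look roughly like the sketch below. It rests on assumptions and is not the HBase client: `SharedWriterPool` is hypothetical, it only covers the writer side (readers would follow the same pattern), and it serializes writes per connection with a lock so that requests sent on one socket are not interleaved.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: a fixed pool of writer threads shared by all
 * region-server connections, instead of one dedicated thread per server.
 */
final class SharedWriterPool implements AutoCloseable {
  private final ExecutorService writers;
  // One lock object per server connection, created on first use.
  private final ConcurrentHashMap<String, Object> connectionLocks = new ConcurrentHashMap<>();

  SharedWriterPool(int threads) {
    this.writers = Executors.newFixedThreadPool(threads);
  }

  /** Queues a write for the given server; any free pool thread may run it. */
  void write(String server, Runnable sendRequest) {
    Object lock = connectionLocks.computeIfAbsent(server, s -> new Object());
    writers.execute(() -> {
      synchronized (lock) { // at most one writer at a time per connection
        sendRequest.run();
      }
    });
  }

  /** Stops accepting writes and waits for the queued ones to finish. */
  @Override
  public void close() {
    writers.shutdown();
    try {
      writers.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

The thread count is then fixed by the pool size instead of growing with the number of region servers contacted; the per-connection lock map still grows with the servers, which a real implementation would need to bound.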
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)