[ https://issues.apache.org/jira/browse/HBASE-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808352#comment-13808352 ]
Nicolas Liochon commented on HBASE-9843:
----------------------------------------
bq. Given the above, are we going to beat up on struggling servers still piling
up the requests?
The patch doesn't really change this, even though I think we should.
I've only changed the final back-off times, increasing them. After 15
retries, we were sending the query every 10s. Now it will be every 20s.
I tend to think that 100ms is too short for most cases. It makes sense if the
region has moved; in all other cases waiting 1s seems better. But that's another
discussion, and I'm not totally sure what the right setting should be.
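
To make the arithmetic above concrete, here is a minimal sketch of a table-driven back-off, assuming the pause for a given try is the base pause multiplied by an entry in a fixed array. The class name, method name, and the intermediate multipliers are hypothetical and chosen for illustration; the only value taken from this comment is the final one (200 x 100ms = 20s). It is not the patch's actual code and it ignores any jitter the client may add.

```java
public class BackoffSketch {
    // Hypothetical multiplier table. Only the last value (200) is implied by
    // the comment above: with a 100ms base pause, late retries wait 20s.
    static final int[] MULTIPLIERS = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    // Returns the pause before the given try (0-based), in milliseconds.
    // Tries beyond the end of the table reuse the last multiplier.
    static long pauseMs(long basePauseMs, int tries) {
        int idx = Math.min(Math.max(tries, 0), MULTIPLIERS.length - 1);
        return basePauseMs * MULTIPLIERS[idx];
    }

    public static void main(String[] args) {
        // Print the schedule for the default 100ms base pause.
        for (int tries = 0; tries < 16; tries++) {
            System.out.println("try " + tries + " -> " + pauseMs(100, tries) + " ms");
        }
    }
}
```

With this shape, moving the base pause from 100ms to 1s would scale every step by ten, including the first retry, which is why the two settings are worth discussing separately.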
> Various fixes in client code
> ----------------------------
>
> Key: HBASE-9843
> URL: https://issues.apache.org/jira/browse/HBASE-9843
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 0.96.0
> Reporter: Nicolas Liochon
> Assignee: Nicolas Liochon
> Fix For: 0.98.0, 0.96.1
>
> Attachments: 9843-trunk.v2.patch, 9843-trunk.v3.patch
>
>
> This mainly fixes issues when we had "long" errors, for example a multi
> blocked when trying to obtain a lock that was finally failing after 60s.
> Previously we were trying only for 5 minutes. We now do all the tries. I've
> fixed stuff around this area to make it work.
> There are also more logs.
> I've changed the back off array. With the default pause of 100ms, even after
> 20 tries we still retry every 10s.
> I've also changed the max per RS to something minimal. If the cluster is not
> in a very good state it's less aggressive. It seems to be a better default.
> I've done two tests:
> - on a small, homogeneous cluster, performance was the same
> - on a bigger but heterogeneous cluster, it was twice as fast.