[ https://issues.apache.org/jira/browse/HBASE-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806956#comment-13806956 ]
stack commented on HBASE-9843:
------------------------------
I can't tell from reading the patch how it changes behavior.
- * 1, 2, 3, 10, 100, 100, 100, 100, 100, 100.
+ * 1, 2, 3, 5, 10, 20, 40, 100, 100, 100.
+ * With 100ms, a back-off of 200 means 20s
*/
- public static int RETRY_BACKOFF[] = { 1, 2, 3, 5, 10, 100 };
+ public static int RETRY_BACKOFF[] = { 1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200 };
Given the above, are we still going to beat up on struggling servers by
piling requests onto them?
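
To make the question concrete, here is a rough sketch of how the two arrays in the hunk above translate into wait times. It assumes the pause for a given try is simply the base pause multiplied by the array entry, with the last entry reused once the tries run past the end of the array. The class, method, and field names are hypothetical and are not taken from the patch; any jitter or other adjustment the real client applies is ignored.

```java
public class BackoffSketch {
    // Multipliers from the removed line of the hunk.
    public static final int[] OLD_BACKOFF = {1, 2, 3, 5, 10, 100};
    // Multipliers from the added line of the hunk.
    public static final int[] NEW_BACKOFF =
        {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    // Assumed rule: pause = base pause * multiplier, where the index is
    // clamped to the array bounds so late tries reuse the last entry.
    public static long pauseMillis(int[] backoff, long basePauseMs, int tries) {
        int idx = Math.min(Math.max(tries, 0), backoff.length - 1);
        return basePauseMs * backoff[idx];
    }

    // Sum of the pauses taken before the first `attempts` retries.
    public static long totalWaitMillis(int[] backoff, long basePauseMs, int attempts) {
        long total = 0;
        for (int i = 0; i < attempts; i++) {
            total += pauseMillis(backoff, basePauseMs, i);
        }
        return total;
    }

    public static void main(String[] args) {
        long base = 100; // default pause of 100ms, as in the issue description
        for (int i = 0; i < 15; i++) {
            System.out.println("try " + i
                + " old=" + pauseMillis(OLD_BACKOFF, base, i) + "ms"
                + " new=" + pauseMillis(NEW_BACKOFF, base, i) + "ms");
        }
    }
}
```

Under these assumptions the new array ramps up more slowly (2s instead of 10s at index 5) but ends higher (20s instead of 10s from index 11 on), which matches the "a back-off of 200 means 20s" note in the patch comment.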
> Various fixes in client code
> ----------------------------
>
> Key: HBASE-9843
> URL: https://issues.apache.org/jira/browse/HBASE-9843
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 0.96.0
> Reporter: Nicolas Liochon
> Assignee: Nicolas Liochon
> Fix For: 0.98.0, 0.96.1
>
> Attachments: 9843-trunk.v2.patch, 9843-trunk.v3.patch
>
>
> This mainly fixes issues with "long" errors, for example a multi that
> blocked while trying to obtain a lock and finally failed after 60s.
> Previously we kept trying for only 5 minutes. We now do all the tries. I've
> fixed stuff around this area to make it work.
> There are also more logs.
> I've changed the back off array. With the default pause of 100ms, even after
> 20 tries we still retry every 10s.
> I've also changed the max per RS to something minimal. If the cluster is not
> in a very good state it's less aggressive. It seems to be a better default.
> I've done two tests:
> - on a small, homogeneous cluster, I had the same performance
> - on a bigger but heterogeneous cluster, it was twice as fast.