[ 
https://issues.apache.org/jira/browse/HBASE-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806956#comment-13806956
 ] 

stack commented on HBASE-9843:
------------------------------

I can't tell from reading the patch how it changes behaviors.


-   * 1, 2, 3, 10, 100, 100, 100, 100, 100, 100.
+   * 1, 2, 3, 5, 10, 20, 40, 100, 100, 100.
+   * With 100ms, a back-off of 200 means 20s
    */
-  public static int RETRY_BACKOFF[] = { 1, 2, 3, 5, 10, 100 };
+  public static int RETRY_BACKOFF[] = { 1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200 };


Given the above, are we still going to beat up on struggling servers by piling 
up requests?
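
To make the effect of the new array concrete, here is a minimal sketch of how the multipliers would translate into wait times. It assumes the pause for a given try is the base pause times the multiplier at that index, reusing the last multiplier once the tries run past the end of the array. The class and method names (BackoffSketch, pauseMillis, totalMillis) are hypothetical, and the real client may compute the pause differently, for example by adding jitter.

```java
public class BackoffSketch {
    // Multipliers copied from the patched RETRY_BACKOFF array in the diff.
    static final int[] RETRY_BACKOFF =
        { 1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200 };

    // Assumed formula: base pause times the multiplier for this try,
    // clamped to the last entry when tries exceed the array length.
    static long pauseMillis(long basePauseMs, int tries) {
        int idx = Math.min(Math.max(tries, 0), RETRY_BACKOFF.length - 1);
        return basePauseMs * RETRY_BACKOFF[idx];
    }

    // Cumulative time spent waiting across the first `tries` attempts.
    static long totalMillis(long basePauseMs, int tries) {
        long total = 0;
        for (int i = 0; i < tries; i++) {
            total += pauseMillis(basePauseMs, i);
        }
        return total;
    }

    public static void main(String[] args) {
        long basePauseMs = 100; // default pause mentioned in the issue description
        for (int i = 0; i < 15; i++) {
            System.out.println("try " + i + ": wait " + pauseMillis(basePauseMs, i) + " ms");
        }
        System.out.println("total wait over 15 tries: " + totalMillis(basePauseMs, 15) + " ms");
    }
}
```

Under these assumptions, a 100ms base pause gives a 20s wait once the 200 multiplier is reached, which matches the comment added in the patch.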

> Various fixes in client code
> ----------------------------
>
>                 Key: HBASE-9843
>                 URL: https://issues.apache.org/jira/browse/HBASE-9843
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 0.96.0
>            Reporter: Nicolas Liochon
>            Assignee: Nicolas Liochon
>             Fix For: 0.98.0, 0.96.1
>
>         Attachments: 9843-trunk.v2.patch, 9843-trunk.v3.patch
>
>
> This mainly fixes issues when we had "long" errors, for example a multi 
> blocked when trying to obtain a lock that was finally failing after 60s. 
> Previously we were trying only for 5 minutes. We now do all the tries. I've 
> fixed stuff around this area to make it work.
> There are also more logs.
> I've changed the back off array. With the default pause of 100ms, even after 
> 20 tries we still retry every 10s.
> I've also changed the max per RS to something minimal. If the cluster is not 
> in a very good state it's less aggressive. It seems to be a better default.
> I've done two tests:
>  - on a small, homogeneous cluster, performance was the same
>  - on a bigger, but heterogeneous cluster it was twice as fast.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
