[ 
https://issues.apache.org/jira/browse/HBASE-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HBASE-9787.
-------------------------------------

       Resolution: Invalid
    Fix Version/s:     (was: 0.96.1)

I see this is already done.

> HCM should not stop retrying after retry timeout if the retry count is not 
> exhausted
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-9787
>                 URL: https://issues.apache.org/jira/browse/HBASE-9787
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.96.0
>            Reporter: Sergey Shelukhin
>            Priority: Minor
>
> See HBASE-9775:
> Some comments on the retry time limit; we may need to fix it.
> It was introduced for server-specific retry fallback, which I hope is not 
> broken by recent changes to HCM. That is the logic where we go to one server, 
> retry, wait, retry, wait more, retry, wait more, and then learn that the 
> region moved to a different server. Here we don't need to wait, because we 
> can assume by default that the different server is healthy; the old code, 
> however, would carry on with the wait sequence.
> However, if the region moves around (which is common in aggressive CM IT 
> tests), the retry count can be exhausted quickly, as we go to each new server 
> a few times and never reach the higher multipliers. This was especially 
> pronounced with 10 retries, where a request could fail in just a few seconds 
> after a double server failure in which the region is recovered twice; with 
> 31-35 retries it is now probably less pronounced but still possible.
> So the time limit, based on the original retry count, is supposed to prevent 
> these fast failures by allowing the retries to go on for as long as we would 
> have retried "as if" we were just using the multiplier sequence to its "full 
> potential".
> It should not serve as a lower limit; in this case we might want to change 
> the code to check that both time AND count are exhausted.
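
The stopping rule proposed in the description above can be sketched roughly as follows. This is only an illustration, not the actual HCM code: the class name, the method names, and the multiplier table are all hypothetical, and the real backoff table in HBase may differ. The time budget is taken to be the total pause the full multiplier sequence would produce against a single server, and the client gives up only when both the retry count and that budget are used up.

```java
// Hypothetical sketch; none of these names come from the HBase code base.
final class RetryBudgetSketch {
    // Illustrative backoff multipliers (an assumption, not the HBase table).
    static final int[] MULTIPLIERS = {1, 2, 3, 5, 10};

    // Pause before retry number `tries` (0-based). Once the table is
    // exhausted, the last multiplier is reused.
    static long pauseMs(long basePauseMs, int tries) {
        int idx = Math.min(tries, MULTIPLIERS.length - 1);
        return basePauseMs * MULTIPLIERS[idx];
    }

    // Total pause time if every retry went to the same server, i.e. the
    // multiplier sequence used to its "full potential".
    static long fullSequenceMs(long basePauseMs, int maxRetries) {
        long total = 0;
        for (int i = 0; i < maxRetries; i++) {
            total += pauseMs(basePauseMs, i);
        }
        return total;
    }

    // Stop only when BOTH the retry count and the time budget are exhausted.
    static boolean shouldStop(int tries, int maxRetries,
                              long elapsedMs, long basePauseMs) {
        boolean countExhausted = tries >= maxRetries;
        boolean timeExhausted =
            elapsedMs >= fullSequenceMs(basePauseMs, maxRetries);
        return countExhausted && timeExhausted;
    }
}
```

Under these assumptions, a request whose region keeps moving may burn through its retry count in well under the time budget; shouldStop then still returns false, so the fast failure described above does not occur. Conversely, passing the time budget alone does not stop retries while the count is not exhausted.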



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
