[
https://issues.apache.org/jira/browse/HBASE-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergey Shelukhin resolved HBASE-9787.
-------------------------------------
Resolution: Invalid
Fix Version/s: (was: 0.96.1)
I see this is already done.
> HCM should not stop retrying after retry timeout if the retry count is not
> exhausted
> ------------------------------------------------------------------------------------
>
> Key: HBASE-9787
> URL: https://issues.apache.org/jira/browse/HBASE-9787
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.96.0
> Reporter: Sergey Shelukhin
> Priority: Minor
>
> See HBASE-9775:
> Some comments on the retry time limit; we may need to fix it.
> It was introduced for server-specific retry fallback, which I hope is not
> broken by recent changes to HCM. That is the logic where we go to one server,
> retry, wait, retry, wait more, retry, wait more, then learn that the region
> moved to a different server. Here we don't need to wait, because we can
> assume by default that the different server is healthy; the old code,
> however, would carry on with the wait sequence.
> However, if the region moves around (which is common in aggressive CM IT
> tests), the retry count can quickly be exhausted, as we go to each new server
> a few times and never reach the higher multipliers. This was especially
> pronounced with 10 retries, where a request could fail in just a few seconds
> in the case of a double server failure where the region is recovered twice;
> with 31-35 retries it is probably less pronounced now, but still possible.
> So the time limit based on the original retries is supposed to prevent these
> fast failures by allowing the retries to go on for as long as we would have
> retried "as if" we were just using the multiplier sequence to its "full
> potential".
> It should not serve as a lower limit; we might want to change the code to
> check that both time AND count are exhausted in this case.
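
For illustration, the "both time AND count" check described above could be sketched roughly as follows. This is not the HCM code: the class name, the method names, and the multiplier table are all hypothetical and chosen only to make the idea concrete. The time budget is taken to be the total pause the multiplier sequence would produce if every retry went to the same server.

```java
// A minimal sketch, not the actual HCM implementation.
// All names and the multiplier values below are assumptions for illustration.
class RetryBudget {
    // Hypothetical backoff multipliers; the last entry repeats for later tries.
    static final int[] BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100};

    // Pause before retry number `tries` (0-based).
    static long pauseMs(long basePauseMs, int tries) {
        int idx = Math.min(tries, BACKOFF.length - 1);
        return basePauseMs * BACKOFF[idx];
    }

    // Total time we would have spent waiting if we used the multiplier
    // sequence to its "full potential" against a single server.
    static long timeBudgetMs(long basePauseMs, int maxRetries) {
        long total = 0;
        for (int i = 0; i < maxRetries; i++) {
            total += pauseMs(basePauseMs, i);
        }
        return total;
    }

    // Give up only when BOTH the retry count and the time budget are exhausted.
    static boolean shouldGiveUp(int attempts, int maxRetries,
                                long elapsedMs, long budgetMs) {
        return attempts >= maxRetries && elapsedMs >= budgetMs;
    }
}
```

With this shape, a request whose region bounces between servers and burns through its retry count in a few seconds keeps retrying until the time budget is also spent, while a request that is still under its retry count is not cut off just because the time budget has passed.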
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)