[
https://issues.apache.org/jira/browse/HBASE-16209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15404407#comment-15404407
]
Joseph commented on HBASE-16209:
--------------------------------
Oh sorry, I think a lot of these changes are in response to HBASE-16138, where
we are working under the assumption that many region opens will fail until
the Replication Table regions are up, which could take a bit of time. The sleep
would just give us more control over how long we keep retrying a region open
and keep us from flooding the RegionServer with requests.
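To make the idea concrete, here is a rough sketch of how a capped exponential sleep between retries could be computed. This is not the code from the patch: the class name, method name, and parameters are made up for illustration, and treating a non-positive initial or max sleep as "no sleep" is an assumption.

```java
// Hypothetical sketch, not the HBase implementation.
class BackoffSketch {

    /**
     * Returns the sleep time in ms before retry number `attempt` (0-based).
     * The sleep doubles with each attempt, starting at initialSleepMs,
     * and is capped at maxSleepMs.
     */
    static long backoffMillis(long initialSleepMs, long maxSleepMs, int attempt) {
        // Assumption: a non-positive setting disables the sleep entirely.
        if (initialSleepMs <= 0 || maxSleepMs <= 0) {
            return 0L;
        }
        // Bound the shift so the doubling below cannot overflow a long.
        int shift = Math.min(Math.max(attempt, 0), 30);
        // If doubling `shift` times would exceed the cap, return the cap.
        if (initialSleepMs > (maxSleepMs >> shift)) {
            return maxSleepMs;
        }
        return initialSleepMs << shift;
    }
}
```

With an initial sleep of 100 ms and a max of 1000 ms, successive retries would wait 100, 200, 400, 800, and then 1000 ms.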
In terms of the error, I think closedRegionHandler is sometimes used for
closing/reassigning regions that were never in failed_open, so they have no
failed_open counter. When we tried to call get() on the failed_open counter
inside invokeAssignLaterOnFailure(), we got an NPE that prevented us from
handling closed regions, which led to timed-out tests. I think the default
initial and max sleep periods are both set to 0 ms, so I don't think it should
slow down the tests that much. I ran a few of the failed tests on my laptop and
they passed, but I am still waiting on the Unit Tests. Do you have any
comments/suggestions? Thanks!
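For reference, the NPE pattern described above can be sketched roughly like this. It is a simplified illustration, not the AssignmentManager code: the class, field, and method names are hypothetical, and a map keyed by region name is an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the HBase implementation.
class FailedOpenCounterSketch {

    // Assumed layout: one counter per region that has failed to open.
    private final Map<String, AtomicInteger> failedOpenCounts = new ConcurrentHashMap<>();

    /** Records one more failed open for the region and returns the new count. */
    int recordFailedOpen(String regionName) {
        return failedOpenCounts
            .computeIfAbsent(regionName, k -> new AtomicInteger())
            .incrementAndGet();
    }

    /**
     * Returns the number of failed opens for the region. A region that was
     * closed without ever failing to open has no entry, so the lookup returns
     * null; calling get() on it directly would throw an NPE.
     */
    int failedOpenAttempts(String regionName) {
        AtomicInteger counter = failedOpenCounts.get(regionName);
        return counter == null ? 0 : counter.get();
    }
}
```

Treating a missing counter as zero attempts is one possible choice; the point is only that the lookup has to tolerate regions that were never failed_open.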
> Provide an ExponentialBackOffPolicy sleep between failed region open requests
> -----------------------------------------------------------------------------
>
> Key: HBASE-16209
> URL: https://issues.apache.org/jira/browse/HBASE-16209
> Project: HBase
> Issue Type: Bug
> Reporter: Joseph
> Assignee: Joseph
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16209-addendum.patch,
> HBASE-16209-branch-1-addendum-v2.patch, HBASE-16209-branch-1-addendum.patch,
> HBASE-16209-branch-1.patch, HBASE-16209-v2.patch, HBASE-16209.patch
>
>
> Related to HBASE-16138. We currently have no pause between retries of
> failed region open requests. With a low maximumAttempt default, we can
> quickly use up all our regionOpen retries if the server is in a bad state. I
> added an ExponentialBackOffPolicy so that we spread out the timing of our
> open region retries in AssignmentManager. Review board at
> https://reviews.apache.org/r/50011/
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)