[ 
https://issues.apache.org/jira/browse/HBASE-16209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15404407#comment-15404407
 ] 

Joseph commented on HBASE-16209:
--------------------------------

Oh sorry, I think a lot of these changes are in response to HBASE-16138, where 
we are working under the assumption that many region opens will fail until 
the Replication Table regions are up, which could take a bit of time. The sleep 
just gives us more control over how long we keep retrying a region open, 
without flooding the RegionServer with requests.

In terms of the error, I think closedRegionHandler is sometimes used for 
closing/reassigning regions that were never failed_open, so they have no 
failed_open counter. When we called get() on the failed_open counter inside 
invokeAssignLaterOnFailure(), we got an NPE that prevented us from handling 
closed regions and led to timed-out tests. I think the default initial and max 
sleep periods are also set to 0 ms, so I don't think it should slow down the 
tests that much. I ran a few of the failed tests on my laptop and they passed, 
but I am still waiting on the unit tests. Do you have any 
comments/suggestions? Thanks! 
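
To make the NPE concrete, here is a minimal sketch of the kind of null guard 
I mean. This is not the actual AssignmentManager code: the class, the map, and 
the method names below are hypothetical stand-ins, and I am assuming the 
failed_open counters live in a per-region map of AtomicInteger.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the per-region failed_open bookkeeping.
final class FailedOpenCounterSketch {
    // Assumption: one counter per region name, created only on a failed open.
    private final ConcurrentHashMap<String, AtomicInteger> failedOpenTracker =
        new ConcurrentHashMap<>();

    // Called when a region open fails; creates the counter on first failure.
    void recordFailedOpen(String regionName) {
        failedOpenTracker
            .computeIfAbsent(regionName, k -> new AtomicInteger())
            .incrementAndGet();
    }

    // Null-safe read: a region that was closed without ever being
    // failed_open has no counter, so treat that case as zero failures
    // instead of calling get() on a null reference.
    int failedOpenCount(String regionName) {
        AtomicInteger counter = failedOpenTracker.get(regionName);
        return counter == null ? 0 : counter.get();
    }

    public static void main(String[] args) {
        FailedOpenCounterSketch tracker = new FailedOpenCounterSketch();
        tracker.recordFailedOpen("region-a");
        System.out.println(tracker.failedOpenCount("region-a"));
        System.out.println(tracker.failedOpenCount("region-never-failed"));
    }
}
```

With a guard like this, a region that comes through the closed-region path 
without a counter is simply treated as having zero failed opens.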

> Provide an ExponentialBackOffPolicy sleep between failed region open requests
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-16209
>                 URL: https://issues.apache.org/jira/browse/HBASE-16209
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Joseph
>            Assignee: Joseph
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-16209-addendum.patch, 
> HBASE-16209-branch-1-addendum-v2.patch, HBASE-16209-branch-1-addendum.patch, 
> HBASE-16209-branch-1.patch, HBASE-16209-v2.patch, HBASE-16209.patch
>
>
> Related to HBASE-16138. We currently have no pause between retries of 
> failed region open requests. With a low maximumAttempt default, we can 
> quickly use up all our regionOpen retries if the server is in a bad state. I 
> added an ExponentialBackOffPolicy so that we spread out the timing of our 
> open region retries in AssignmentManager. Review board at 
> https://reviews.apache.org/r/50011/
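
As a rough illustration of the idea in the description above, the following 
sketch doubles the sleep after each failed open and caps it at a maximum. It 
is only a sketch under assumptions, not the patch itself: the class name, 
method name, and parameters are hypothetical, and the doubling rule is a 
generic capped exponential backoff rather than the exact policy in HBase.

```java
// Hypothetical helper showing a capped exponential backoff between retries.
final class BackoffSketch {
    // Returns the sleep in milliseconds before the next retry.
    // failedAttempts is the number of opens that have already failed.
    static long backoffMillis(long initialSleepMs, long maxSleepMs, int failedAttempts) {
        // An initial or max sleep of 0 (or less) disables the pause entirely.
        if (initialSleepMs <= 0 || maxSleepMs <= 0) {
            return 0L;
        }
        long sleep = Math.min(initialSleepMs, maxSleepMs);
        // Double once per failed attempt, stopping as soon as the cap is hit.
        for (int i = 0; i < failedAttempts && sleep < maxSleepMs; i++) {
            // Compare against max / 2 first so the doubling cannot overflow.
            sleep = (sleep > maxSleepMs / 2) ? maxSleepMs : sleep * 2;
        }
        return sleep;
    }

    public static void main(String[] args) {
        // Print the pause for the first few failed attempts.
        for (int attempt = 0; attempt <= 5; attempt++) {
            System.out.println(attempt + " -> " + backoffMillis(100, 1000, attempt) + " ms");
        }
    }
}
```

Note that with both the initial and max sleep set to 0 ms the helper always 
returns 0, so retries happen back to back, with no pause between them.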



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
