[jira] [Commented] (HBASE-8150) the code that handles RAITE on master in 0.94 should not always use the same plan

rajeshbabu (JIRA) Wed, 20 Mar 2013 11:11:25 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607939#comment-13607939
 ]


rajeshbabu commented on HBASE-8150:
-----------------------------------

Sorry [~sershe], I didn't see your comment. 
bq. It can actually happen during transition to failed_open, the thread only 
sees region in rits, it doesn't know what open handler is doing. Wouldn't 
region be stuck in this case if master assumes server will finish the 
assignment?
The transition to failed open will fail with a version mismatch, and the second 
open then starts transitioning, so the region won't be stuck.
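
As a rough illustration of the version-mismatch behaviour described above, here is a minimal sketch of a versioned state cell. It is not HBase or ZooKeeper code: the class VersionedStateSketch, its methods, and the state strings are hypothetical, and it only assumes that a transition is rejected when the caller's expected version is stale.

```java
/** Hypothetical sketch of a versioned state cell; not the HBase/ZooKeeper API. */
final class VersionedStateSketch {
    private String state;
    private int version; // bumped on every successful transition

    VersionedStateSketch(String initial) {
        this.state = initial;
    }

    synchronized int version() {
        return version;
    }

    synchronized String state() {
        return state;
    }

    /** Applies the transition only if the caller saw the latest version. */
    synchronized boolean transition(int expectedVersion, String newState) {
        if (expectedVersion != version) {
            return false; // stale version: another writer got there first
        }
        state = newState;
        version++;
        return true;
    }

    public static void main(String[] args) {
        VersionedStateSketch node = new VersionedStateSketch("OPENING");
        int seenByFirstOpen = node.version();
        // A second open updates the node, which bumps its version.
        node.transition(node.version(), "OPENING");
        // The first open's late write now carries a stale version and is rejected.
        boolean applied = node.transition(seenByFirstOpen, "FAILED_OPEN");
        System.out.println("late FAILED_OPEN applied: " + applied);
    }
}
```

In this sketch the late write from the first open is dropped rather than overwriting the state set by the second open, which is the property the reply relies on.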
                
> the code that handles RAITE on master in 0.94 should not always use the same 
> plan
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-8150
>                 URL: https://issues.apache.org/jira/browse/HBASE-8150
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Minor
>
> The code in 0.94 AM sets the region plan to point to the same server when 
> retrying the assignment due to RAITE.
> {code}
> LOG.warn("Failed assignment of "
>             + state.getRegion().getRegionNameAsString()
>             + " to "
>             + plan.getDestination()
>             + ", trying to assign "
>             + (regionAlreadyInTransitionException ? "to the same region server"
>                 + " because of RegionAlreadyInTransitionException;" : "elsewhere instead; ")
>             + "retry=" + i, t);
> {code}
> However, there's no wait time in the loop that retries the assignment. If the 
> region is being marked as failed to open, which may take some time, the master 
> can easily exhaust its retries in less than half a second, going to the same 
> server every time and getting the same exception (unfortunately I no longer 
> have logs); the region will then be stuck.
> Do you think this is worth fixing (for example, by not using the same server 
> here after a few retries, or by adding timed backoff in such cases)?
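
The two fixes suggested in the description (moving off the same server after a few retries, and adding timed backoff) could be sketched roughly as follows. This is not the 0.94 AssignmentManager code: the class AssignRetrySketch, its method names, the retry counts, and the delays are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical sketch; names and thresholds are illustrative, not from HBase. */
final class AssignRetrySketch {
    static final int MAX_ATTEMPTS = 10;
    static final int SAME_SERVER_RETRIES = 3;
    static final long BASE_BACKOFF_MS = 100;
    static final long MAX_BACKOFF_MS = 2_000;

    /** Exponential backoff (100, 200, 400, ...) capped at MAX_BACKOFF_MS. */
    static long backoffMillis(int attempt) {
        long delay = BASE_BACKOFF_MS << Math.min(attempt, 20);
        return Math.min(delay, MAX_BACKOFF_MS);
    }

    /** Stay on the planned server (index 0) for a few attempts, then rotate. */
    static String pickServer(List<String> servers, int attempt) {
        if (attempt < SAME_SERVER_RETRIES) {
            return servers.get(0);
        }
        return servers.get((attempt - SAME_SERVER_RETRIES + 1) % servers.size());
    }

    /**
     * tryAssign(server, attempt) returns true on success and false on an
     * "already in transition" style failure. Returns the server that accepted
     * the region, or null if retries were exhausted or the thread was interrupted.
     */
    static String assign(List<String> servers, BiPredicate<String, Integer> tryAssign) {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            String server = pickServer(servers, attempt);
            if (tryAssign.test(server, attempt)) {
                return server;
            }
            try {
                Thread.sleep(backoffMillis(attempt)); // wait before the next attempt
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("rs1", "rs2", "rs3");
        // Pretend the planned server rejects every attempt; others accept.
        String chosen = assign(servers, (server, attempt) -> !server.equals("rs1"));
        System.out.println("assigned to " + chosen);
    }
}
```

With the delays above, the retry loop can no longer burn through all attempts in well under a second, and a server that keeps rejecting the region is abandoned after SAME_SERVER_RETRIES attempts.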

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
