[jira] [Commented] (HBASE-8150) the code that handles RAITE on master in 0.94 should not always use the same plan

chunhui shen (JIRA) Tue, 19 Mar 2013 21:15:25 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607261#comment-13607261
 ]


chunhui shen commented on HBASE-8150:
-------------------------------------

Trunk already does this: a duplicate OPEN request for a region that is already being opened is logged and ignored:
{code}
if (Boolean.TRUE.equals(previous)) {
  // An open is in progress. This is supported, but let's log this.
  LOG.info("Receiving OPEN for the region:" +
      region.getRegionNameAsString() + " , which we are already trying to OPEN" +
      " - ignoring this new request for this region.");
}
{code}

In the 0.94 branch, we could likewise ignore the RegionAlreadyInTransitionException 
instead of throwing it to the master.
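
As a rough illustration of that idea (not the actual 0.94 or trunk code), the sketch below tracks in-progress opens in a map and treats a repeated OPEN as a no-op instead of raising an exception. The class OpenRequestTracker and its methods are hypothetical names made up for this example; only the Boolean.TRUE.equals(previous) check mirrors the trunk snippet above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for the region server's open-request bookkeeping. */
class OpenRequestTracker {
    // Region name -> TRUE while an OPEN is in progress, FALSE while a CLOSE is.
    private final ConcurrentMap<String, Boolean> regionsInTransition =
        new ConcurrentHashMap<>();

    /**
     * Returns true if a new open was started, false if the request was
     * ignored because an open of the same region is already in progress.
     */
    boolean requestOpen(String regionName) {
        Boolean previous = regionsInTransition.putIfAbsent(regionName, Boolean.TRUE);
        if (Boolean.TRUE.equals(previous)) {
            // An open is in progress: log and ignore instead of throwing to the caller.
            System.out.println("Receiving OPEN for the region:" + regionName
                + " , which we are already trying to OPEN"
                + " - ignoring this new request for this region.");
            return false;
        }
        if (Boolean.FALSE.equals(previous)) {
            // A close is in progress; this sketch still treats that as an error.
            throw new IllegalStateException(
                "Region " + regionName + " is currently being closed");
        }
        return true;
    }

    /** Called once the open (or close) has finished, successfully or not. */
    void finishTransition(String regionName) {
        regionsInTransition.remove(regionName);
    }
}
```

With this shape, the master's retry of an OPEN against the same server gets a normal response rather than an exception, so it does not burn through its retry budget while the first open is still running.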
                
> the code that handles RAITE on master in 0.94 should not always use the same 
> plan
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-8150
>                 URL: https://issues.apache.org/jira/browse/HBASE-8150
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Minor
>
> The code in the 0.94 AssignmentManager (AM) sets the region plan to point to 
> the same server when retrying the assignment due to a 
> RegionAlreadyInTransitionException (RAITE).
> {code}
> LOG.warn("Failed assignment of "
>             + state.getRegion().getRegionNameAsString()
>             + " to "
>             + plan.getDestination()
>             + ", trying to assign "
>             + (regionAlreadyInTransitionException
>                 ? "to the same region server"
>                     + " because of RegionAlreadyInTransitionException;"
>                 : "elsewhere instead; ")
>             + "retry=" + i, t);
> {code}
> However, there's no wait time in the loop that retries the assignment. If the 
> region is being marked as failed to open, which may take some time, the master 
> can easily exhaust its retries in less than half a second, going to the same 
> server every time and getting the same exception (unfortunately I no longer 
> have logs); the region will then be stuck.
> Do you think this is worth fixing (for example, by not using the same server 
> here after a few retries, or by adding timed backoff in such cases)?
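
The two remedies mentioned in the description (timed backoff, and moving off the same server after a few retries) could be sketched roughly as below. This is only an illustration under assumptions: AssignRetryPolicy, its fields, and the example values are hypothetical and are not part of the 0.94 AssignmentManager.

```java
import java.util.List;

/** Hypothetical retry policy for assignment retries caused by RAITE. */
final class AssignRetryPolicy {
    private final int sameServerRetries; // retries allowed on the same server
    private final long baseSleepMs;      // wait before the first retry
    private final long maxSleepMs;       // upper bound on any single wait

    AssignRetryPolicy(int sameServerRetries, long baseSleepMs, long maxSleepMs) {
        this.sameServerRetries = sameServerRetries;
        this.baseSleepMs = baseSleepMs;
        this.maxSleepMs = maxSleepMs;
    }

    /** Exponential backoff: base * 2^attempt, capped at maxSleepMs. */
    long sleepMillis(int attempt) {
        int shift = Math.min(Math.max(attempt, 0), 20); // bound the shift to avoid overflow
        return Math.min(maxSleepMs, baseSleepMs << shift);
    }

    /** Stay on the current server for the first few retries, then rotate to the next one. */
    String nextServer(List<String> servers, String current, int attempt) {
        if (attempt < sameServerRetries || servers.size() < 2) {
            return current;
        }
        int idx = servers.indexOf(current);
        return servers.get((idx + 1) % servers.size());
    }
}
```

A retry loop would call Thread.sleep(policy.sleepMillis(i)) before attempt i and pick the destination with policy.nextServer(servers, plan, i), so that the retries are spread over more than half a second and eventually leave the server that keeps answering with RAITE.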

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
