[ https://issues.apache.org/jira/browse/HBASE-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607216#comment-13607216 ]
ramkrishna.s.vasudevan commented on HBASE-8150:
-----------------------------------------------
bq.Agree with adding some sleep time,
A sleep may be needed, but retrying on the same server is what prevents a multi-assign.
This is the same point Chunhui makes.
> the code that handles RAITE on master in 0.94 should not always use the same
> plan
> ---------------------------------------------------------------------------------
>
> Key: HBASE-8150
> URL: https://issues.apache.org/jira/browse/HBASE-8150
> Project: HBase
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Priority: Minor
>
> The AssignmentManager (AM) code in 0.94 sets the region plan to point to the
> same server when retrying an assignment after a
> RegionAlreadyInTransitionException (RAITE).
> {code}
> LOG.warn("Failed assignment of "
> + state.getRegion().getRegionNameAsString()
> + " to "
> + plan.getDestination()
> + ", trying to assign "
> + (regionAlreadyInTransitionException
> ? "to the same region server"
> + " because of RegionAlreadyInTransitionException;"
> : "elsewhere instead; ")
> + "retry=" + i, t);
> {code}
> However, there's no wait time in the loop that retries the assignment. If the
> region is being marked as failed to open, which may take some time, the master
> can easily exhaust its retries in less than half a second, going to the same
> server every time and getting the same exception (unfortunately I no longer
> have the logs); the region is then stuck.
> Do you think this is worth fixing (for example, by not using the same server
> here after a few retries, or by adding timed backoff in such cases)?
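
To make the two proposals above concrete, here is a minimal sketch of a retry loop that backs off between attempts and moves to another server only after a few consecutive failures. It is not the 0.94 AssignmentManager code: `AssignRetrySketch`, `RegionOpener`, `backoffMillis`, `assign` and all parameters are hypothetical names for illustration, and a `false` return from `tryOpen` merely stands in for a RegionAlreadyInTransitionException. Note that switching servers is exactly the step that may risk a multi-assign, as pointed out in the comment, so a real fix would need to guard against that.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.LongConsumer;

/** Hypothetical sketch; none of these names exist in HBase. */
final class AssignRetrySketch {

    /** Stand-in for the open-region RPC; false models a RegionAlreadyInTransitionException. */
    interface RegionOpener {
        boolean tryOpen(String server);
    }

    /** Exponential backoff: baseMillis * 2^attempt, capped at maxMillis. */
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis << Math.min(attempt, 20); // bound the shift to avoid overflow
        return Math.min(delay, maxMillis);
    }

    /**
     * Retries on the same server up to sameServerRetries times in a row, then
     * moves to the next candidate. Sleeps between attempts via the supplied sleeper.
     * Returns the server that accepted the region, or empty if retries ran out.
     */
    static Optional<String> assign(List<String> servers, RegionOpener opener,
                                   LongConsumer sleeper, int maxRetries,
                                   int sameServerRetries, long baseMillis, long maxMillis) {
        if (servers.isEmpty() || sameServerRetries <= 0) {
            throw new IllegalArgumentException("need at least one server and sameServerRetries > 0");
        }
        int serverIndex = 0;
        for (int i = 0; i < maxRetries; i++) {
            String server = servers.get(serverIndex);
            if (opener.tryOpen(server)) {
                return Optional.of(server);
            }
            // Give up on this server only after several consecutive failures.
            if ((i + 1) % sameServerRetries == 0) {
                serverIndex = (serverIndex + 1) % servers.size();
            }
            // Wait before the next attempt, but not after the last one.
            if (i + 1 < maxRetries) {
                sleeper.accept(backoffMillis(i, baseMillis, maxMillis));
            }
        }
        return Optional.empty();
    }
}
```

A caller would pass a sleeper that wraps `Thread.sleep` and handles interruption. Injecting it keeps the loop testable without real waiting. The retry counts and delays are placeholders, not values taken from HBase.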