Sergey Shelukhin created HBASE-8150:
---------------------------------------
Summary: the code that handles RAITE on master in 0.94 should not
always use the same plan
Key: HBASE-8150
URL: https://issues.apache.org/jira/browse/HBASE-8150
Project: HBase
Issue Type: Bug
Reporter: Sergey Shelukhin
Priority: Minor
In 0.94, the AssignmentManager (AM) keeps the region plan pointed at the same
server when it retries an assignment after a RegionAlreadyInTransitionException
(RAITE).
{code}
LOG.warn("Failed assignment of "
    + state.getRegion().getRegionNameAsString()
    + " to "
    + plan.getDestination()
    + ", trying to assign "
    + (regionAlreadyInTransitionException
        ? "to the same region server"
            + " because of RegionAlreadyInTransitionException;"
        : "elsewhere instead; ")
    + "retry=" + i, t);
{code}
However, the loop that retries the assignment has no wait time. If the region
is being marked as failed to open, which may take some time, the master can
easily exhaust its retries in less than half a second (unfortunately I no
longer have the logs), and the region will be stuck.
Do you think this is worth fixing (for example, by not using the same server
here after a few retries, or by adding timed backoff in such cases)?
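
To make the two options concrete, here is a rough sketch of what they could
look like. It is not the actual AssignmentManager code: the class name, the
method names, and all constants (3 same-server retries, 100 ms base delay,
5 s cap) are hypothetical and chosen only for illustration.

```java
import java.util.List;

/** Hypothetical sketch only; none of these names exist in HBase. */
final class AssignRetrySketch {

    // Assumed values, not taken from the 0.94 code base.
    static final int MAX_SAME_SERVER_RETRIES = 3;
    static final long BASE_BACKOFF_MS = 100L;
    static final long MAX_BACKOFF_MS = 5_000L;

    private AssignRetrySketch() {
    }

    /** Exponential backoff: base * 2^retry, capped at MAX_BACKOFF_MS. */
    static long backoffMillis(int retry) {
        // Clamp the shift so the multiplication cannot overflow a long.
        int shift = Math.max(0, Math.min(retry, 20));
        return Math.min(MAX_BACKOFF_MS, BASE_BACKOFF_MS << shift);
    }

    /**
     * Keeps the current server for the first few RAITE retries, then moves
     * to the next server in the list (wrapping around at the end).
     */
    static String chooseServer(String current, List<String> servers, int raiteRetries) {
        if (raiteRetries < MAX_SAME_SERVER_RETRIES || servers.size() < 2) {
            return current;
        }
        int idx = servers.indexOf(current);
        return servers.get((idx + 1) % servers.size());
    }
}
```

In the retry loop, the master would sleep for backoffMillis(i) before retry i
and pick the destination with chooseServer(...), so a region server that is
still marking the region as failed to open gets time to finish before the
retries run out.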