Stephen Yuan Jiang created HBASE-18036:
------------------------------------------
Summary: Data locality is not maintained after cluster restart or SSH
Key: HBASE-18036
URL: https://issues.apache.org/jira/browse/HBASE-18036
Project: HBase
Issue Type: Bug
Components: Region Assignment
Affects Versions: 1.1.10, 1.2.5, 1.3.1, 1.4.0
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang
After HBASE-2896 / HBASE-4402, we believed data locality would be maintained
after a cluster restart. However, we have seen some complaints about data
locality loss when the cluster restarts (e.g. HBASE-17963).
Examining the AssignmentManager#processDeadServersAndRegionsInTransition()
code, for cluster start, I expected to hit the following code path:
{code}
if (!failover) {
  // Fresh cluster startup.
  LOG.info("Clean cluster startup. Assigning user regions");
  assignAllUserRegions(allRegions);
}
{code}
where assignAllUserRegions() would use the retainAssignment() call in
LoadBalancer. However, according to the master log, we usually hit the failover
code path instead:
{code}
// If we found user regions out on cluster, its a failover.
if (failover) {
  LOG.info("Found regions out on cluster or in RIT; presuming failover");
  // Process list of dead servers and regions in RIT.
  // See HBASE-4580 for more information.
  processDeadServersAndRecoverLostRegions(deadServers);
}
{code}
where processDeadServersAndRecoverLostRegions() would hand the dead servers to
SSH (ServerShutdownHandler), and SSH uses roundRobinAssignment() in
LoadBalancer. That is why we see locality lost more often than retained during
a cluster restart.
Note: the code I was looking at is close to branch-1 and branch-1.1.
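To illustrate why the choice between the two LoadBalancer calls matters for
locality, here is a minimal toy sketch. It is not HBase code: the class
LocalitySketch, the string-based region and server names, and the simplified
method bodies are all assumptions made for illustration. Only the method names
retainAssignment and roundRobinAssignment are borrowed from the description
above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only (hypothetical): regions and servers are plain strings, and the
// two methods merely imitate the *idea* of the LoadBalancer calls named above.
public class LocalitySketch {

    // Retain-style placement: keep a region on its previous server if that
    // server is live again; otherwise fall back to round-robin placement.
    public static Map<String, String> retainAssignment(
            Map<String, String> previous, List<String> liveServers) {
        Map<String, String> plan = new LinkedHashMap<>();
        int next = 0;
        for (Map.Entry<String, String> e : previous.entrySet()) {
            String oldServer = e.getValue();
            if (liveServers.contains(oldServer)) {
                plan.put(e.getKey(), oldServer);
            } else {
                plan.put(e.getKey(), liveServers.get(next++ % liveServers.size()));
            }
        }
        return plan;
    }

    // Round-robin placement: ignores where a region used to live.
    public static Map<String, String> roundRobinAssignment(
            List<String> regions, List<String> liveServers) {
        Map<String, String> plan = new LinkedHashMap<>();
        for (int i = 0; i < regions.size(); i++) {
            plan.put(regions.get(i), liveServers.get(i % liveServers.size()));
        }
        return plan;
    }

    // Number of regions that ended up on the same server as before.
    public static long retainedCount(
            Map<String, String> previous, Map<String, String> plan) {
        return plan.entrySet().stream()
                .filter(e -> e.getValue().equals(previous.get(e.getKey())))
                .count();
    }

    public static void main(String[] args) {
        Map<String, String> previous = new LinkedHashMap<>();
        previous.put("r1", "rs2");
        previous.put("r2", "rs2");
        previous.put("r3", "rs1");
        previous.put("r4", "rs3");
        List<String> live = Arrays.asList("rs1", "rs2", "rs3");
        List<String> regions = Arrays.asList("r1", "r2", "r3", "r4");

        System.out.println("retain: "
                + retainedCount(previous, retainAssignment(previous, live)));
        System.out.println("round robin: "
                + retainedCount(previous, roundRobinAssignment(regions, live)));
    }
}
```

With this toy input, the retain-style plan keeps all four regions on their
previous servers, while the round-robin plan keeps only one. A region that
moves to a server without its HDFS blocks has to read them remotely, which is
the locality loss described above.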