[ https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278839#comment-13278839 ]
ramkrishna.s.vasudevan commented on HBASE-6046:
-----------------------------------------------
The problem here is that when the master retries after a ZK session expiry
exception and the retry succeeds, almost the entire master is recreated, in the sense that
{code}
try {
  if (!becomeActiveMaster(status)) {
    return Boolean.FALSE;
  }
  initializeZKBasedSystemTrackers();
  // Update in-memory structures to reflect our earlier Root/Meta assignment.
  assignRootAndMeta(status);
  // process RIT if any
  // TODO: Why does this not call AssignmentManager.joinCluster? Otherwise
  // we are not processing dead servers if any.
  assignmentManager.processDeadServersAndRegionsInTransition();
{code}
Here initializeZKBasedSystemTrackers() even creates a new AssignmentManager,
so whatever the master does in processDeadServersAndRegionsInTransition()
is like a fresh start.
In processDeadServersAndRegionsInTransition() we have
{code}
for (Map.Entry<HRegionInfo, ServerName> e: this.regions.entrySet()) {
  if (!e.getKey().isMetaTable()
      && e.getValue() != null) {
    LOG.debug("Found " + e + " out on cluster");
    this.failover = true;
    break;
  }
}
{code}
Even though all the RSs are online, 'this.regions' is empty in the new
AssignmentManager, so the loop above never sets this.failover and we go
with a completely new assignment.
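
To make the effect concrete, here is a minimal standalone sketch of the check in the loop quoted above. It is not HBase code: the class FailoverSketch, the method detectFailover, the use of plain strings for regions and servers, and the name-prefix test for catalog regions are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the failover check; not the actual HBase classes. */
final class FailoverSketch {

    /**
     * Returns true only if the in-memory map already holds a non-catalog
     * region that is mapped to a server, mirroring the quoted loop.
     * Keys are region names and values are server names (an assumption).
     */
    static boolean detectFailover(Map<String, String> regions) {
        for (Map.Entry<String, String> e : regions.entrySet()) {
            // Assumption: catalog regions are recognised by their name prefix.
            boolean isCatalog = e.getKey().startsWith("-ROOT-")
                    || e.getKey().startsWith(".META.");
            if (!isCatalog && e.getValue() != null) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // State the old AssignmentManager held before the session expired.
        Map<String, String> oldState = new HashMap<>();
        oldState.put("usertable,row-0001", "rs1.example.org,60020,1");
        System.out.println("old manager detects failover: " + detectFailover(oldState));

        // A recreated AssignmentManager starts with an empty map, so the
        // check finds nothing even though every region is still deployed.
        Map<String, String> freshState = new HashMap<>();
        System.out.println("new manager detects failover: " + detectFailover(freshState));
    }
}
```

The point of the sketch is that the outcome depends only on what the in-memory map contains, not on what is actually deployed on the region servers, which is why a recreated manager falls through to a fresh assignment.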
> Master retry on ZK session expiry causes inconsistent region assignments.
> -------------------------------------------------------------------------
>
> Key: HBASE-6046
> URL: https://issues.apache.org/jira/browse/HBASE-6046
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.92.1, 0.94.0
> Reporter: Gopinathan A
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.94.1
>
>
> 1> A ZK session timeout in the HMaster leads to bulk assignment even though all the
> RSs are online.
> 2> While bulk assignment is in progress, if the master goes down again and restarts (or
> a backup master comes up), all the nodes created in ZK are reassigned
> to new RSs. This leads to double assignment.
> We had 2800 regions; 1900 of them got double assigned, taking the
> region count to 4700.