Viraj Jasani created HBASE-26433:
------------------------------------
Summary: Rollback from ZK-less to ZK-based assignment could
produce inconsistent state - doubly assigned regions
Key: HBASE-26433
URL: https://issues.apache.org/jira/browse/HBASE-26433
Project: HBase
Issue Type: Bug
Affects Versions: 1.7.1
Reporter: Viraj Jasani
Assignee: Viraj Jasani
Fix For: 1.7.2
By enabling the config hbase.assignment.usezk.migrating, we initiate the
transition of an HBase 1.x cluster from the default ZK-based region assignment
to ZK-less region assignment. Once the migration is enabled, any subsequent
region transition adds two additional CQs in meta: info:sn and
info:state. The workflow that adds new CQs in meta should be the only workflow
reading them (unless coordination among multiple workflows is required);
however, that is not the case here. Reading info:sn and info:state to rebuild
user region states in the RegionStateStore data structure is a hidden bug
because the read is not restricted to ZK-less region assignment.
What are the effects?
After enabling ZK-less migration, if we revert it, info:state and info:sn
are not reverted. Moreover, the new active master rebuilds the region states in
memory and uses this info. If all regions have consistent info:sn values
(i.e. consistent with info:server and info:serverstartcode), nothing goes wrong,
and this is the likely outcome when we revert the config with a rolling
restart of masters. However, after this config revert, if any region moves,
only info:server and info:serverstartcode get updated; info:sn and
info:state keep their old values. Because the guard condition is missing, a
subsequent active master restart rebuilds region states and assigns regions as
per info:sn, but those regions are already OPEN on info:server, hence we get
doubly assigned regions.
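
The sequence above can be illustrated with a toy model of a single meta row.
This is only a sketch, not HBase code: the class StaleSnDemo, the map-based
row, the server names, and the "host,port,startcode" string format are all
assumptions made for illustration, and unguardedLocation() merely mimics the
unguarded read described in this issue.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleSnDemo {

    // Location derived from the columns that ZK-based assignment maintains.
    static String fromServerColumns(Map<String, String> row) {
        String server = row.get("info:server");            // assumed "host:port"
        String startCode = row.get("info:serverstartcode");
        if (server == null || startCode == null) {
            return null;
        }
        return server.replace(':', ',') + "," + startCode;
    }

    // Mimics the unguarded read: info:sn wins whenever it is present,
    // regardless of which assignment mode is active.
    static String unguardedLocation(Map<String, String> row) {
        String sn = row.get("info:sn");
        return sn != null ? sn : fromServerColumns(row);
    }

    // Builds the row state after: migration enabled, config reverted, region moved.
    static Map<String, String> rowAfterRevertAndMove() {
        Map<String, String> row = new HashMap<>();
        // Written while the migration config was enabled.
        row.put("info:server", "rs1:16020");
        row.put("info:serverstartcode", "100");
        row.put("info:sn", "rs1,16020,100");
        row.put("info:state", "OPEN");
        // After the revert the region moves: only the two server columns change.
        row.put("info:server", "rs2:16020");
        row.put("info:serverstartcode", "200");
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> row = rowAfterRevertAndMove();
        // The region is actually hosted here.
        System.out.println("hosting server:  " + fromServerColumns(row));
        // A restarted master using the unguarded read would believe this instead.
        System.out.println("master rebuilds: " + unguardedLocation(row));
    }
}
```

In this model the two printed locations differ (rs2 versus rs1), which is the
precondition for the double assignment described above.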
We need a two-part fix for this:
1. Guard reading of info:sn and info:state with proper conditions.
2. Once active master init is complete, if ZK-based region assignment is
   enabled and redundant CQs are available in meta (info:sn and info:state),
   delete them all.
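
The two parts of the fix could look roughly like the following. This is a
hedged sketch on an in-memory stand-in for meta, not the actual patch: the
class GuardedMetaRead, the zkLessAssignment flag, and both method names are
hypothetical, and a real implementation would go through the HBase client
and master code paths instead of plain maps.

```java
import java.util.HashMap;
import java.util.Map;

public class GuardedMetaRead {

    // Part 1: only trust info:sn when ZK-less assignment is in effect.
    // Otherwise fall back to info:server + info:serverstartcode.
    static String location(Map<String, String> row, boolean zkLessAssignment) {
        if (zkLessAssignment) {
            String sn = row.get("info:sn");
            if (sn != null) {
                return sn;
            }
        }
        String server = row.get("info:server");            // assumed "host:port"
        String startCode = row.get("info:serverstartcode");
        if (server == null || startCode == null) {
            return null;
        }
        return server.replace(':', ',') + "," + startCode;
    }

    // Part 2: after master init, under ZK-based assignment, drop the
    // redundant columns from every row. Returns the number of cells removed.
    static int deleteRedundantColumns(Map<String, Map<String, String>> meta,
                                      boolean zkLessAssignment) {
        if (zkLessAssignment) {
            return 0; // the columns are in use, leave them alone
        }
        int removed = 0;
        for (Map<String, String> row : meta.values()) {
            if (row.remove("info:sn") != null) {
                removed++;
            }
            if (row.remove("info:state") != null) {
                removed++;
            }
        }
        return removed;
    }

    // Hypothetical sample row with stale info:sn / info:state values.
    static Map<String, String> staleRow() {
        Map<String, String> row = new HashMap<>();
        row.put("info:server", "rs2:16020");
        row.put("info:serverstartcode", "200");
        row.put("info:sn", "rs1,16020,100");
        row.put("info:state", "OPEN");
        return row;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> meta = new HashMap<>();
        meta.put("region-a", staleRow());
        System.out.println(location(meta.get("region-a"), false));
        System.out.println(deleteRedundantColumns(meta, false));
    }
}
```

With the guard in place, the stale info:sn value is ignored under ZK-based
assignment, and the cleanup removes it so that a later read cannot pick it up.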