stack created HBASE-14012:
-----------------------------
Summary: Double Assignment and Dataloss when ServerCrashProcedure
runs during Master failover
Key: HBASE-14012
URL: https://issues.apache.org/jira/browse/HBASE-14012
Project: HBase
Issue Type: Bug
Components: master, Region Assignment
Affects Versions: 2.0.0, 1.2.0
Reporter: stack
Assignee: stack
Priority: Critical
Seen running ITBLL. Master comes up and joins a running cluster (all servers
are up except the Master, with most regions already assigned out on the
cluster). The ProcedureStore has two unfinished ServerCrashProcedures (RUNNABLE
state). In SCP, we only check whether failover cleanup is done in the first
step, not in every step, which means a ServerCrashProcedure will run if, on
reload, it is already beyond the first step.
{code}
// Is master fully online? If not, yield. No processing of servers unless
// master is up
if (!services.getAssignmentManager().isFailoverCleanupDone()) {
  throwProcedureYieldException("Waiting on master failover to complete");
}
{code}
There is no definitive logging, but it looks like we resume running at the
assign step. The regions to assign were persisted before the master crashed,
and that list may no longer make sense after the crash; i.e. here we
double-assign. Still checking. Regardless, we shouldn't run until the master is
fully up.
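
A minimal sketch of what "check on every step" could look like. This is not the
actual ServerCrashProcedure code: the Step enum, CrashProcedure class,
YieldException, and the BooleanSupplier standing in for isFailoverCleanupDone()
are all hypothetical names made up for illustration. The procedure is reloaded
at the assign step, as suspected above, and the guard runs before any step
executes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

public class ScpFailoverGuardSketch {
    /** Hypothetical, simplified step list; not the real SCP state enum. */
    enum Step { START, SPLIT_LOGS, ASSIGN, FINISH }

    /** Stand-in for the yield exception thrown in the snippet above. */
    static class YieldException extends RuntimeException {
        YieldException(String msg) { super(msg); }
    }

    /** Toy procedure that resumes from a persisted step after a master restart. */
    static class CrashProcedure {
        Step current;
        final BooleanSupplier failoverCleanupDone; // assumption: models isFailoverCleanupDone()
        final List<Step> executed = new ArrayList<>();

        CrashProcedure(Step resumedAt, BooleanSupplier failoverCleanupDone) {
            this.current = resumedAt;
            this.failoverCleanupDone = failoverCleanupDone;
        }

        /** Runs one step; returns true while more steps remain. */
        boolean executeStep() {
            // The guard is evaluated on every call, not only when current == START,
            // so a procedure reloaded mid-flight still waits for the master.
            if (!failoverCleanupDone.getAsBoolean()) {
                throw new YieldException("Waiting on master failover to complete");
            }
            executed.add(current);
            if (current == Step.FINISH) {
                return false;
            }
            current = Step.values()[current.ordinal() + 1];
            return true;
        }
    }

    public static void main(String[] args) {
        boolean[] masterUp = { false };
        // Reloaded from the store already at ASSIGN, i.e. beyond the first step.
        CrashProcedure proc = new CrashProcedure(Step.ASSIGN, () -> masterUp[0]);
        try {
            proc.executeStep();
        } catch (YieldException e) {
            System.out.println("yielded: " + e.getMessage());
        }
        masterUp[0] = true; // failover cleanup finishes
        while (proc.executeStep()) {
            // keep stepping until FINISH has run
        }
        System.out.println("executed: " + proc.executed);
    }
}
```

With the guard hoisted out of the first step, the reloaded procedure yields
without touching any region until the master reports it is fully up.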