[ https://issues.apache.org/jira/browse/HBASE-14012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612446#comment-14612446 ]
stack commented on HBASE-14012:
-------------------------------
Here is a bit of log:
{code}
2015-06-09 20:06:20,270 INFO [c2020:16000.activeMasterManager]
master.ServerManager: AssignmentManager hasn't finished failover cleanup;
waiting
2015-06-09 20:06:20,272 INFO [c2020:16000.activeMasterManager] master.HMaster:
hbase:meta with replicaId 0 assigned=0, rit=false,
location=c2025.halxg.cloudera.com,16020,1433892619022
2015-06-09 20:06:20,295 DEBUG [ProcedureExecutorThread-4]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://c2020.halxg.cloudera.com:8020/hbase/data/default/IntegrationTestBigLinkedList/c607a47967fd4873135f38e883156e4d/big
2015-06-09 20:06:20,295 DEBUG [ProcedureExecutorThread-10]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://c2020.halxg.cloudera.com:8020/hbase/data/default/IntegrationTestBigLinkedList/1a5a90047a76da6dddebb5aff0acb275/big
2015-06-09 20:06:20,342 DEBUG [hconnection-0x680c3bc0-shared--pool3-t1]
ipc.RpcClientImpl: Use SIMPLE authentication for service ClientService,
sasl=false
2015-06-09 20:06:20,342 DEBUG [hconnection-0x680c3bc0-shared--pool3-t1]
ipc.RpcClientImpl: Connecting to c2025.halxg.cloudera.com/10.20.84.31:16020
2015-06-09 20:06:20,376 DEBUG [ProcedureExecutorThread-4]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://c2020.halxg.cloudera.com:8020/hbase/data/default/IntegrationTestBigLinkedList/c607a47967fd4873135f38e883156e4d/tiny
2015-06-09 20:06:20,379 DEBUG [ProcedureExecutorThread-10]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://c2020.halxg.cloudera.com:8020/hbase/data/default/IntegrationTestBigLinkedList/1a5a90047a76da6dddebb5aff0acb275/tiny
2015-06-09 20:06:20,383 DEBUG [ProcedureExecutorThread-4]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://c2020.halxg.cloudera.com:8020/hbase/data/default/IntegrationTestBigLinkedList/d586e9037f683384411ab2663e31f97b/big
2015-06-09 20:06:20,383 DEBUG [ProcedureExecutorThread-10]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://c2020.halxg.cloudera.com:8020/hbase/data/default/IntegrationTestBigLinkedList/ce4ebb9a375a1fe4b5777d2d960c940c/big
2015-06-09 20:06:20,420 DEBUG [ProcedureExecutorThread-4]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://c2020.halxg.cloudera.com:8020/hbase/data/default/IntegrationTestBigLinkedList/d586e9037f683384411ab2663e31f97b/tiny
2015-06-09 20:06:20,421 DEBUG [ProcedureExecutorThread-4]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://c2020.halxg.cloudera.com:8020/hbase/data/default/IntegrationTestBigLinkedList/6dc837d3ec4e2afd05314472ee17ca80/big
2015-06-09 20:06:20,422 DEBUG [ProcedureExecutorThread-10]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://c2020.halxg.cloudera.com:8020/hbase/data/default/IntegrationTestBigLinkedList/ce4ebb9a375a1fe4b5777d2d960c940c/tiny
2015-06-09 20:06:20,423 DEBUG [ProcedureExecutorThread-10]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://c2020.halxg.cloudera.com:8020/hbase/data/default/IntegrationTestBigLinkedList/6fbe22ff15c2e5f2b207f79eaf8f382a/big
2015-06-09 20:06:20,453 DEBUG [ProcedureExecutorThread-10]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://c2020.halxg.cloudera.com:8020/hbase/data/default/IntegrationTestBigLinkedList/6fbe22ff15c2e5f2b207f79eaf8f382a/tiny
...
2015-06-09 20:06:20,795 DEBUG [ProcedureExecutorThread-4]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://c2020.halxg.cloudera.com:8020/hbase/data/default/IntegrationTestBigLinkedList/0983b02ec079ea8ac2fb2901dbe2a6fb/tiny
2015-06-09 20:06:20,797 INFO [ProcedureExecutorThread-4]
master.AssignmentManager: Bulk assigning 9 region(s) across 5 server(s),
round-robin=true
....
2015-06-09 20:06:20,909 INFO [c2020:16000.activeMasterManager]
master.AssignmentManager: Found regions out on cluster or in RIT; presuming
failover
{code}
It's the bulk assign at the end there that is assigning regions that are
already out on the cluster.
> Double Assignment and Dataloss when ServerCrashProcedure runs during Master
> failover
> ------------------------------------------------------------------------------------
>
> Key: HBASE-14012
> URL: https://issues.apache.org/jira/browse/HBASE-14012
> Project: HBase
> Issue Type: Bug
> Components: master, Region Assignment
> Affects Versions: 2.0.0, 1.2.0
> Reporter: stack
> Assignee: stack
> Priority: Critical
>
> ITBLL. Master comes up. It is joining a running cluster (all servers up
> except Master, with most regions assigned out on cluster). ProcedureStore has
> two unfinished ServerCrashProcedures (RUNNABLE state). In SCP, we only check
> for failover in the first step, not in every step, which means a
> ServerCrashProcedure will run if, on reload, it is beyond the first step.
> {code}
> // Is master fully online? If not, yield. No processing of servers unless
> // master is up.
> if (!services.getAssignmentManager().isFailoverCleanupDone()) {
> throwProcedureYieldException("Waiting on master failover to complete");
> }
> {code}
> There is no definitive logging, but it looks like we start running at the
> assign step. The regions to assign were persisted before the master crash.
> They may no longer make sense post-crash: i.e. here we double-assign.
> Checking. We shouldn't run until master is fully up regardless.
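
To make the "check on every step, not just the first" point concrete, here is a minimal sketch of the idea. It is a toy model, not HBase code: the Step enum, FailoverState interface, CrashProcedure class and YieldException are all hypothetical stand-ins, and the only thing taken from the issue is the isFailoverCleanupDone() check and its yield message. The guard sits ahead of the step switch, so a procedure reloaded at the assign step still yields until failover cleanup is done.

```java
import java.util.ArrayList;
import java.util.List;

final class ScpFailoverGuardSketch {
    /** Hypothetical stand-in for the steps of a server crash procedure. */
    enum Step { START, SPLIT_LOGS, ASSIGN, FINISH }

    /** Hypothetical stand-in for the master's "failover cleanup done" flag. */
    interface FailoverState {
        boolean isFailoverCleanupDone();
    }

    /** Thrown to give the executor thread back; the caller retries later. */
    static class YieldException extends Exception {
        YieldException(String msg) { super(msg); }
    }

    static class CrashProcedure {
        private Step step;
        /** Records which steps actually ran, for inspection. */
        final List<Step> executed = new ArrayList<>();

        /** resumeAt models the step persisted in the store before the crash. */
        CrashProcedure(Step resumeAt) { this.step = resumeAt; }

        Step currentStep() { return step; }

        /** Runs one step; returns true while more steps remain. */
        boolean executeOneStep(FailoverState master) throws YieldException {
            // Guard is checked before EVERY step, not only inside START.
            if (!master.isFailoverCleanupDone()) {
                throw new YieldException("Waiting on master failover to complete");
            }
            executed.add(step);
            switch (step) {
                case START:      step = Step.SPLIT_LOGS; return true;
                case SPLIT_LOGS: step = Step.ASSIGN;     return true;
                case ASSIGN:     step = Step.FINISH;     return true;
                default:         return false; // FINISH: nothing left to do
            }
        }
    }

    public static void main(String[] args) {
        boolean[] cleanupDone = { false };
        FailoverState master = () -> cleanupDone[0];

        // Procedure reloaded from the store already past the first step.
        CrashProcedure scp = new CrashProcedure(Step.ASSIGN);
        try {
            scp.executeOneStep(master);
        } catch (YieldException e) {
            System.out.println("yielded at " + scp.currentStep() + ": " + e.getMessage());
        }

        cleanupDone[0] = true; // master finished failover cleanup
        try {
            while (scp.executeOneStep(master)) {
                // keep stepping until the procedure reports it is done
            }
        } catch (YieldException e) {
            throw new IllegalStateException("unexpected yield", e);
        }
        System.out.println("executed steps: " + scp.executed);
    }
}
```

With the guard only inside the START case, the first call above would have gone straight into ASSIGN while the master was still in failover, which is the double-assign scenario described in the issue.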