Yunfan Zhong created HBASE-10464:
------------------------------------

             Summary: Race condition during RS shutdown that could cause data loss
                 Key: HBASE-10464
                 URL: https://issues.apache.org/jira/browse/HBASE-10464
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.89-fb
            Reporter: Yunfan Zhong
            Priority: Critical
             Fix For: 0.89-fb


Bug scenario (T* are timestamps, say T1 < T2 < ... < Tn):
1. The master assigns a region to the RS at T1.
2. The RS opens the region between T1 and T3.
3. While the region is being opened, the RS starts shutting down at T2; its DFS
client is closed at T5.
4. As part of shutdown, the RS closes the regions it owns, except the newly
opened region, which is online from T3 to T5 and holds mutations in memory
written after its last flush (at T4, if any).
5. Because the master believes the RS shut down cleanly, no log splitting is
performed, and the HLog is moved to the old logs directory as usual.
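6. Mutations held in memory from T4 to T5 (from T3 to T5 if there was no flush
at T4) are never flushed. They exist only in the WAL, if it is enabled, and
that WAL is not replayed because no log splitting occurred, so the data is lost.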

The fix is to prevent a region open from succeeding while the RS is shutting down.
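The race and the proposed fix can be modeled in a few lines of Java. This is a hypothetical sketch, not HBase code: `RegionServerModel`, `finishOpen`, `beginShutdown`, and `closeDfsClient` are invented names, a region's memstore is just a list of edits, and shutdown is assumed to flush only the regions that are online at the moment it starts. The `main` method replays T2, T3, and T5 deterministically on one thread, first without and then with the fix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, heavily simplified model of the shutdown/open race.
 * None of these names are HBase APIs. Each online region maps to its
 * unflushed edits (a stand-in for the memstore).
 */
public class RegionServerModel {
    private final boolean rejectOpensWhileStopping;                   // true = proposed fix
    private final Object lock = new Object();
    private boolean stopping = false;                                 // guarded by lock
    private final Map<String, List<String>> online = new HashMap<>(); // guarded by lock
    private final List<String> flushed = new ArrayList<>();           // guarded by lock

    public RegionServerModel(boolean rejectOpensWhileStopping) {
        this.rejectOpensWhileStopping = rejectOpensWhileStopping;
    }

    /** Final step of a region open (T3). Returns false if the open is refused. */
    public boolean finishOpen(String region) {
        synchronized (lock) {
            // The flag check and the insertion share the lock used by beginShutdown(),
            // so a region cannot come online after shutdown has taken its snapshot.
            if (rejectOpensWhileStopping && stopping) {
                return false;
            }
            online.put(region, new ArrayList<>());
            return true;
        }
    }

    /** A client mutation; accepted only while the region is online. */
    public boolean put(String region, String edit) {
        synchronized (lock) {
            List<String> memstore = online.get(region);
            if (memstore == null) {
                return false;
            }
            memstore.add(edit);
            return true;
        }
    }

    /** Shutdown begins (T2): flush and close every region online right now. */
    public void beginShutdown() {
        synchronized (lock) {
            stopping = true;
            for (List<String> memstore : online.values()) {
                flushed.addAll(memstore);
            }
            online.clear();
        }
    }

    /**
     * DFS client closed (T5): edits still in a memstore can no longer be flushed.
     * In this model they are lost, since no log splitting replays the WAL.
     */
    public List<String> closeDfsClient() {
        synchronized (lock) {
            List<String> lost = new ArrayList<>();
            for (List<String> memstore : online.values()) {
                lost.addAll(memstore);
            }
            online.clear();
            return lost;
        }
    }

    public List<String> flushedEdits() {
        synchronized (lock) {
            return new ArrayList<>(flushed);
        }
    }

    public static void main(String[] args) {
        for (boolean fix : new boolean[] {false, true}) {
            RegionServerModel rs = new RegionServerModel(fix);
            rs.finishOpen("old");
            rs.put("old", "a");
            rs.beginShutdown();                      // T2
            boolean opened = rs.finishOpen("new");   // T3
            boolean accepted = rs.put("new", "b");   // between T3 and T5
            List<String> lost = rs.closeDfsClient(); // T5
            System.out.println("fix=" + fix + " opened=" + opened
                + " accepted=" + accepted + " lost=" + lost);
        }
    }
}
```

Without the fix the sketch prints `fix=false opened=true accepted=true lost=[b]`: the late-opened region accepts an edit that no flush ever sees. With the fix it prints `fix=true opened=false accepted=false lost=[]`. The essential design choice is that the `stopping` check and the insertion into the online set happen under the same lock that shutdown uses to set the flag and snapshot the online regions; checking the flag without that lock would only narrow the race window, not close it. How a refused open is reported back to the master is not modeled here.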



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
