[ https://issues.apache.org/jira/browse/HBASE-20878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542403#comment-16542403 ]
Duo Zhang commented on HBASE-20878:
-----------------------------------

{quote}
What if we bring back DLR (distributed log replay) and there is no recovered.edits?
{quote}
That's a problem for whoever wants to bring it back.

Anyway, the problem here is that the region has not been closed normally, and I do not think a crashed RS is a good condition for testing that. Besides checking the recovered.edits, maybe we could check the region state? IIRC, we will not update the meta in SCP to move the region to CLOSED state. The assumption here is that a region in CLOSED state must have been closed normally. Maybe we could introduce a state called ABNORMALLY_CLOSED, which indicates that the region will be processed by SCP.

For now, I prefer checking recovered.edits over checking for a crashed RS.

Skimmed the code again, you use lastHost to determine whether the region has been on a crashed RS.
{code}
  // notice that, the lastHost will only be updated when a region is successfully CLOSED through
  // UnassignProcedure, so do not use it for critical condition as the data maybe stale and unsync
  // with the data in meta.
  private volatile ServerName lastHost = null;
{code}
The comment was added by me when resolving HBASE-20792. The lastHost should not be used for critical conditions...

> Data loss if merging regions while ServerCrashProcedure executing
> -----------------------------------------------------------------
>
>                 Key: HBASE-20878
>                 URL: https://issues.apache.org/jira/browse/HBASE-20878
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>    Affects Versions: 3.0.0, 2.1.0, 2.0.1
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Critical
>             Fix For: 3.0.0, 2.0.2, 2.1.1
>
>         Attachments: HBASE-20878.branch-2.0.001.patch, HBASE-20878.branch-2.0.002.patch
>
>
> In MergeTableRegionsProcedure, we close the regions to merge using
> UnassignProcedure. But if the RS hosting these regions has crashed, a
> ServerCrashProcedure will execute at the same time. The UnassignProcedures
> will be blocked until all logs are split. But since these regions are closed
> for merging, they won't be opened again, so the recovered.edits in the region
> dir won't be replayed, and thus data will be lost.
> I provided a test to reproduce this case. I strongly suspect the split region
> procedure also has this kind of problem. I will check later.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
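
For illustration only, the merge precondition discussed in the comment above (a region may be merged only if it was closed normally and has no pending recovered.edits) could be sketched roughly as follows. This is a simplified, hypothetical model and not HBase's actual code: the class MergePrecondition, its RegionState enum, and safeToMerge are made-up names, and ABNORMALLY_CLOSED is only the state proposed in the comment, assumed here to exist.

```java
// Hypothetical, simplified model of the check discussed in the comment.
// None of these names are HBase's real API.
final class MergePrecondition {

    // ABNORMALLY_CLOSED is the proposed state: the region went down with its
    // RS and is still to be processed by ServerCrashProcedure.
    enum RegionState { OPEN, CLOSING, CLOSED, ABNORMALLY_CLOSED }

    // A region is safe to merge only if it reached CLOSED through a normal
    // close and there are no recovered.edits files left to replay.
    static boolean safeToMerge(RegionState state, boolean hasRecoveredEdits) {
        return state == RegionState.CLOSED && !hasRecoveredEdits;
    }

    private MergePrecondition() {
    }
}
```

Under these assumptions, a region left in ABNORMALLY_CLOSED, or one whose directory still holds recovered.edits, would be rejected regardless of what a possibly stale field such as lastHost says.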