[jira] [Commented] (HBASE-20878) Data loss if merging regions while ServerCrashProcedure executing

Duo Zhang (JIRA) Thu, 12 Jul 2018 18:47:17 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-20878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542403#comment-16542403
 ]


Duo Zhang commented on HBASE-20878:
-----------------------------------

{quote}
What if we bring DLR (distributed log replay) back and there is no 
recovered.edits?
{quote}
That's a problem for whoever wants to bring it back.

Anyway, the problem here is that the region has not been closed normally, and I 
do not think checking for a crashed RS is a good way to test for that. Besides 
checking recovered.edits, maybe we could check the region state? IIRC, SCP does 
not update meta to move the region to the CLOSED state. The assumption here is 
that a region in the CLOSED state must have been closed normally. Maybe we 
could introduce a state called ABNORMALLY_CLOSED, which indicates that the 
region will be processed by SCP.
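
To make the state-based idea concrete, here is a minimal sketch. It assumes a 
simplified, hypothetical State enum and a hypothetical safeToMerge helper; 
neither is HBase's actual RegionState API, and ABNORMALLY_CLOSED is only the 
state proposed above, not an existing one.

```java
import java.util.Arrays;
import java.util.List;

public class MergePrecheckSketch {
    // Hypothetical, simplified region states. ABNORMALLY_CLOSED is the state
    // proposed in this comment for regions that SCP still has to process.
    enum State { OPEN, CLOSING, CLOSED, ABNORMALLY_CLOSED }

    // A merge may proceed only if every region was closed normally,
    // i.e. every region is in the CLOSED state.
    static boolean safeToMerge(List<State> regionStates) {
        for (State s : regionStates) {
            if (s != State.CLOSED) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Both regions closed normally.
        System.out.println(safeToMerge(Arrays.asList(State.CLOSED, State.CLOSED)));
        // One region went down with its RS and is waiting for SCP.
        System.out.println(safeToMerge(Arrays.asList(State.CLOSED, State.ABNORMALLY_CLOSED)));
    }
}
```

The point of a separate state is that the merge procedure would not need to 
know anything about crashed servers; it would only look at the region state.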

For now, I prefer checking recovered.edits over checking for a crashed RS. 
I skimmed the code again: you use lastHost to determine whether the region was 
on a crashed RS.
{code}
    // notice that, the lastHost will only be updated when a region is successfully CLOSED through
    // UnassignProcedure, so do not use it for critical condition as the data maybe stale and unsync
    // with the data in meta.
    private volatile ServerName lastHost = null;
{code}

I added this comment when resolving HBASE-20792. lastHost should not be used 
for critical conditions...
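
As a rough illustration of the recovered.edits check preferred above, here is a 
sketch that uses plain java.nio on a local directory. This is an assumption 
for illustration only: HBase itself goes through the Hadoop FileSystem API, 
the hasRecoveredEdits helper is hypothetical, and a real check would need to 
tell actual edit files apart from any other files kept in that directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class RecoveredEditsCheckSketch {
    // Returns true if <regionDir>/recovered.edits exists and contains at
    // least one entry. Simplification: every entry is treated as a pending
    // edit file.
    static boolean hasRecoveredEdits(Path regionDir) throws IOException {
        Path editsDir = regionDir.resolve("recovered.edits");
        if (!Files.isDirectory(editsDir)) {
            return false;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(editsDir)) {
            return entries.iterator().hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway region directory with one pending edit file.
        Path regionDir = Files.createTempDirectory("region");
        Path editsDir = Files.createDirectory(regionDir.resolve("recovered.edits"));
        Files.createFile(editsDir.resolve("0000000000000000042"));
        if (hasRecoveredEdits(regionDir)) {
            System.out.println("recovered.edits present, refusing to merge");
        }
    }
}
```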



> Data loss if merging regions while ServerCrashProcedure executing
> -----------------------------------------------------------------
>
>                 Key: HBASE-20878
>                 URL: https://issues.apache.org/jira/browse/HBASE-20878
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>    Affects Versions: 3.0.0, 2.1.0, 2.0.1
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Critical
>             Fix For: 3.0.0, 2.0.2, 2.1.1
>
>         Attachments: HBASE-20878.branch-2.0.001.patch, 
> HBASE-20878.branch-2.0.002.patch
>
>
> In MergeTableRegionsProcedure, we close the regions to merge using 
> UnassignProcedure. But if the RS hosting these regions has crashed, a 
> ServerCrashProcedure will execute at the same time. The UnassignProcedures 
> will be blocked until all logs are split. But since these regions are closed 
> for merging, they won't be opened again, so the recovered.edits in the region 
> dir won't be replayed, and data will be lost.
> I provided a test to reproduce this case. I strongly suspect the split region 
> procedure also has this kind of problem. I will check later.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
