[
https://issues.apache.org/jira/browse/HBASE-21322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652990#comment-16652990
]
Jingyun Tian edited comment on HBASE-21322 at 10/17/18 5:48 AM:
----------------------------------------------------------------
[~stack]
The test procedure is:
1. Kill one RS.
2. Delete all MasterProcWALs immediately.
3. Check whether the cluster can fail over.
The result is that there are splitting logs on HDFS, but no ServerCrashProcedure
is scheduled.
!Screenshot from 2018-10-17 13-38-41.png!
As a result, some regions are never assigned again. Their state stays the same
as it was at the moment I deleted all the MasterProcWALs.
!Screenshot from 2018-10-17 13-35-58.png!
So some regions are still recorded as OPEN on a dead RS.
!Screenshot from 2018-10-17 13-47-06.png!
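To make the stuck state concrete, here is a minimal sketch in plain Java of the check an operator would effectively be doing before scheduling a ServerCrashProcedure by hand: find the servers that still have regions recorded as OPEN but are no longer live. It does not use any HBase API. The class name, method name, and the map/set inputs are all hypothetical stand-ins for what is recorded in meta and for the live RS list.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of HBase. Region and server names are plain strings.
final class DeadServerFinder {

    /**
     * Returns the servers that have at least one region recorded as OPEN
     * but are not in the set of live servers.
     *
     * @param openRegionToServer region name -> server the region is recorded OPEN on (assumed input)
     * @param liveServers        servers currently alive (assumed input)
     */
    static Set<String> findDeadServersWithOpenRegions(
            Map<String, String> openRegionToServer, Set<String> liveServers) {
        Set<String> dead = new TreeSet<>();
        for (String server : openRegionToServer.values()) {
            if (!liveServers.contains(server)) {
                dead.add(server);
            }
        }
        return dead;
    }

    public static void main(String[] args) {
        // rs2 was killed, but r2 and r3 are still recorded as OPEN on it.
        Map<String, String> openRegions = Map.of("r1", "rs1", "r2", "rs2", "r3", "rs2");
        Set<String> live = Set.of("rs1");
        System.out.println(findDeadServersWithOpenRegions(openRegions, live)); // prints [rs2]
    }
}
```

Each server returned by such a check is one that would need a manually scheduled ServerCrashProcedure, since nothing else will reassign its regions once the procedure logs are gone.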
> Add a scheduleServerCrashProcedure() API to HbckService
> -------------------------------------------------------
>
> Key: HBASE-21322
> URL: https://issues.apache.org/jira/browse/HBASE-21322
> Project: HBase
> Issue Type: Sub-task
> Reporter: Jingyun Tian
> Assignee: Jingyun Tian
> Priority: Major
> Attachments: Screenshot from 2018-10-17 13-35-58.png, Screenshot from
> 2018-10-17 13-38-41.png, Screenshot from 2018-10-17 13-47-06.png
>
>
> According to my test, if one RS goes down and all procedure logs are then
> deleted, no ServerCrashProcedure is scheduled, and restarting the master does
> not help. We therefore need a way to schedule a ServerCrashProcedure manually.
> I plan to add a scheduleServerCrashProcedure() API to HbckService, then expose
> this API in HBCK2.