[jira] [Commented] (HBASE-28180) TestClusterRestartFailover fails in pre commit build

Duo Zhang (Jira) Sun, 29 Oct 2023 04:58:04 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-28180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780722#comment-17780722
 ]


Duo Zhang commented on HBASE-28180:
-----------------------------------

The strange log output:

{noformat}
2023-10-28T05:19:32,961 INFO  [PEWorker-3 {}] 
procedure.ServerCrashProcedure(285): removed crashed server 
60c4f7698da2,35037,1698470136730 after splitting done
2023-10-28T05:19:32,964 INFO  [PEWorker-3 {}] 
procedure2.ProcedureExecutor(1413): Finished pid=59, state=SUCCESS, 
hasLock=false; ServerCrashProcedure 60c4f7698da2,35037,1698470136730, 
splitWal=true, meta=false in 9.7560 sec
2023-10-28T05:19:32,965 INFO  [PEWorker-2 {}] 
procedure2.ProcedureExecutor(1827): Finished subprocedure pid=78, resume 
processing ppid=59
2023-10-28T05:19:32,965 INFO  [PEWorker-2 {}] 
procedure2.ProcedureExecutor(1413): Finished pid=78, ppid=59, state=SUCCESS, 
hasLock=false; TransitRegionStateProcedure table=restartTableOne, 
region=f29e60f0f76f9ceeb268843c64bfad94, ASSIGN in 4.1960 sec
2023-10-28T05:19:32,971 DEBUG [RS_OPEN_REGION-regionserver/60c4f7698da2:34499-0 
{event_type=M_RS_OPEN_REGION, pid=107}] regionserver.HRegion(5511): Found 1 
recovered edits file(s) under 
hdfs://localhost:39421/user/jenkins/test-data/d2b51124-8600-9f5c-75c8-bd90c2afe83b/data/default/restartTableOne/9f3abed92bdd11b56fa836a98f96b7e5
2023-10-28T05:19:32,972 INFO  [RS_OPEN_REGION-regionserver/60c4f7698da2:34499-0 
{event_type=M_RS_OPEN_REGION, pid=107}] regionserver.HRegion(5578): Replaying 
edits from 
hdfs://localhost:39421/user/jenkins/test-data/d2b51124-8600-9f5c-75c8-bd90c2afe83b/data/default/restartTableOne/9f3abed92bdd11b56fa836a98f96b7e5/recovered.edits/0000000000000000103
2023-10-28T05:19:33,004 DEBUG [Listener at localhost/42171 {}] 
zookeeper.RecoverableZooKeeper(221): Node 
/hbase/draining/60c4f7698da2,35037,1698470136730 already deleted, retry=false
2023-10-28T05:19:33,004 INFO  [Listener at localhost/42171 {}] 
master.ServerManager(604): Processing expiration of 
60c4f7698da2,35037,1698470136730 on 60c4f7698da2,42173,1698470361611
2023-10-28T05:19:33,006 DEBUG [Listener at localhost/42171 {}] 
procedure2.ProcedureExecutor(1032): Stored pid=114, 
state=RUNNABLE:SERVER_CRASH_START, hasLock=false; ServerCrashProcedure 
60c4f7698da2,35037,1698470136730, splitWal=true, meta=false
{noformat}

Pid 59 is the SCP for '60c4f7698da2,35037,1698470136730', the RS whose SCP we 
want to test. But there is another 'Processing expiration of 
60c4f7698da2,35037,1698470136730' right after that SCP finishes, and it then 
schedules another SCP (pid=114) for the same RS. This should not happen...
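
To look for the same pattern elsewhere in the attached test output, a small 
script can group ServerCrashProcedure pids by server name and report any 
server that got more than one. This is only a sketch: the regular expression 
is inferred from the log lines quoted above rather than from the HBase logging 
code, and the names SCP_PATTERN and find_repeated_scps are made up for 
illustration.

```python
import re
from collections import defaultdict

# Assumed entry shape, taken from the quoted log lines:
#   "pid=59, state=SUCCESS, hasLock=false; ServerCrashProcedure host,port,startcode"
SCP_PATTERN = re.compile(
    r"\bpid=(\d+), state=([^,\s]+), hasLock=\w+; "
    r"ServerCrashProcedure ([^,\s]+,\d+,\d+)"
)


def find_repeated_scps(log_text):
    """Return {server_name: [pid, ...]} for servers that have more than one SCP pid."""
    # Collapse all whitespace so entries wrapped over several lines still match.
    flat = " ".join(log_text.split())
    pids_by_server = defaultdict(list)
    for pid, _state, server in SCP_PATTERN.findall(flat):
        pid = int(pid)
        if pid not in pids_by_server[server]:
            pids_by_server[server].append(pid)
    return {s: p for s, p in pids_by_server.items() if len(p) > 1}


# Two entries copied from the log excerpt above, wrapped as in the mail.
SAMPLE = """
2023-10-28T05:19:32,964 INFO  [PEWorker-3 {}]
procedure2.ProcedureExecutor(1413): Finished pid=59, state=SUCCESS,
hasLock=false; ServerCrashProcedure 60c4f7698da2,35037,1698470136730,
splitWal=true, meta=false in 9.7560 sec
2023-10-28T05:19:33,006 DEBUG [Listener at localhost/42171 {}]
procedure2.ProcedureExecutor(1032): Stored pid=114,
state=RUNNABLE:SERVER_CRASH_START, hasLock=false; ServerCrashProcedure
60c4f7698da2,35037,1698470136730, splitWal=true, meta=false
"""

if __name__ == "__main__":
    for server, pids in find_repeated_scps(SAMPLE).items():
        print(f"{server}: SCP pids {pids}")
```

On the excerpt this reports pids 59 and 114 for 
60c4f7698da2,35037,1698470136730. The script only shows that two SCPs were 
recorded for one server name; it says nothing about why the second expiration 
was processed.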

> TestClusterRestartFailover fails in pre commit build
> ----------------------------------------------------
>
>                 Key: HBASE-28180
>                 URL: https://issues.apache.org/jira/browse/HBASE-28180
>             Project: HBase
>          Issue Type: Bug
>          Components: master, proc-v2, test
>            Reporter: Duo Zhang
>            Priority: Major
>         Attachments: 
> org.apache.hadoop.hbase.master.TestClusterRestartFailover-output.txt
>
>
> It failed two times in this PR.
> https://github.com/apache/hbase/pull/5475
> Filed an issue to track this problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
