[jira] [Commented] (HBASE-24117) If move target RS crashes, move fails if concurrent master crash

Duo Zhang (Jira) Sun, 05 Apr 2020 00:48:28 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-24117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17075742#comment-17075742
 ]


Duo Zhang commented on HBASE-24117:
-----------------------------------

Hi sir, I think this is the actual problem: the new master failed to start.

{noformat}
2020-03-26 21:02:16,214 INFO  [master/asf905:0:becomeActiveMaster] region.RegionProcedureStore(451): Migration of WALProcedureStore finished
2020-03-26 21:02:16,214 INFO  [master/asf905:0:becomeActiveMaster] procedure2.ProcedureExecutor(589): Recovered RegionProcedureStore lease in 858 msec
2020-03-26 21:02:16,219 ERROR [master/asf905:0:becomeActiveMaster] helpers.MarkerIgnoringBase(159): Failed to become active master
org.apache.hadoop.hbase.NotServingRegionException: master:procedure,,1.9731aea823e7f83264b14713ae486fb7. is closing
        at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:8549)
        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2957)
        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2952)
        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2946)
        at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:504)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:331)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:601)
        at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1542)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:932)
        at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2189)
        at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:609)
        at java.lang.Thread.run(Thread.java:748)
{noformat}

The region is closing? A bit strange...
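
For anyone reading along, here is a rough sketch of the failure mode the trace points at: a scan is requested on a region whose closing flag is already set, so the operation guard rejects it. This is only a toy model. ClosingRegionSketch, ToyRegion, ToyNotServingRegionException and markClosing are hypothetical names made up for illustration, not the HBase classes, and it says nothing about why the master:procedure region was closing in the first place.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model only: these classes are hypothetical stand-ins, not HBase code.
class ClosingRegionSketch {

    // Stand-in for NotServingRegionException; unchecked here for brevity.
    static class ToyNotServingRegionException extends RuntimeException {
        ToyNotServingRegionException(String message) {
            super(message);
        }
    }

    static class ToyRegion {
        private final String name;
        private final AtomicBoolean closing = new AtomicBoolean(false);

        ToyRegion(String name) {
            this.name = name;
        }

        // Assumed to be flipped by whatever starts closing the region.
        void markClosing() {
            closing.set(true);
        }

        // Guard that every read goes through before touching the region.
        void startRegionOperation() {
            if (closing.get()) {
                throw new ToyNotServingRegionException(name + " is closing");
            }
        }

        // A scan is a region operation, so it hits the guard first.
        String getScanner() {
            startRegionOperation();
            return "scanner for " + name;
        }
    }

    public static void main(String[] args) {
        ToyRegion region = new ToyRegion("master:procedure");
        System.out.println(region.getScanner());

        // Once the flag is set, loading procedures via a scan fails.
        region.markClosing();
        try {
            region.getScanner();
        } catch (ToyNotServingRegionException e) {
            System.out.println("Failed to load: " + e.getMessage());
        }
    }
}
```

In the toy model the only way to reach the exception is for something to set the flag before the scan, which is why the open question is what closed the region before the load ran.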

> If move target RS crashes, move fails if concurrent master crash
> ----------------------------------------------------------------
>
>                 Key: HBASE-24117
>                 URL: https://issues.apache.org/jira/browse/HBASE-24117
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2
>            Reporter: Michael Stack
>            Priority: Major
>         Attachments: org.apache.hadoop.hbase.master.TestMasterShutdown-output.txt
>
>
> I saw this in TestCloseRegionWithRSCrash. Region
> 788a516d1f86af98e0a16bcc1afe4fa1 was being moved to RS
> example.com,62652,1586032098445 just after that RS was killed. The Move's
> Close fails because the RS has no node in the Master. The Move then tries to
> 'confirm' the close, but that fails too because there is no remote RS. We
> then wait in this state until an operator or some other procedure intervenes
> to 'fix' it. Normally a ServerCrashProcedure would do the job, but in this
> test the Master is restarted after the RS is killed, a condition we do not
> accommodate. Let me attach the test log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
