[ 
https://issues.apache.org/jira/browse/HBASE-28690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-28690.
-------------------------------
    Fix Version/s: 2.7.0
                   3.0.0-beta-2
                   2.6.1
                   2.5.11
     Hadoop Flags: Reviewed
       Resolution: Fixed

Pushed to all active branches.

Thanks [~umesh9414] for contributing!

> Aborting Active HMaster is not rejecting reportRegionStateTransition if 
> procedure is initialised by next Active master
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-28690
>                 URL: https://issues.apache.org/jira/browse/HBASE-28690
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2
>    Affects Versions: 2.5.8
>            Reporter: Umesh Kumar Kumawat
>            Assignee: Umesh Kumar Kumawat
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.11
>
>
> A CloseRegionProcedure on the master asks the RS to close the region. After 
> closing the region, the RS reports the RegionStateTransition back to the master 
> ([here|https://github.com/apache/hbase/blob/d1015a68ed9f94d74668abd37edefd32f5e9305b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java#L1853]).
>  On receiving the report, the master checks whether the regionNode has a 
> procedure attached to it 
> ([code|https://github.com/apache/hbase/blob/d1015a68ed9f94d74668abd37edefd32f5e9305b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1294]).
>  
>  
> {code:java}
> private boolean reportTransition(RegionStateNode regionNode, ServerStateNode serverNode,
>     TransitionCode state, long seqId, long procId) throws IOException {
>   ServerName serverName = serverNode.getServerName();
>   TransitRegionStateProcedure proc = regionNode.getProcedure();
>   if (proc == null) {
>     return false;
>   }
>   proc.reportTransition(master.getMasterProcedureExecutor().getEnvironment(), regionNode,
>     serverName, state, seqId, procId);
>   return true;
> }
> {code}
> If the regionNode has no procedure, the master only logs a warning and does 
> not return an error to the RPC caller. 
>  
> Consider a master failover in which the TRSP and CloseRegionProcedure were 
> initialized only by the new active master. The aborting master has no record 
> of them, so its state is stale. If the transition report reaches the aborting 
> master, it is not rejected, the procedure on the new active master never sees 
> the report, and the procedure gets stuck. 
>  
> *Logs for more understanding* 
> active master server4-1 failing
> {noformat}
> 2024-06-20 04:45:05,576 ERROR 
> [iority.RWQ.Fifo.write.handler=3,queue=0,port=61000] master.HMaster - ***** 
> ABORTING master server4-1,61000,1715413775736: Failed to record region server 
> as started *****{noformat}
> *logs of new active master server5-1*
>  
> {noformat}
> 2024-06-20 04:49:28,893 DEBUG [aster/server5-1:61000:becomeActiveMaster] 
> assignment.RegionStateStore - Load hbase:meta entry 
> region=888a715d5926adbb89c985d8967f40d4, regionState=OPEN, 
> lastHost=server1-119,61020,1717560166420, 
> regionLocation=server1-119,61020,1717560166420, openSeqNum=34892620
> 2024-06-20 04:49:51,886 INFO [PEWorker-22] procedure2.ProcedureExecutor - 
> Initialized subprocedures=[{pid=16276416, ppid=16276108, 
> state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; TransitRegionStateProcedure 
> table=RIMBS.UPLOADER_JOB_DETAILS, region=888a715d5926adbb89c985d8967f40d4, 
> UNASSIGN}]  (on server5-1)
> 2024-06-20 04:49:52,022 INFO [PEWorker-40] procedure2.ProcedureExecutor - 
> Initialized subprocedures=[{pid=16276470, ppid=16276416, state=RUNNABLE; 
> CloseRegionProcedure 888a715d5926adbb89c985d8967f40d4, 
> server=server1-119,61020,1717560166420}] (on server5-1){noformat}
>  
> *RS logs for closing* 
> {noformat}
> 2024-06-20 04:49:52,267 INFO [_REGION-regionserver/server1-119:61020-2] 
> handler.UnassignRegionHandler - Close 888a715d5926adbb89c985d8967f40d4
> 2024-06-20 04:49:52,267 DEBUG [_REGION-regionserver/server1-119:61020-2] 
> regionserver.HRegion - Closing 888a715d5926adbb89c985d8967f40d4, disabling 
> compactions & flushes
> 2024-06-20 04:49:52,354 INFO [_REGION-regionserver/server1-119:61020-2] 
> regionserver.HRegion - Closed 
> TABLE,KW\x00na240-app1-16\x00/Events-120620231740\x00MARKER-Events,1702619592612.888a715d5926adbb89c985d8967f40d4.
> {noformat}
> *Logs of the report on the aborting active HMaster*
> {noformat}
> 2024-06-20 04:49:52,355 WARN 
> [iority.RWQ.Fifo.write.handler=1,queue=0,port=61000] 
> assignment.AssignmentManager - No matching procedure found for 
> server1-119,61020,1717560166420 transition on state=OPEN, 
> location=server1-119,61020,1717560166420, table=RIMBS.UPLOADER_JOB_DETAILS, 
> region=888a715d5926adbb89c985d8967f40d4 to CLOSED ( host = server4-1 , 
> hbaseMasterLogFile){noformat}
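
The failure mode in the description above can be illustrated with a small, self-contained Java sketch. It is not HBase code and not necessarily the change that was committed for this issue: SketchMaster, abort(), attachProcedure(), reportTransitionLenient() and reportTransitionStrict() are hypothetical names made up for illustration. The lenient method mirrors the described behaviour (no matching procedure, no error to the caller). The strict method shows one possible alternative, assuming the master can tell that it is aborting: it throws, so the reporting side learns that the report was not handled.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a master's report handling; not HBase code.
final class SketchMaster {
  private volatile boolean aborting = false;
  // Encoded region name -> id of the procedure this master has attached to it.
  private final Map<String, Long> procedureByRegion = new ConcurrentHashMap<>();

  void abort() {
    aborting = true;
  }

  void attachProcedure(String region, long procId) {
    procedureByRegion.put(region, procId);
  }

  // Behaviour described in the issue: a report with no matching procedure
  // is only noted, and the caller sees no error.
  boolean reportTransitionLenient(String region) {
    return procedureByRegion.containsKey(region);
  }

  // Sketch of a stricter variant (an assumption, not the committed fix): an
  // aborting master refuses the report, so the caller gets an exception
  // instead of a silent "no matching procedure".
  boolean reportTransitionStrict(String region) throws IOException {
    if (aborting) {
      throw new IOException("Master is aborting, rejecting transition report for " + region);
    }
    return procedureByRegion.containsKey(region);
  }
}

final class StaleMasterDemo {
  public static void main(String[] args) {
    String region = "888a715d5926adbb89c985d8967f40d4";
    SketchMaster oldMaster = new SketchMaster();
    // The old master is aborting and never learned about the new procedure.
    oldMaster.abort();

    System.out.println("lenient handled: " + oldMaster.reportTransitionLenient(region));
    try {
      oldMaster.reportTransitionStrict(region);
      System.out.println("strict handled");
    } catch (IOException e) {
      System.out.println("strict rejected: " + e.getMessage());
    }
  }
}
```

With the strict variant the caller sees an explicit failure and has a chance to retry against whichever master is active by then, instead of treating the report as delivered.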



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
