Aman Poonia created HBASE-28405:
-----------------------------------

             Summary: Region open procedure silently does nothing without 
notifying the parent proc
                 Key: HBASE-28405
                 URL: https://issues.apache.org/jira/browse/HBASE-28405
             Project: HBase
          Issue Type: Bug
          Components: proc-v2
    Affects Versions: 2.5.7
            Reporter: Aman Poonia
            Assignee: Aman Poonia


We hit a scenario in production where a merge operation failed as below:

_2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
assignment.MergeTableRegionsProcedure - Error trying to merge 
[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in table1 
(in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
_org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
location=rs-229,60020,1707587658182, table=table1, 
region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
_at org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
_at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
_at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
_at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
_at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
_at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
_at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
_at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
_at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
_at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
_at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
_at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_

When the failed merge operation is rolled back, we see an issue where the 
region stays in state opened until the RS holding it is stopped.

The rollback creates a TransitRegionStateProcedure (TRSP) as below:

_2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
Stored [pid=26674602, 
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
TransitRegionStateProcedure table=table1, 
region=a92008b76ccae47d55c590930b837036, ASSIGN]_

and the rollback finishes successfully:

_2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
Rolled back pid=26673594, state=ROLLEDBACK, 
exception=org.apache.hadoop.hbase.HBaseIOException via 
master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
MergeTableRegionsProcedure table=table1, 
regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
force=false exec-time=1.4820 sec_

So a procedure is created to open region a92008b76ccae47d55c590930b837036.

Interestingly, the region was never closed: the exception was thrown while 
creating the procedures that close the regions, not while executing them.
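
To make the ordering concrete, here is a minimal sketch of the situation. It 
is a simplified model and not the HBase source: the class and method names 
are hypothetical, and it assumes the rollback of the close step schedules an 
assign for every parent region without checking whether any close actually 
ran (the log above shows this only for a92008b76ccae47d55c590930b837036).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified model, NOT HBase source. All names are hypothetical.
public class MergeRollbackSketch {

    // Builds the list of close (unassign) operations. Fails up front if
    // any parent region is in transition, before anything is closed.
    static List<String> createUnassigns(Map<String, Boolean> inTransition) {
        List<String> toClose = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : inTransition.entrySet()) {
            if (e.getValue()) {
                throw new IllegalStateException(
                    e.getKey() + " is currently in transition, give up");
            }
            toClose.add(e.getKey());
        }
        return toClose;
    }

    // Returns the regions for which the rollback schedules an ASSIGN.
    // 'closed' collects the regions that were really closed.
    static List<String> mergeWithRollback(Map<String, Boolean> inTransition,
                                          List<String> closed) {
        try {
            // Only reached when creation succeeds; stands in for
            // executing the close procedures.
            closed.addAll(createUnassigns(inTransition));
            return new ArrayList<>();
        } catch (IllegalStateException e) {
            // Assumption: rollback reopens all parents unconditionally,
            // even though no close was ever executed.
            return new ArrayList<>(inTransition.keySet());
        }
    }
}
```

In this model, a failure inside createUnassigns leaves `closed` empty while 
the rollback still returns every parent region for reassignment, which is the 
state that leads to an OPEN being sent for a region that is still online.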

Now when the TRSP runs, it dispatches an OpenRegionProcedure, which is 
handled by AssignRegionHandler.

On execution, this handler reports that the region is already online.

The sequence of events is as follows:

_2024-02-11 10:53:58,919 INFO [PEWorker-58] assignment.RegionStateStore - 
pid=26674602 updating hbase:meta row=a92008b76ccae47d55c590930b837036, 
regionState=OPENING, regionLocation=rs-210,60020,1707596461539_

_2024-02-11 10:53:58,920 INFO [PEWorker-58] procedure2.ProcedureExecutor - 
Initialized subprocedures=[{pid=26675798, ppid=26674602, state=RUNNABLE; 
OpenRegionProcedure a92008b76ccae47d55c590930b837036, 
server=rs-210,60020,1707596461539}]_

_2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
handler.AssignRegionHandler - Received OPEN for 
table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. which is already 
online_
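
The effect described in the summary can be illustrated with a small model. 
This is a sketch under stated assumptions, not the actual AssignRegionHandler 
code: the types `ParentProcedure` and `RegionServer` and their methods are 
hypothetical stand-ins, and the model assumes the parent only advances when 
the handler reports back.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model, NOT HBase source. All names are hypothetical.
public class SilentOpenSketch {

    enum ParentState { WAITING_FOR_OPEN_REPORT, FINISHED }

    // Stands in for the master-side parent procedure.
    static class ParentProcedure {
        ParentState state = ParentState.WAITING_FOR_OPEN_REPORT;
        void reportOpened() { state = ParentState.FINISHED; }
    }

    // Stands in for the region server handling an OPEN request.
    static class RegionServer {
        final Set<String> onlineRegions = new HashSet<>();

        void handleOpen(String region, ParentProcedure parent) {
            if (onlineRegions.contains(region)) {
                // Mirrors the WARN in the log: the handler returns here
                // and nothing is reported back to the parent.
                System.out.println("Received OPEN for " + region
                    + " which is already online");
                return;
            }
            onlineRegions.add(region);
            parent.reportOpened();
        }
    }

    // Runs one OPEN request and returns the parent's resulting state.
    static ParentState simulate(boolean alreadyOnline) {
        RegionServer rs = new RegionServer();
        String region = "a92008b76ccae47d55c590930b837036";
        if (alreadyOnline) {
            rs.onlineRegions.add(region); // region was never closed
        }
        ParentProcedure parent = new ParentProcedure();
        rs.handleOpen(region, parent);
        return parent.state;
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // normal open path
        System.out.println(simulate(true));  // already-online path
    }
}
```

In the already-online case the parent in this model never leaves its waiting 
state, which matches the behaviour named in the summary: the open does 
nothing and the parent procedure is not notified.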



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
