Li Chao created HBASE-28013:
-------------------------------

             Summary: Procedure WALs cannot be deleted after bypassing a procedure 
recursively
                 Key: HBASE-28013
                 URL: https://issues.apache.org/jira/browse/HBASE-28013
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.2.7
            Reporter: Li Chao


We found 1.2T of logs in /hbase1/MasterProcWALs because WALProcedureStore cannot 
delete WAL files after a parent procedure is bypassed recursively. 

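When a TRSP (TransitRegionStateProcedure) is bypassed recursively, the TRSP and 
its child ORP/CRP (OpenRegionProcedure/CloseRegionProcedure) race with each 
other. The TRSP calls store.delete for the ORP/CRP; the ORP/CRP then releases 
its lock and calls store.update, which overrides the delete. 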

 
{code:java}
1. ORP bypassing and upd wal
EntryType=PROCEDURE_WAL_UPDATE
pid=241, ppid=240, state=RUNNABLE, bypass=true; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure 
submittedTime=1691398235997, lastUpdate=1691398236007
stackIndexes=[2]
2. TRSP bypassing and upd wal by ORP
EntryType=PROCEDURE_WAL_UPDATE
pid=240, state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, bypass=true; 
TransitRegionStateProcedure table=t4, region=57f33b87c08a532869d4eaf9b16e9162, 
ASSIGN submittedTime=1691398235706, lastUpdate=1691398235997
stackIndexes=[0, 1]
3. TRSP bypassing and upd wal by self
EntryType=PROCEDURE_WAL_UPDATE
pid=240, state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, bypass=true; 
TransitRegionStateProcedure table=t4, region=57f33b87c08a532869d4eaf9b16e9162, 
ASSIGN submittedTime=1691398235706, lastUpdate=1691398235997
stackIndexes=[0, 1]
4. ORP running, get lock and upd wal 
EntryType=PROCEDURE_WAL_UPDATE
pid=241, ppid=240, state=RUNNABLE, bypass=true; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure 
submittedTime=1691398235997, lastUpdate=1691398236007
stackIndexes=[2]
6. TRSP bypassing, set status and upd wal
EntryType=PROCEDURE_WAL_UPDATE
pid=240, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, bypass=true; 
TransitRegionStateProcedure table=t4, region=57f33b87c08a532869d4eaf9b16e9162, 
ASSIGN submittedTime=1691398235706, lastUpdate=1691398251253
stackIndexes=[0, 1]
7. ORP running end, upd wal
EntryType=PROCEDURE_WAL_UPDATE
pid=241, ppid=240, state=SUCCESS, bypass=true; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure 
submittedTime=1691398235997, lastUpdate=1691398251253
stackIndexes=[2, 3]
8. TRSP running end, delete wal for child
EntryType=PROCEDURE_WAL_DELETE
pid=240, state=SUCCESS, bypass=true; TransitRegionStateProcedure table=t4, 
region=57f33b87c08a532869d4eaf9b16e9162, ASSIGN submittedTime=1691398235706, 
lastUpdate=1691398251260
stackIndexes=[0, 1, 4]
9. TRSP release lock and upd wal
EntryType=PROCEDURE_WAL_UPDATE
pid=240, state=SUCCESS, bypass=true; TransitRegionStateProcedure table=t4, 
region=57f33b87c08a532869d4eaf9b16e9162, ASSIGN submittedTime=1691398235706, 
lastUpdate=1691398251260
stackIndexes=[0, 1, 4]
10. ORP release lock and upd wal
EntryType=PROCEDURE_WAL_UPDATE
pid=241, ppid=240, state=SUCCESS, bypass=true; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure 
submittedTime=1691398235997, lastUpdate=1691398251253
stackIndexes=[2, 3]
11. TRSP delete wal in completed
EntryType=PROCEDURE_WAL_DELETE
EntryType=PROCEDURE_WAL_DELETE {code}
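
The ordering in the trace above can be modeled with a small, self-contained sketch. This is not HBase code: ToyStore, its methods, and the replay helpers are hypothetical names used only for illustration, and the "a WAL is removable once no procedure is tracked as live" rule is a simplifying assumption about how the store decides what can be cleaned up. The pids 240 (TRSP) and 241 (ORP) are taken from the trace.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class ProcWalRaceSketch {
    static final long TRSP_PID = 240L; // parent, from the trace
    static final long ORP_PID = 241L;  // child, from the trace

    /** Hypothetical toy store: tracks which pids still have live state in the WALs. */
    static final class ToyStore {
        private final Set<Long> live = new LinkedHashSet<>();

        /** An update entry marks the procedure as live. */
        void update(long pid) {
            live.add(pid);
        }

        /** A delete entry clears the given procedures. */
        void delete(long... pids) {
            for (long pid : pids) {
                live.remove(pid);
            }
        }

        /** Simplifying assumption: WALs can be removed only when nothing is live. */
        boolean canRemoveWals() {
            return live.isEmpty();
        }

        Set<Long> livePids() {
            return live;
        }
    }

    /** Replays the tail of the trace: the child's update lands after its delete. */
    static ToyStore replayRacyOrder() {
        ToyStore store = new ToyStore();
        store.update(ORP_PID);   // step 7: ORP finishes, updates WAL
        store.update(TRSP_PID);  // TRSP state persisted
        store.delete(ORP_PID);   // step 8: TRSP finishes, deletes its child
        store.update(TRSP_PID);  // step 9: TRSP releases lock, updates WAL
        store.update(ORP_PID);   // step 10: ORP releases lock, update overrides the delete
        store.delete(TRSP_PID);  // step 11: TRSP deleted on completion
        return store;
    }

    /** Same entries, but the child's last update lands before the delete. */
    static ToyStore replayOrderedOrder() {
        ToyStore store = new ToyStore();
        store.update(ORP_PID);
        store.update(TRSP_PID);
        store.update(ORP_PID);   // ORP releases lock first
        store.delete(ORP_PID);   // then TRSP deletes its child
        store.update(TRSP_PID);
        store.delete(TRSP_PID);
        return store;
    }

    public static void main(String[] args) {
        ToyStore racy = replayRacyOrder();
        System.out.println("racy order: live=" + racy.livePids()
            + " canRemoveWals=" + racy.canRemoveWals());
        ToyStore ordered = replayOrderedOrder();
        System.out.println("ordered:    live=" + ordered.livePids()
            + " canRemoveWals=" + ordered.canRemoveWals());
    }
}
```

In the racy order, pid 241 is left marked live with no later delete to clear it, which matches the symptom that the WAL files are never cleaned up. The ordered replay only shows that the same entries in a different order leave nothing live; it is not a proposed fix.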

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
