Li Chao created HBASE-28013:
-------------------------------
Summary: procedureWAL can not delete after bypass procedure
recursively
Key: HBASE-28013
URL: https://issues.apache.org/jira/browse/HBASE-28013
Project: HBase
Issue Type: Bug
Affects Versions: 2.2.7
Reporter: Li Chao
We found 1.2T of logs in /hbase1/MasterProcWALs because WALProcedureStore cannot
delete WAL files after a parent procedure is bypassed recursively.
When a TRSP (TransitRegionStateProcedure) is bypassed recursively, the TRSP and
its child ORP/CRP (OpenRegionProcedure/CloseRegionProcedure) race with each
other. The TRSP calls store.delete for the ORP/CRP, then the ORP/CRP releases
its lock and calls store.update, which overrides the delete.
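The ordering problem described above can be illustrated with a toy model. This is only a sketch, not the real WALProcedureStore: the `ToyStore` class, its `canRemoveOldWals` method, and the `replay` helper are hypothetical names made up for illustration, and the model assumes that a WAL file can be cleaned up only when no procedure still has a live entry in it.

```java
import java.util.HashSet;
import java.util.Set;

class BypassRaceSketch {
    /** Hypothetical stand-in for a procedure store; NOT the HBase implementation. */
    static class ToyStore {
        // pids that still have a live (updated but not deleted) entry
        private final Set<Long> livePids = new HashSet<>();

        void update(long pid) { livePids.add(pid); }

        void delete(long... pids) {
            for (long pid : pids) {
                livePids.remove(pid);
            }
        }

        // Assumption: old WAL files are removable only when nothing is live.
        boolean canRemoveOldWals() { return livePids.isEmpty(); }
    }

    /** Replays the parent/child sequence; returns true if the WALs could be cleaned. */
    static boolean replay(boolean childUpdatesAfterParentDelete) {
        ToyStore store = new ToyStore();
        long trsp = 240L; // parent pid, as in the log below
        long orp = 241L;  // child pid, as in the log below

        store.update(orp);  // ORP finishes running
        store.update(trsp); // TRSP records its state
        if (!childUpdatesAfterParentDelete) {
            store.update(orp); // ORP releases its lock before the parent deletes it
        }
        store.delete(orp);  // TRSP ends and deletes its child
        store.update(trsp); // TRSP releases its lock
        if (childUpdatesAfterParentDelete) {
            store.update(orp); // late update: the child entry is live again
        }
        store.delete(trsp); // TRSP is deleted on completion
        return store.canRemoveOldWals();
    }

    public static void main(String[] args) {
        System.out.println("child updates before parent delete: " + replay(false));
        System.out.println("child updates after parent delete:  " + replay(true));
    }
}
```

In this model the second ordering leaves pid=241 live forever, which matches the symptom of WAL files that never get deleted.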
{code:java}
1. ORP bypassing and upd wal
EntryType=PROCEDURE_WAL_UPDATE
pid=241, ppid=240, state=RUNNABLE, bypass=true;
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
submittedTime=1691398235997, lastUpdate=1691398236007
stackIndexes=[2]
2. TRSP bypassing and upd wal by ORP
EntryType=PROCEDURE_WAL_UPDATE
pid=240, state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, bypass=true;
TransitRegionStateProcedure table=t4, region=57f33b87c08a532869d4eaf9b16e9162,
ASSIGN submittedTime=1691398235706, lastUpdate=1691398235997
stackIndexes=[0, 1]
3. TRSP bypassing and upd wal by self
EntryType=PROCEDURE_WAL_UPDATE
pid=240, state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, bypass=true;
TransitRegionStateProcedure table=t4, region=57f33b87c08a532869d4eaf9b16e9162,
ASSIGN submittedTime=1691398235706, lastUpdate=1691398235997
stackIndexes=[0, 1]
4. ORP running, get lock and upd wal
EntryType=PROCEDURE_WAL_UPDATE
pid=241, ppid=240, state=RUNNABLE, bypass=true;
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
submittedTime=1691398235997, lastUpdate=1691398236007
stackIndexes=[2]
6. TRSP bypassing, set status and upd wal
EntryType=PROCEDURE_WAL_UPDATE
pid=240, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, bypass=true;
TransitRegionStateProcedure table=t4, region=57f33b87c08a532869d4eaf9b16e9162,
ASSIGN submittedTime=1691398235706, lastUpdate=1691398251253
stackIndexes=[0, 1]
7. ORP running end, upd wal
EntryType=PROCEDURE_WAL_UPDATE
pid=241, ppid=240, state=SUCCESS, bypass=true;
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
submittedTime=1691398235997, lastUpdate=1691398251253
stackIndexes=[2, 3]
8. TRSP running end, delete wal for child
EntryType=PROCEDURE_WAL_DELETE
pid=240, state=SUCCESS, bypass=true; TransitRegionStateProcedure table=t4,
region=57f33b87c08a532869d4eaf9b16e9162, ASSIGN submittedTime=1691398235706,
lastUpdate=1691398251260
stackIndexes=[0, 1, 4]
9. TRSP release lock and upd wal
EntryType=PROCEDURE_WAL_UPDATE
pid=240, state=SUCCESS, bypass=true; TransitRegionStateProcedure table=t4,
region=57f33b87c08a532869d4eaf9b16e9162, ASSIGN submittedTime=1691398235706,
lastUpdate=1691398251260
stackIndexes=[0, 1, 4]
10. ORP release lock and upd wal
EntryType=PROCEDURE_WAL_UPDATE
pid=241, ppid=240, state=SUCCESS, bypass=true;
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
submittedTime=1691398235997, lastUpdate=1691398251253
stackIndexes=[2, 3]
11. TRSP delete wal in completed
EntryType=PROCEDURE_WAL_DELETE
EntryType=PROCEDURE_WAL_DELETE
{code}