[
https://issues.apache.org/jira/browse/HBASE-28013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769061#comment-17769061
]
chaijunjie edited comment on HBASE-28013 at 9/26/23 8:35 AM:
-------------------------------------------------------------
We faced the same problem with HBase 2.2.3; after we upgraded to HBase 2.4, it no
longer happened.
HBase 2.3+ changed the design and uses
https://issues.apache.org/jira/browse/HBASE-23326 instead; the
WALProcedureStore may have bugs when a bypass and an SCP execute at the same time...
You can use a newer HBase version.
was (Author: JIRAUSER286971):
We faced the same problem with HBase 2.2.3; after we upgraded to HBase 2.4, it no
longer happened.
HBase 2.3+ changed the design and uses
https://issues.apache.org/jira/browse/HBASE-23326 instead; the
WALProcedureStore may have bugs when a bypass and an SCP execute at the same time...
> procedureWAL can not delete after bypass procedure recursive
> -------------------------------------------------------------
>
> Key: HBASE-28013
> URL: https://issues.apache.org/jira/browse/HBASE-28013
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.2.7
> Reporter: Li Chao
> Priority: Major
>
> We found 1.2 TB of logs in /hbase1/MasterProcWALs, because WALProcedureStore
> cannot delete WAL files after a parent procedure is bypassed recursively.
> When a TRSP is bypassed recursively, the TRSP and its ORP/CRP child race: the
> TRSP calls store.delete for the ORP/CRP, then the ORP/CRP releases its lock
> and calls store.update, which overrides the delete.
>
> {code:java}
> 1. ORP bypassing and updating WAL
> EntryType=PROCEDURE_WAL_UPDATE
> pid=241, ppid=240, state=RUNNABLE, bypass=true;
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
> submittedTime=1691398235997, lastUpdate=1691398236007
> stackIndexes=[2]
> 2. TRSP bypassing, WAL update by ORP
> EntryType=PROCEDURE_WAL_UPDATE
> pid=240, state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, bypass=true;
> TransitRegionStateProcedure table=t4,
> region=57f33b87c08a532869d4eaf9b16e9162, ASSIGN submittedTime=1691398235706,
> lastUpdate=1691398235997
> stackIndexes=[0, 1]
> 3. TRSP bypassing, WAL update by itself
> EntryType=PROCEDURE_WAL_UPDATE
> pid=240, state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, bypass=true;
> TransitRegionStateProcedure table=t4,
> region=57f33b87c08a532869d4eaf9b16e9162, ASSIGN submittedTime=1691398235706,
> lastUpdate=1691398235997
> stackIndexes=[0, 1]
> 4. ORP running, acquires lock and updates WAL
> EntryType=PROCEDURE_WAL_UPDATE
> pid=241, ppid=240, state=RUNNABLE, bypass=true;
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
> submittedTime=1691398235997, lastUpdate=1691398236007
> stackIndexes=[2]
> 6. TRSP bypassing, sets state and updates WAL
> EntryType=PROCEDURE_WAL_UPDATE
> pid=240, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, bypass=true;
> TransitRegionStateProcedure table=t4,
> region=57f33b87c08a532869d4eaf9b16e9162, ASSIGN submittedTime=1691398235706,
> lastUpdate=1691398251253
> stackIndexes=[0, 1]
> 7. ORP finishes running, updates WAL
> EntryType=PROCEDURE_WAL_UPDATE
> pid=241, ppid=240, state=SUCCESS, bypass=true;
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
> submittedTime=1691398235997, lastUpdate=1691398251253
> stackIndexes=[2, 3]
> 8. TRSP finishes running, deletes WAL entry for child
> EntryType=PROCEDURE_WAL_DELETE
> pid=240, state=SUCCESS, bypass=true; TransitRegionStateProcedure table=t4,
> region=57f33b87c08a532869d4eaf9b16e9162, ASSIGN submittedTime=1691398235706,
> lastUpdate=1691398251260
> stackIndexes=[0, 1, 4]
> 9. TRSP releases lock and updates WAL
> EntryType=PROCEDURE_WAL_UPDATE
> pid=240, state=SUCCESS, bypass=true; TransitRegionStateProcedure table=t4,
> region=57f33b87c08a532869d4eaf9b16e9162, ASSIGN submittedTime=1691398235706,
> lastUpdate=1691398251260
> stackIndexes=[0, 1, 4]
> 10. ORP releases lock and updates WAL
> EntryType=PROCEDURE_WAL_UPDATE
> pid=241, ppid=240, state=SUCCESS, bypass=true;
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
> submittedTime=1691398235997, lastUpdate=1691398251253
> stackIndexes=[2, 3]
> 11. TRSP deletes WAL entry on completion
> EntryType=PROCEDURE_WAL_DELETE
> EntryType=PROCEDURE_WAL_DELETE {code}
>
>
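The race in the trace can be illustrated with a toy Java model. This is only a sketch: `ToyProcedureStore`, `replay` and the cleanup rule are hypothetical simplifications, not the HBase WALProcedureStore API. It assumes that an old log file can be removed only once no live procedure's latest record is in it, and that logs are cleaned oldest-first, so a single resurrected child entry (pid 241 written again after the parent deleted it) pins that log and every log rolled after it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class BypassRaceSketch {

  // Hypothetical toy store; NOT the real HBase WALProcedureStore API.
  static final class ToyProcedureStore {
    // pid -> id of the log file holding that procedure's latest record
    private final Map<Long, Integer> liveProcs = new HashMap<>();
    // Log files, oldest first
    private final Deque<Integer> logs = new ArrayDeque<>();
    private int currentLog = 0;

    ToyProcedureStore() {
      logs.addLast(currentLog);
    }

    void update(long pid) {
      liveProcs.put(pid, currentLog);
    }

    void delete(long pid) {
      liveProcs.remove(pid);
    }

    void rollLog() {
      currentLog++;
      logs.addLast(currentLog);
    }

    // Assumed cleanup rule: drop the oldest log only while no live procedure's
    // latest record is in it; stop at the first log that is still held.
    void removeInactiveLogs() {
      while (logs.size() > 1 && !liveProcs.containsValue(logs.peekFirst())) {
        logs.removeFirst();
      }
    }

    int logCount() {
      return logs.size();
    }

    boolean isLive(long pid) {
      return liveProcs.containsKey(pid);
    }
  }

  // Replays a simplified version of the trace; childUpdateLast reproduces the race.
  static ToyProcedureStore replay(boolean childUpdateLast) {
    final long trsp = 240L;
    final long orp = 241L;
    ToyProcedureStore store = new ToyProcedureStore();
    store.update(orp);    // steps 1, 4, 7: ORP records its state
    store.update(trsp);   // steps 2, 3, 6: TRSP records its state
    if (childUpdateLast) {
      store.delete(orp);  // step 8: TRSP deletes the child's entry
      store.update(trsp); // step 9: TRSP releases lock and updates
      store.update(orp);  // step 10: ORP writes again, resurrecting its entry
    } else {
      store.update(orp);  // hypothetical safe ordering: child's last write first
      store.delete(orp);
      store.update(trsp);
    }
    store.delete(trsp);   // step 11: TRSP deleted on completion
    for (int i = 0; i < 3; i++) { // later activity keeps rolling new logs
      store.rollLog();
      store.removeInactiveLogs();
    }
    return store;
  }

  public static void main(String[] args) {
    ToyProcedureStore racy = replay(true);
    ToyProcedureStore ordered = replay(false);
    System.out.println("racy:    orpLive=" + racy.isLive(241L) + " logs=" + racy.logCount());
    System.out.println("ordered: orpLive=" + ordered.isLive(241L) + " logs=" + ordered.logCount());
  }
}
```

Running `main` prints `orpLive=true logs=4` for the racy ordering and `orpLive=false logs=1` when the child's final update is written before the parent deletes it. In this model, one stale ORP entry is enough to keep every subsequent log file around, which is consistent with MasterProcWALs growing without bound.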