[ 
https://issues.apache.org/jira/browse/HBASE-28013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769061#comment-17769061
 ] 

chaijunjie edited comment on HBASE-28013 at 9/26/23 8:35 AM:
-------------------------------------------------------------

We hit the same problem on HBase 2.2.3. After we upgraded to HBase 2.4, it no 
longer happened.

HBase 2.3+ changed the design and uses the store from 
https://issues.apache.org/jira/browse/HBASE-23326 instead. The 
WALProcedureStore may have bugs when a bypass and an SCP (ServerCrashProcedure) 
execute at the same time.

You can use a newer HBase version.


> procedureWAL can not delete after bypass procedure recurisive
> -------------------------------------------------------------
>
>                 Key: HBASE-28013
>                 URL: https://issues.apache.org/jira/browse/HBASE-28013
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.2.7
>            Reporter: Li Chao
>            Priority: Major
>
> We found 1.2T of logs in /hbase1/MasterProcWALs because WALProcedureStore 
> cannot delete WAL files after a parent procedure is bypassed recursively. 
> When a TRSP (TransitRegionStateProcedure) is bypassed recursively, the TRSP 
> and its child ORP/CRP (OpenRegionProcedure/CloseRegionProcedure) race: the 
> TRSP calls store.delete for the ORP/CRP, then the ORP/CRP releases its lock 
> and calls store.update, which overrides the delete. 
>  
> {code:java}
> 1. ORP bypassing and upd wal
> EntryType=PROCEDURE_WAL_UPDATE
> pid=241, ppid=240, state=RUNNABLE, bypass=true; 
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure 
> submittedTime=1691398235997, lastUpdate=1691398236007
> stackIndexes=[2]
> 2. TRSP bypassing and upd wal by ORP
> EntryType=PROCEDURE_WAL_UPDATE
> pid=240, state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, bypass=true; 
> TransitRegionStateProcedure table=t4, 
> region=57f33b87c08a532869d4eaf9b16e9162, ASSIGN submittedTime=1691398235706, 
> lastUpdate=1691398235997
> stackIndexes=[0, 1]
> 3. TRSP bypassing and upd wal by self
> EntryType=PROCEDURE_WAL_UPDATE
> pid=240, state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, bypass=true; 
> TransitRegionStateProcedure table=t4, 
> region=57f33b87c08a532869d4eaf9b16e9162, ASSIGN submittedTime=1691398235706, 
> lastUpdate=1691398235997
> stackIndexes=[0, 1]
> 4. ORP running, get lock and upd wal 
> EntryType=PROCEDURE_WAL_UPDATE
> pid=241, ppid=240, state=RUNNABLE, bypass=true; 
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure 
> submittedTime=1691398235997, lastUpdate=1691398236007
> stackIndexes=[2]
> 6. TRSP bypassing, set status and upd wal
> EntryType=PROCEDURE_WAL_UPDATE
> pid=240, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, bypass=true; 
> TransitRegionStateProcedure table=t4, 
> region=57f33b87c08a532869d4eaf9b16e9162, ASSIGN submittedTime=1691398235706, 
> lastUpdate=1691398251253
> stackIndexes=[0, 1]
> 7. ORP running end, upd wal
> EntryType=PROCEDURE_WAL_UPDATE
> pid=241, ppid=240, state=SUCCESS, bypass=true; 
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure 
> submittedTime=1691398235997, lastUpdate=1691398251253
> stackIndexes=[2, 3]
> 8. TRSP running end, delete wal for child
> EntryType=PROCEDURE_WAL_DELETE
> pid=240, state=SUCCESS, bypass=true; TransitRegionStateProcedure table=t4, 
> region=57f33b87c08a532869d4eaf9b16e9162, ASSIGN submittedTime=1691398235706, 
> lastUpdate=1691398251260
> stackIndexes=[0, 1, 4]
> 9. TRSP release lock and upd wal
> EntryType=PROCEDURE_WAL_UPDATE
> pid=240, state=SUCCESS, bypass=true; TransitRegionStateProcedure table=t4, 
> region=57f33b87c08a532869d4eaf9b16e9162, ASSIGN submittedTime=1691398235706, 
> lastUpdate=1691398251260
> stackIndexes=[0, 1, 4]
> 10. ORP release lock and upd wal
> EntryType=PROCEDURE_WAL_UPDATE
> pid=241, ppid=240, state=SUCCESS, bypass=true; 
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure 
> submittedTime=1691398235997, lastUpdate=1691398251253
> stackIndexes=[2, 3]
> 11. TRSP delete wal in completed
> EntryType=PROCEDURE_WAL_DELETE
> EntryType=PROCEDURE_WAL_DELETE {code}
>  
>  
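
The ordering described in the report can be illustrated with a toy model. This
is only a sketch and is not HBase code: ToyStore, replayRacyOrder and
replayExpectedOrder are hypothetical names, and the model assumes that old WAL
files can be removed only when no procedure id is still tracked as live. It
also assumes the final delete in step 11 covers only the parent (pid=240),
which the log does not state.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the reported delete/update race. Not HBase code. */
class ProcWalRaceSketch {

    /** Hypothetical stand-in for a store that tracks which pids are still live. */
    static final class ToyStore {
        private final Set<Long> live = new HashSet<>();

        /** An update entry (re)marks the pid as live. */
        void update(long pid) { live.add(pid); }

        /** A delete entry marks the pid as gone. */
        void delete(long pid) { live.remove(pid); }

        /** Assumption: old WAL files are removable only if nothing is live. */
        boolean canRemoveOldWals() { return live.isEmpty(); }
    }

    /** Replays the racy order: the child updates after the parent deleted it. */
    static boolean replayRacyOrder() {
        ToyStore store = new ToyStore();
        long trsp = 240L;
        long orp = 241L;
        store.update(trsp);
        store.update(orp);
        store.delete(orp);   // step 8: parent deletes the child's state
        store.update(trsp);  // step 9: parent releases lock and updates
        store.update(orp);   // step 10: child releases lock and updates late
        store.delete(trsp);  // step 11: parent is deleted on completion
        return store.canRemoveOldWals();
    }

    /** Replays an order without the race: the child's last update precedes its delete. */
    static boolean replayExpectedOrder() {
        ToyStore store = new ToyStore();
        long trsp = 240L;
        long orp = 241L;
        store.update(trsp);
        store.update(orp);
        store.update(orp);   // child's final update happens first
        store.delete(orp);   // then the parent deletes the child
        store.delete(trsp);  // and finally itself
        return store.canRemoveOldWals();
    }

    public static void main(String[] args) {
        System.out.println("racy order, old WALs removable: " + replayRacyOrder());
        System.out.println("expected order, old WALs removable: " + replayExpectedOrder());
    }
}
```

In this model the racy order leaves pid=241 tracked as live after everything
has finished, so the store never considers the old files removable, which
matches the symptom of MasterProcWALs growing without bound.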



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
