[ 
https://issues.apache.org/jira/browse/HBASE-22168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-22168:
-------------------------------------
    Affects Version/s: 3.0.0

> proc WALs with non-corrupted-but-"corrupted" procedures block WAL archiving 
> forever
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-22168
>                 URL: https://issues.apache.org/jira/browse/HBASE-22168
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Sergey Shelukhin
>            Priority: Critical
>
> I previously reported a bug where we get messages like these when loading proc WALs:
> {noformat}
> 2019-04-04 14:43:00,424 ERROR [master/...:becomeActiveMaster] 
> wal.WALProcedureTree: Missing stack id 43459, max stack id is 43460, root 
> procedure is Procedure(pid=43645, ppid=-1, 
> class=org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure)
> {noformat}
> which results in
> {noformat}
> 2019-04-04 14:43:16,176 ERROR [...:17000:becomeActiveMaster] 
> procedure2.ProcedureExecutor: Corrupt pid=43645, 
> state=WAITING:SERVER_CRASH_FINISH, hasLock=false; ServerCrashProcedure 
> server=..., splitWal=true, meta=false
> {noformat}
> There is no actual corruption in the file, so it never gets moved to the
> corrupted files.
> However, as far as I can tell (I didn't spend much time looking at the
> code), the tracker does not account for this kind of procedure. As a
> result, we get hundreds of proc WALs that are stuck forever because of some
> ancient file containing these procedures, and that makes master startup
> take a long time.
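
The retention behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not HBase's actual store or tracker code: it assumes WALs are archived oldest first and that archiving stops at the first WAL that still holds the latest state of some procedure. The class name ProcWalRetentionSketch, the method archivableCount, and the per-WAL sets of live procedure ids are all hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy model only: this is not HBase's procedure store or its tracker. */
final class ProcWalRetentionSketch {

    /**
     * Counts how many WALs, oldest first, could be archived.
     * liveProcIdsPerWal.get(i) holds the ids of procedures whose most recent
     * state still lives only in WAL i (hypothetical bookkeeping).
     */
    static int archivableCount(List<Set<Long>> liveProcIdsPerWal) {
        int archivable = 0;
        for (Set<Long> liveProcIds : liveProcIdsPerWal) {
            if (!liveProcIds.isEmpty()) {
                // Assumption: the oldest WAL that is still referenced
                // pins every newer WAL behind it.
                break;
            }
            archivable++;
        }
        return archivable;
    }

    public static void main(String[] args) {
        List<Set<Long>> wals = new ArrayList<>();
        // Ancient WAL: its procedure was rejected at load time and is never
        // finished or deleted, so its id is never released.
        wals.add(Set.of(43645L));
        for (int i = 0; i < 300; i++) {
            // Newer WALs whose procedures have all completed.
            wals.add(Set.of());
        }
        System.out.println("archivable with stuck procedure: " + archivableCount(wals));

        // Pretend the stuck procedure has been accounted for.
        wals.set(0, Set.of());
        System.out.println("archivable once it is released: " + archivableCount(wals));
    }
}
```

In this model a single id left behind in the oldest file is enough to keep every newer file from being archived, which matches the symptom reported here; whether the real tracker behaves exactly this way would need to be confirmed against the code.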



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
