[ https://issues.apache.org/jira/browse/HBASE-22168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HBASE-22168:
-------------------------------------
    Affects Version/s: 3.0.0

> proc WALs with non-corrupted-but-"corrupted" procedures block WAL archiving forever
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-22168
>                 URL: https://issues.apache.org/jira/browse/HBASE-22168
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Sergey Shelukhin
>            Priority: Critical
>
> I've reported the bug before where we get these messages when loading proc WAL:
> {noformat}
> 2019-04-04 14:43:00,424 ERROR [master/...:becomeActiveMaster] wal.WALProcedureTree: Missing stack id 43459, max stack id is 43460, root procedure is Procedure(pid=43645, ppid=-1, class=org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure)
> {noformat}
> resulting in:
> {noformat}
> 2019-04-04 14:43:16,176 ERROR [...:17000:becomeActiveMaster] procedure2.ProcedureExecutor: Corrupt pid=43645, state=WAITING:SERVER_CRASH_FINISH, hasLock=false; ServerCrashProcedure server=..., splitWal=true, meta=false
> {noformat}
> There is no actual corruption in the file, so it never gets moved to
> corrupted files.
> However, there's no accounting for this kind of procedure in the tracker as
> far as I can tell (I didn't spend a lot of time looking at the code though).
> As a result, we get hundreds of proc WALs that are stuck forever because of
> some ancient file with these WALs, and that causes master startup to take a
> long time.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)