Sergey Shelukhin created HBASE-22168:
----------------------------------------

             Summary: proc WALs with non-corrupted-but-"corrupted" block WAL 
archiving forever
                 Key: HBASE-22168
                 URL: https://issues.apache.org/jira/browse/HBASE-22168
             Project: HBase
          Issue Type: Bug
            Reporter: Sergey Shelukhin


I previously reported a bug where we get messages like these when loading a
proc WAL:
{noformat}
2019-04-04 14:43:00,424 ERROR [master/...:becomeActiveMaster] 
wal.WALProcedureTree: Missing stack id 43459, max stack id is 43460, root 
procedure is Procedure(pid=43645, ppid=-1, 
class=org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure)
{noformat}
resulting in 
{noformat}
2019-04-04 14:43:16,176 ERROR [...:17000:becomeActiveMaster] 
procedure2.ProcedureExecutor: Corrupt pid=43645, 
state=WAITING:SERVER_CRASH_FINISH, hasLock=false; ServerCrashProcedure 
server=..., splitWal=true, meta=false
{noformat}
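For illustration, the first message reads like the result of a contiguity check on stack ids: every id from 0 up to the maximum is expected to be present. The following is a minimal sketch of such a check under that assumption. The class and method names are hypothetical and this is not the actual WALProcedureTree code.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.OptionalInt;
import java.util.Set;

// Hypothetical illustration only; not the HBase WALProcedureTree implementation.
final class StackIdCheckSketch {
  private StackIdCheckSketch() {}

  /**
   * Returns the smallest stack id in [0, max] that is absent from the given ids,
   * or an empty result if the ids are contiguous (or no ids were given).
   * Assumption: a procedure tree is only loadable when this returns empty.
   */
  static OptionalInt firstMissingStackId(Collection<Integer> stackIds) {
    if (stackIds.isEmpty()) {
      return OptionalInt.empty();
    }
    Set<Integer> seen = new HashSet<>(stackIds);
    int max = Collections.max(seen);
    for (int id = 0; id <= max; id++) {
      if (!seen.contains(id)) {
        return OptionalInt.of(id);
      }
    }
    return OptionalInt.empty();
  }
}
```

With ids 0 through 43458 plus 43460, a check like this would report 43459 as missing with a max of 43460, which matches the shape of the log line above.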
There is no actual corruption in the file, so it never gets moved to the
corrupted files.
However, as far as I can tell (I didn't spend a lot of time looking at the code
though), there's no accounting for this kind of procedure in the tracker. As a
result, we get hundreds of proc WALs that are stuck forever because some ancient
file still holds one of these procedures, and that makes master startup take a
long time.
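To make the symptom concrete, here is a small sketch of how one old file can pin every newer one. It assumes files are archived strictly oldest-first and only once no procedure's latest state lives in them; that assumption matches what I'm seeing but is not taken from the code. All names (ProcWalCleanupSketch, WalFile, archiveOldest) are hypothetical.

```java
import java.util.Deque;
import java.util.Set;

// Hypothetical model of proc WAL cleanup; not the actual HBase store or tracker code.
final class ProcWalCleanupSketch {
  private ProcWalCleanupSketch() {}

  /** A WAL file plus the pids whose most recent state still lives only in it. */
  static final class WalFile {
    final String name;
    final Set<Long> livePids;

    WalFile(String name, Set<Long> livePids) {
      this.name = name;
      this.livePids = livePids;
    }
  }

  /**
   * Removes files from the oldest end while they hold no live pid and stops at
   * the first file that still does. Returns the number of files removed.
   */
  static int archiveOldest(Deque<WalFile> walsOldestFirst) {
    int removed = 0;
    while (!walsOldestFirst.isEmpty()
        && walsOldestFirst.peekFirst().livePids.isEmpty()) {
      walsOldestFirst.pollFirst();
      removed++;
    }
    return removed;
  }
}
```

In this model, a procedure that is flagged corrupt at load time never runs, so it is never updated or deleted and its pid stays live in the oldest file. archiveOldest then returns 0 on every pass, no matter how many clean files have piled up behind it.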



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
