Corrupted HBase MasterProcWAL

Matthew Blissett Thu, 21 Mar 2019 06:11:14 -0700

Hi all,

We have an HBase cluster running version 1.2.0-cdh5.12.0.


While investigating a slow startup time, I found that HBase spent 12 
minutes working through 14,146 logs in /hbase/MasterProcWALs/.

2019-03-20 15:04:07,687 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recover lease on dfs file hdfs://ha-nn/hbase/MasterProcWALs/state-00000000000000010640.log
2019-03-20 15:04:07,695 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovered lease, attempt=0 on file=hdfs://ha-nn/hbase/MasterProcWALs/state-00000000000000010640.log after 8ms
2019-03-20 15:04:07,736 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recover lease on dfs file hdfs://ha-nn/hbase/MasterProcWALs/state-00000000000000010642.log
2019-03-20 15:04:07,737 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovered lease, attempt=0 on file=hdfs://ha-nn/hbase/MasterProcWALs/state-00000000000000010642.log after 1ms
...
2019-03-20 15:16:13,961 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recover lease on dfs file hdfs://ha-nn/hbase/MasterProcWALs/state-00000000000000024782.log
2019-03-20 15:16:13,962 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovered lease, attempt=0 on file=hdfs://ha-nn/hbase/MasterProcWALs/state-00000000000000024782.log after 1ms
2019-03-20 15:16:14,221 INFO org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Lease acquired for flushLogId: 24783
2019-03-20 15:16:16,327 ERROR org.apache.hadoop.hbase.procedure2.ProcedureExecutor: corrupted procedure: Procedure=org.apache.hadoop.hbase.master.procedure.DeleteTableProcedure (id=45, owner=trobertson, state=RUNNABLE, startTime=14161hrs, 32mins, 47sec ago, lastUpdate=14161hrs, 32mins, 47sec ago)
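
For anyone wanting to check the same thing on their own master log, here is a rough sketch (in Python, not part of HBase) of how the time spent on lease recovery could be measured. It assumes the log layout shown above, where each entry starts with a "YYYY-MM-DD HH:MM:SS,mmm" timestamp and the FSHDFSUtils class name sits on the same line; the function name recovery_span is made up for illustration.

```python
import re
from datetime import datetime

# Timestamp at the start of a log entry, e.g. "2019-03-20 15:04:07,687".
# Assumption: entries look like the excerpt above.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def recovery_span(log_lines):
    """Return the time between the first and last FSHDFSUtils log entries.

    Returns None if no such entries are found.
    """
    stamps = []
    for line in log_lines:
        match = TIMESTAMP.match(line)
        if match and "FSHDFSUtils" in line:
            text = f"{match.group(1)}.{match.group(2)}"
            stamps.append(datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f"))
    if not stamps:
        return None
    return max(stamps) - min(stamps)


if __name__ == "__main__":
    sample = [
        "2019-03-20 15:04:07,687 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: ",
        "2019-03-20 15:16:13,962 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: ",
    ]
    print(recovery_span(sample))
```

Run against the first and last FSHDFSUtils timestamps quoted above, this gives a span of a little over 12 minutes.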

I first ran through a copy of the logs with the ProcedureWALFormatReader 
(just in an IDE); this gives me matching log output, and tells me the 
procedure was an update to a long-since-deleted table.

Now I notice that the first two logs are numbered 10640 and 10642, so I 
appear to be missing 10641. If I remove 10640 from my test directory, 
the ProcedureWALFormatReader runs through all the remaining logs without 
reporting a problem. Is it safe to remove 
hdfs://ha-nn/hbase/MasterProcWALs/state-00000000000000010640.log and 
then restart HBase, which should (presumably) process and clean up the 
backlog of WALs?
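
In case it helps anyone checking for similar gaps, below is a minimal sketch (Python, not an HBase tool) that lists the sequence numbers missing from a set of WAL file names. It assumes the names follow the state-<number>.log pattern seen in the log output above; the helper names wal_ids and missing_ids are invented for this example. A gap only shows that a file is absent from the listing; whether that matters to the procedure store is a separate question.

```python
import re

# File-name pattern as it appears in the log output above,
# e.g. "state-00000000000000010640.log".
WAL_NAME = re.compile(r"state-(\d+)\.log$")


def wal_ids(paths):
    """Return the sorted numeric IDs parsed from WAL file names or paths."""
    ids = []
    for path in paths:
        match = WAL_NAME.search(path.strip())
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


def missing_ids(ids):
    """Return every ID absent between the smallest and largest ID seen."""
    if not ids:
        return []
    present = set(ids)
    return [n for n in range(min(ids), max(ids) + 1) if n not in present]


if __name__ == "__main__":
    listing = [
        "hdfs://ha-nn/hbase/MasterProcWALs/state-00000000000000010640.log",
        "hdfs://ha-nn/hbase/MasterProcWALs/state-00000000000000010642.log",
    ]
    print(missing_ids(wal_ids(listing)))  # prints [10641]
```

The input could be the paths from a directory listing of /hbase/MasterProcWALs/; paths that do not match the pattern are skipped.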

Thanks for any advice,

Matt Blissett
