Hi all,

We have an HBase cluster running version 1.2.0-cdh5.12.0.
While investigating a slow startup time, I found that HBase spent 12 minutes working through 14,146 logs in /hbase/MasterProcWALs/:

    2019-03-20 15:04:07,687 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recover lease on dfs file hdfs://ha-nn/hbase/MasterProcWALs/state-00000000000000010640.log
    2019-03-20 15:04:07,695 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovered lease, attempt=0 on file=hdfs://ha-nn/hbase/MasterProcWALs/state-00000000000000010640.log after 8ms
    2019-03-20 15:04:07,736 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recover lease on dfs file hdfs://ha-nn/hbase/MasterProcWALs/state-00000000000000010642.log
    2019-03-20 15:04:07,737 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovered lease, attempt=0 on file=hdfs://ha-nn/hbase/MasterProcWALs/state-00000000000000010642.log after 1ms
    ...
    2019-03-20 15:16:13,961 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recover lease on dfs file hdfs://ha-nn/hbase/MasterProcWALs/state-00000000000000024782.log
    2019-03-20 15:16:13,962 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovered lease, attempt=0 on file=hdfs://ha-nn/hbase/MasterProcWALs/state-00000000000000024782.log after 1ms
    2019-03-20 15:16:14,221 INFO org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Lease acquired for flushLogId: 24783
    2019-03-20 15:16:16,327 ERROR org.apache.hadoop.hbase.procedure2.ProcedureExecutor: corrupted procedure: Procedure=org.apache.hadoop.hbase.master.procedure.DeleteTableProcedure (id=45, owner=trobertson, state=RUNNABLE, startTime=14161hrs, 32mins, 47sec ago, lastUpdate=14161hrs, 32mins, 47sec ago)

I first ran a copy of the logs through ProcedureWALFormatReader (just in an IDE); this produces matching log output and tells me the procedure was an update to a long-since-deleted table. I then noticed that the first two logs are numbers 10640 and 10642, so I appear to be missing 10641.
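Spotting a missing log ID by eye is easy for the first two files but not across 14,000 of them. A small check along these lines could list every gap at once. It is only a sketch: it assumes the procedure WAL files are named `state-<zero-padded id>.log`, as in the log excerpt, and `missing_wal_ids` is a hypothetical helper, not part of HBase. The input could be the paths printed by `hdfs dfs -ls /hbase/MasterProcWALs/`; lines that don't end in such a name are ignored.

```python
import re
from typing import Iterable, List

# Assumption: procedure WAL files are named state-<zero-padded id>.log,
# as seen in the log excerpt. Bare names and full HDFS paths both match.
WAL_NAME = re.compile(r"(?:^|/)state-(\d+)\.log$")


def missing_wal_ids(names: Iterable[str]) -> List[int]:
    """Return ids absent between the lowest and highest state-*.log id."""
    ids = set()
    for name in names:
        match = WAL_NAME.search(name.strip())
        if match:  # skip anything that is not a procedure WAL name
            ids.add(int(match.group(1)))
    if not ids:
        return []
    # Every id in the contiguous range that has no matching file.
    return [i for i in range(min(ids), max(ids) + 1) if i not in ids]


if __name__ == "__main__":
    sample = [
        "hdfs://ha-nn/hbase/MasterProcWALs/state-00000000000000010640.log",
        "hdfs://ha-nn/hbase/MasterProcWALs/state-00000000000000010642.log",
        "hdfs://ha-nn/hbase/MasterProcWALs/state-00000000000000010643.log",
    ]
    print(missing_wal_ids(sample))  # prints [10641]
```

The check only reports gaps in the numbering; it says nothing about whether a given gap is harmful.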
If I remove 10640 from my test directory, ProcedureWALFormatReader runs through all the remaining logs without reporting a problem.

Is it safe to remove hdfs://ha-nn/hbase/MasterProcWALs/state-00000000000000010640.log and then restart HBase, presumably seeing the backlog of WALs processed and cleaned up?

Thanks for any advice,
Matt Blissett