[ https://issues.apache.org/jira/browse/HBASE-25053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack resolved HBASE-25053.
-----------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

Merged to branch-2 and master. Thanks for patch [~niuyulin]. Thanks for reviews [~zhangduo] and [~vjasani]

> WAL replay should ignore 0-length files
> ---------------------------------------
>
>                 Key: HBASE-25053
>                 URL: https://issues.apache.org/jira/browse/HBASE-25053
>             Project: HBase
>          Issue Type: Bug
>          Components: master, regionserver
>    Affects Versions: 2.3.1
>            Reporter: Nick Dimiduk
>            Assignee: niuyulin
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> I overdrove a small testing cluster, filling HDFS. After cleaning up data to
> bring HBase back up, I noticed all masters -refused to start- abort. Logs
> complain of seeking past EOF. Indeed the last wal file name logged is a
> 0-length file. WAL replay should gracefully skip and clean up such an empty
> file.
> {noformat}
> 2020-09-16 19:51:30,297 ERROR org.apache.hadoop.hbase.master.HMaster: Failed to become active master
> java.io.EOFException: Cannot seek after EOF
>     at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1448)
>     at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:66)
>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initInternal(ProtobufLogReader.java:211)
>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initReader(ProtobufLogReader.java:173)
>     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:64)
>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:168)
>     at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:323)
>     at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:305)
>     at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:293)
>     at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:429)
>     at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4859)
>     at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:4765)
>     at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1014)
>     at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:956)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7496)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7454)
>     at org.apache.hadoop.hbase.master.region.MasterRegion.open(MasterRegion.java:269)
>     at org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:309)
>     at org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:104)
>     at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:949)
>     at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2240)
>     at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:622)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)