[ https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707657#comment-16707657 ]
Sean Busbey commented on HBASE-21544:
-------------------------------------

What does the contract for FileSystem.close say about data persistence? I thought recovered edits now go to the same FileSystem as the WAL? Wouldn't that imply that hflush should be present?

> WAL writer for recovered.edits file in WalSplitting should not require hflush from filesystem
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-21544
>                 URL: https://issues.apache.org/jira/browse/HBASE-21544
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Been talking through this with a bunch of folks. [~enis] brought me back from the cliff of despair, though.
> Context: running HBase on top of a filesystem that doesn't have hflush for hfiles. In our case, on top of Azure's Hadoop-compatible filesystems (WASB, ABFS).
> When a RS fails and we have an SCP running for it, you'll see log splitting get into an "infinite" loop where the master keeps resubmitting and the RS which takes the action deterministically fails with the following:
> {noformat}
> 2018-11-26 20:59:18,415 ERROR [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.FSHLogProvider: The RegionServer write ahead log provider for FileSystem implementations relies on the ability to call hflush for proper operation during component failures, but the current FileSystem does not support doing so. Please check the config value of 'hbase.wal.dir' and ensure it points to a FileSystem mount that has suitable capabilities for output streams.
> 2018-11-26 20:59:18,415 WARN [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.AbstractProtobufLogWriter: WALTrailer is null. Continuing with default.
> 2018-11-26 20:59:18,467 ERROR [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.WALSplitter: Got while writing log entry to log
> java.io.IOException: cannot get log writer
> 	at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:96)
> 	at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:61)
> 	at org.apache.hadoop.hbase.wal.WALFactory.createRecoveredEditsWriter(WALFactory.java:370)
> 	at org.apache.hadoop.hbase.wal.WALSplitter.createWriter(WALSplitter.java:804)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.createWAP(WALSplitter.java:1530)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.getWriterAndPath(WALSplitter.java:1501)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1584)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1566)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1090)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1082)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1052)
> Caused by: org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: hflush
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.initOutput(ProtobufLogWriter.java:99)
> 	at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:165)
> 	at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:77)
> ... 10 more{noformat}
> This is the sanity check added by HBASE-18784, failing on creating the writer for the recovered.edits file.
> The odd-ball here is that our recovered.edits writer is just a WAL writer class.
> The WAL writer class thinks it always should have hflush support; however, we don't _actually_ need that for writing out the recovered.edits files. If {{close()}} on the recovered.edits file were to fail, we trash any intermediate data in the filesystem and rerun the whole process.
> It's my understanding that this check is overbearing, and we should not make the check when the ProtobufLogWriter is being used for the recovered.edits file.
> [~zyork], [~busbey] fyi

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
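The proposal above amounts to skipping the hflush capability check when the writer targets a recovered.edits file rather than a live WAL. A minimal sketch of that idea follows; note this is not the actual HBase patch: the `StreamCapabilities` interface here is a simplified stand-in for Hadoop's interface of the same name, and `checkCapability` is a hypothetical helper standing in for the check HBASE-18784 added in `ProtobufLogWriter.initOutput`:

```java
import java.io.IOException;

// Simplified stand-in for org.apache.hadoop.fs.StreamCapabilities.
interface StreamCapabilities {
    boolean hasCapability(String capability);
}

public class RecoveredEditsWriterCheck {
    // Hypothetical helper illustrating the proposed behavior: a durable WAL
    // must support hflush so edits survive component failures, but a
    // recovered.edits file does not need it -- if close() fails, the
    // intermediate data is trashed and log splitting is simply rerun.
    static void checkCapability(StreamCapabilities out, boolean forRecoveredEdits)
            throws IOException {
        if (!forRecoveredEdits && !out.hasCapability("hflush")) {
            throw new IOException("stream lacks hflush");
        }
    }

    public static void main(String[] args) throws IOException {
        // A stream that, like WASB/ABFS output streams here, lacks hflush.
        StreamCapabilities noHflush = cap -> false;

        // recovered.edits writer: check is skipped, no exception.
        checkCapability(noHflush, true);

        // live WAL writer: check still enforced.
        try {
            checkCapability(noHflush, false);
        } catch (IOException e) {
            System.out.println("WAL check failed as expected");
        }
    }
}
```

The design point is that durability requirements differ by consumer of the same writer class, so the capability check should be parameterized by the caller rather than applied unconditionally.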