[
https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708012#comment-16708012
]
Duo Zhang commented on HBASE-21544:
-----------------------------------
Agree that for recovered edits we do not need the FileSystem to support hflush.
Even though for now HBASE-20734 looks like the 'better' solution since S3 is a
bit slow, in the long term I do think we should remove the hflush check when
writing recovered edits.
> WAL writer for recovered.edits file in WalSplitting should not require hflush
> from filesystem
> ---------------------------------------------------------------------------------------------
>
> Key: HBASE-21544
> URL: https://issues.apache.org/jira/browse/HBASE-21544
> Project: HBase
> Issue Type: Bug
> Components: wal
> Reporter: Josh Elser
> Assignee: Josh Elser
> Priority: Major
> Fix For: 2.0.4
>
> Attachments: HBASE-20734.001.branch-2.0.patch
>
>
> Been talking through this with a bunch of folks. [~enis] brought me back from
> the cliff of despair though.
> Context: running HBase on top of a filesystem that doesn't have hflush for
> hfiles. In our case, on top of Azure's Hadoop-compatible filesystems (WASB,
> ABFS).
> When a RS fails and we have an SCP running for it, you'll see log splitting
> get into an "infinite" loop where the master keeps resubmitting and the RS
> which takes the action deterministically fails with the following:
> {noformat}
> 2018-11-26 20:59:18,415 ERROR
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2]
> wal.FSHLogProvider: The RegionServer write ahead log provider for FileSystem
> implementations relies on the ability to call hflush for proper operation
> during component failures, but the current FileSystem does not support doing
> so. Please check the config value of 'hbase.wal.dir' and ensure it points to
> a FileSystem mount that has suitable capabilities for output streams.
> 2018-11-26 20:59:18,415 WARN
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2]
> wal.AbstractProtobufLogWriter: WALTrailer is null. Continuing with default.
> 2018-11-26 20:59:18,467 ERROR
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.WALSplitter:
> Got while writing log entry to log
> java.io.IOException: cannot get log writer
>   at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:96)
>   at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:61)
>   at org.apache.hadoop.hbase.wal.WALFactory.createRecoveredEditsWriter(WALFactory.java:370)
>   at org.apache.hadoop.hbase.wal.WALSplitter.createWriter(WALSplitter.java:804)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.createWAP(WALSplitter.java:1530)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.getWriterAndPath(WALSplitter.java:1501)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1584)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1566)
>   at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1090)
>   at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1082)
>   at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1052)
> Caused by: org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: hflush
>   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.initOutput(ProtobufLogWriter.java:99)
>   at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:165)
>   at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:77)
>   ... 10 more{noformat}
> This is the sanity check added by HBASE-18784, failing on creating the writer
> for the recovered.edits file.
> The odd-ball here is that our recovered.edits writer is just a WAL writer
> class. The WAL writer class assumes it always needs hflush support;
> however, we don't _actually_ need that for writing out the recovered.edits
> files. If {{close()}} on the recovered.edits file fails, we trash any
> intermediate data in the filesystem and rerun the whole process.
> It's my understanding that this check is overbearing and we should not make
> it when the ProtobufLogWriter is being used for a recovered.edits
> file.
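> To make the proposed distinction concrete, here is a minimal, self-contained
> Java sketch (not actual HBase code; the interface and class names below are
> hypothetical stand-ins for Hadoop's StreamCapabilities and a WASB/ABFS-style
> stream) showing a writer factory that enforces the hflush capability check
> only for the real WAL path and skips it for recovered.edits:

```java
import java.io.IOException;

public class CapabilityCheckSketch {

  // Minimal stand-in for org.apache.hadoop.fs.StreamCapabilities.
  interface StreamCapabilities {
    boolean hasCapability(String capability);
  }

  // Models a WASB/ABFS-style output stream that cannot hflush.
  static class NonFlushableStream implements StreamCapabilities {
    @Override
    public boolean hasCapability(String capability) {
      return false; // no hflush/hsync support
    }
  }

  // Only the real WAL needs durable hflush; recovered.edits can be
  // rewritten from scratch if close() fails, so the check is skipped.
  static void createWriter(StreamCapabilities out, boolean forRecoveredEdits)
      throws IOException {
    if (!forRecoveredEdits && !out.hasCapability("hflush")) {
      throw new IOException("cannot get log writer: stream lacks hflush");
    }
    // ... proceed to initialize the writer on this stream
  }

  public static void main(String[] args) throws IOException {
    StreamCapabilities s = new NonFlushableStream();

    // Recovered-edits path: succeeds even though hflush is unsupported.
    createWriter(s, true);

    // WAL path: the capability check still fails, as it should.
    boolean walFailed = false;
    try {
      createWriter(s, false);
    } catch (IOException e) {
      walFailed = true;
    }
    System.out.println(walFailed); // prints "true"
  }
}
```

> This mirrors the split the issue asks for: keep HBASE-18784's sanity check
> for durable WAL writes, but relax it for the rerunnable recovered.edits path.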
> [~zyork], [~busbey] fyi
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)