[
https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707692#comment-16707692
]
Josh Elser commented on HBASE-21544:
------------------------------------
{quote}Edit: I see you're talking about WASB. Does that support hflush?
{quote}
Yeah, all of their stuff does (if you have it configured a certain way, at
least). This was observed when recovered.edits were going to a part of the
FileSystem which didn't support hflush.
{quote}I think HBASE-20734 should fix this case since HDFS will have hflush
capability.
I thought recovered edits now go to the same FileSystem as the WAL? wouldn't
that imply that hflush should be present?
{quote}
Ah, this didn't land on 2.0.x. Yes, that would have precluded the need for such
a change.
Semantics-wise, it would still be good to make sure that we aren't over-requiring
from our filesystem, but you are correct that this is less of a concern in
newer versions, since the durability the WAL requires of the FS is stronger than
what recovered.edits needs :)
{quote}what does the contract for FileSystem.close say about data persistence?
{quote}
FSDataOutputStream (assuming that's what you meant by {{FileSystem.close()}})
doesn't promise anything about persistence in its Javadoc, but the implementation
is such that {{close()}} makes the same durability guarantees as {{hflush()}}.
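For anyone following along, here's a minimal sketch of how to probe a mount for
that capability. This is not HBase code; it assumes Hadoop 2.9+/3.x, where
{{FSDataOutputStream}} implements {{StreamCapabilities}}, and the path is made up:
{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HflushProbe {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Hypothetical location; in practice this is whatever 'hbase.wal.dir'
    // (or the recovered.edits destination) resolves to.
    Path dir = new Path("abfs://container@account.dfs.core.windows.net/hbase-wal");
    FileSystem fs = dir.getFileSystem(conf);

    try (FSDataOutputStream out = fs.create(new Path(dir, "capability-probe"))) {
      // FSDataOutputStream delegates hasCapability() to the wrapped stream,
      // so this reflects what the underlying FileSystem actually supports.
      boolean canHflush = out.hasCapability("hflush");
      System.out.println("hflush supported: " + canHflush);

      out.writeBytes("probe");
      if (canHflush) {
        out.hflush(); // data is durable/visible to readers at this point
      }
      // close() (via try-with-resources) persists the data either way; for
      // recovered.edits, that close()-time guarantee is all we rely on.
    }
  }
}
{code}
The HBASE-18784 sanity check is effectively asking this same question before it
will hand out a WAL writer.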
> WAL writer for recovered.edits file in WalSplitting should not require hflush from filesystem
> ---------------------------------------------------------------------------------------------
>
> Key: HBASE-21544
> URL: https://issues.apache.org/jira/browse/HBASE-21544
> Project: HBase
> Issue Type: Bug
> Components: wal
> Reporter: Josh Elser
> Assignee: Josh Elser
> Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
>
> Been talking through this with a bunch of folks. [~enis] brought me back from
> the cliff of despair though.
> Context: running HBase on top of a filesystem that doesn't have hflush for
> hfiles. In our case, on top of Azure's Hadoop-compatible filesystems (WASB,
> ABFS).
> When a RS fails and we have an SCP running for it, you'll see log splitting
> get into an "infinite" loop where the master keeps resubmitting and the RS
> which takes the action deterministically fails with the following:
> {noformat}
> 2018-11-26 20:59:18,415 ERROR [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.FSHLogProvider: The RegionServer write ahead log provider for FileSystem implementations relies on the ability to call hflush for proper operation during component failures, but the current FileSystem does not support doing so. Please check the config value of 'hbase.wal.dir' and ensure it points to a FileSystem mount that has suitable capabilities for output streams.
> 2018-11-26 20:59:18,415 WARN  [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.AbstractProtobufLogWriter: WALTrailer is null. Continuing with default.
> 2018-11-26 20:59:18,467 ERROR [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.WALSplitter: Got while writing log entry to log
> java.io.IOException: cannot get log writer
>   at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:96)
>   at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:61)
>   at org.apache.hadoop.hbase.wal.WALFactory.createRecoveredEditsWriter(WALFactory.java:370)
>   at org.apache.hadoop.hbase.wal.WALSplitter.createWriter(WALSplitter.java:804)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.createWAP(WALSplitter.java:1530)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.getWriterAndPath(WALSplitter.java:1501)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1584)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1566)
>   at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1090)
>   at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1082)
>   at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1052)
> Caused by: org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: hflush
>   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.initOutput(ProtobufLogWriter.java:99)
>   at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:165)
>   at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:77)
>   ... 10 more{noformat}
> This is the sanity check added by HBASE-18784, failing on creating the writer
> for the recovered.edits file.
> The odd-ball here is that our recovered.edits writer is just a WAL writer
> class. The WAL writer class thinks it should always have hflush support;
> however, we don't _actually_ need that for writing out the recovered.edits
> files. If {{close()}} on the recovered.edits file were to fail, we'd trash any
> intermediate data in the filesystem and rerun the whole process.
> It's my understanding that this check is overbearing and we should not make
> the check when the ProtobufLogWriter is being used for the recovered.edits
> file (a rough sketch of the idea follows below).
> [~zyork], [~busbey] fyi
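A rough sketch of the idea, for illustration only: this is not the committed
patch, the {{requireHflush}} flag and the helper below are hypothetical, and it
leans on the same {{hasCapability("hflush")}} probe shown earlier (Hadoop
2.9+/3.x):
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RecoveredEditsWriterSketch {

  /** Mirrors CommonFSUtils.StreamLacksCapabilityException for this sketch. */
  public static class StreamLacksCapabilityException extends Exception {
    public StreamLacksCapabilityException(String capability) {
      super(capability);
    }
  }

  /**
   * Create the output stream for a writer, enforcing the hflush capability
   * check only when the caller actually needs it (a real WAL, not a
   * recovered.edits file).
   */
  static FSDataOutputStream createOutput(FileSystem fs, Path path, boolean requireHflush)
      throws IOException, StreamLacksCapabilityException {
    FSDataOutputStream out = fs.create(path, true);
    if (requireHflush && !out.hasCapability("hflush")) {
      // A WAL must be able to hflush so edits survive a RS crash.
      out.close();
      throw new StreamLacksCapabilityException("hflush");
    }
    // recovered.edits only need close()-time durability: if close() fails,
    // the split task is resubmitted and any partial file is thrown away.
    return out;
  }
}
{code}
The WAL path would keep passing {{true}} here, while the recovered.edits path in
WALSplitter would pass {{false}} and rely on {{close()}} for durability.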
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)