[
https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707692#comment-16707692
]
Josh Elser commented on HBASE-21544:
------------------------------------
{quote}Edit: I see you're talking about WASB. Does that support hflush?
{quote}
Yeah, all of their stuff does (if you have it configured a certain way, at
least). This was observed when recovered.edits were going to a part of the
FileSystem which didn't support hflush.
{quote}I think HBASE-20734 should fix this case since HDFS will have hflush
capability.
I thought recovered edits now go to the same FileSystem as the WAL? wouldn't
that imply that hflush should be present?
{quote}
Ah, this didn't land on 2.0.x. Yes, that would have precluded the need for such
a change.
Semantics-wise, it would still be good to make sure that we aren't over-requiring
from our filesystem, but you are correct that this is less of a concern in
newer versions, since the durability the WAL requires of the FS is stronger than
what recovered.edits needs :)
{quote}what does the contract for FileSystem.close say about data persistence?
{quote}
FSDataOutputStream (assuming that's what you meant by {{FileSystem.close()}})
doesn't promise anything about persistence in its Javadoc, but the implementation
is such that {{close()}} makes the same durability guarantees as {{hflush()}}.
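For anyone following along, here's a minimal sketch of how to probe a mount for
that capability. This is not HBase code; it assumes Hadoop 2.9+/3.x, where
{{FSDataOutputStream}} implements {{StreamCapabilities}}, and the path is made up:
{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HflushProbe {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Hypothetical location; in practice this is whatever 'hbase.wal.dir'
    // (or the recovered.edits destination) resolves to.
    Path dir = new Path("abfs://container@account.dfs.core.windows.net/hbase-wal");
    FileSystem fs = dir.getFileSystem(conf);

    try (FSDataOutputStream out = fs.create(new Path(dir, "capability-probe"))) {
      // FSDataOutputStream delegates hasCapability() to the wrapped stream,
      // so this reflects what the underlying FileSystem actually supports.
      boolean canHflush = out.hasCapability("hflush");
      System.out.println("hflush supported: " + canHflush);

      out.writeBytes("probe");
      if (canHflush) {
        out.hflush(); // data is durable/visible to readers at this point
      }
      // close() (via try-with-resources) persists the data either way; for
      // recovered.edits, that close()-time guarantee is all we rely on.
    }
  }
}
{code}
The HBASE-18784 sanity check is effectively asking this same question before it
will hand out a WAL writer.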
> WAL writer for recovered.edits file in WalSplitting should not require hflush from filesystem
> ---------------------------------------------------------------------------------------------
>
> Key: HBASE-21544
> URL: https://issues.apache.org/jira/browse/HBASE-21544
> Project: HBase
> Issue Type: Bug
> Components: wal
> Reporter: Josh Elser
> Assignee: Josh Elser
> Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
>
> Been talking through this with a bunch of folks. [~enis] brought me back from
> the cliff of despair though.
> Context: running HBase on top of a filesystem that doesn't have hflush for
> hfiles. In our case, on top of Azure's Hadoop-compatible filesystems (WASB,
> ABFS).
> When a RS fails and we have an SCP running for it, you'll see log splitting
> get into an "infinite" loop where the master keeps resubmitting and the RS
> which takes the action deterministically fails with the following:
> {noformat}
> 2018-11-26 20:59:18,415 ERROR [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.FSHLogProvider: The RegionServer write ahead log provider for FileSystem implementations relies on the ability to call hflush for proper operation during component failures, but the current FileSystem does not support doing so. Please check the config value of 'hbase.wal.dir' and ensure it points to a FileSystem mount that has suitable capabilities for output streams.
> 2018-11-26 20:59:18,415 WARN  [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.AbstractProtobufLogWriter: WALTrailer is null. Continuing with default.
> 2018-11-26 20:59:18,467 ERROR [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.WALSplitter: Got while writing log entry to log
> java.io.IOException: cannot get log writer
>   at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:96)
>   at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:61)
>   at org.apache.hadoop.hbase.wal.WALFactory.createRecoveredEditsWriter(WALFactory.java:370)
>   at org.apache.hadoop.hbase.wal.WALSplitter.createWriter(WALSplitter.java:804)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.createWAP(WALSplitter.java:1530)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.getWriterAndPath(WALSplitter.java:1501)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1584)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1566)
>   at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1090)
>   at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1082)
>   at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1052)
> Caused by: org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: hflush
>   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.initOutput(ProtobufLogWriter.java:99)
>   at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:165)
>   at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:77)
>   ... 10 more{noformat}
> This is the sanity check added by HBASE-18784, failing on creating the writer
> for the recovered.edits file.
> The odd-ball here is that our recovered.edits writer is just a WAL writer
> class. The WAL writer class thinks it should always have hflush support;
> however, we don't _actually_ need that for writing out the recovered.edits
> files. If {{close()}} on the recovered.edits file were to fail, we'd trash any
> intermediate data in the filesystem and rerun the whole process.
> It's my understanding that this check is overbearing and we should not make
> the check when the ProtobufLogWriter is being used for the recovered.edits
> file (a rough sketch of the idea follows below).
> [~zyork], [~busbey] fyi
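A rough sketch of the idea, for illustration only: this is not the committed
patch, the {{requireHflush}} flag and the helper below are hypothetical, and it
leans on the same {{hasCapability("hflush")}} probe shown earlier (Hadoop
2.9+/3.x):
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RecoveredEditsWriterSketch {

  /** Mirrors CommonFSUtils.StreamLacksCapabilityException for this sketch. */
  public static class StreamLacksCapabilityException extends Exception {
    public StreamLacksCapabilityException(String capability) {
      super(capability);
    }
  }

  /**
   * Create the output stream for a writer, enforcing the hflush capability
   * check only when the caller actually needs it (a real WAL, not a
   * recovered.edits file).
   */
  static FSDataOutputStream createOutput(FileSystem fs, Path path, boolean requireHflush)
      throws IOException, StreamLacksCapabilityException {
    FSDataOutputStream out = fs.create(path, true);
    if (requireHflush && !out.hasCapability("hflush")) {
      // A WAL must be able to hflush so edits survive a RS crash.
      out.close();
      throw new StreamLacksCapabilityException("hflush");
    }
    // recovered.edits only need close()-time durability: if close() fails,
    // the split task is resubmitted and any partial file is thrown away.
    return out;
  }
}
{code}
The WAL path would keep passing {{true}} here, while the recovered.edits path in
WALSplitter would pass {{false}} and rely on {{close()}} for durability.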
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)