[
https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708012#comment-16708012
]
Duo Zhang commented on HBASE-21544:
-----------------------------------
Agree that for recovered edits we do not need the FileSystem to support hflush.
Even though for now HBASE-20734 looks like the 'better' solution since S3 is a
bit slow, in the long term I do think we should remove the hflush check when
writing recovered edits.
> WAL writer for recovered.edits file in WalSplitting should not require hflush
> from filesystem
> ---------------------------------------------------------------------------------------------
>
> Key: HBASE-21544
> URL: https://issues.apache.org/jira/browse/HBASE-21544
> Project: HBase
> Issue Type: Bug
> Components: wal
> Reporter: Josh Elser
> Assignee: Josh Elser
> Priority: Major
> Fix For: 2.0.4
>
> Attachments: HBASE-20734.001.branch-2.0.patch
>
>
> Been talking through this with a bunch of folks. [~enis] brought me back from
> the cliff of despair though.
> Context: running HBase on top of a filesystem that doesn't have hflush for
> hfiles. In our case, on top of Azure's Hadoop-compatible filesystems (WASB,
> ABFS).
> When a RS fails and we have an SCP running for it, you'll see log splitting
> get into an "infinite" loop where the master keeps resubmitting and the RS
> which takes the action deterministically fails with the following:
> {noformat}
> 2018-11-26 20:59:18,415 ERROR
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2]
> wal.FSHLogProvider: The RegionServer write ahead log provider for FileSystem
> implementations relies on the ability to call hflush for proper operation
> during component failures, but the current FileSystem does not support doing
> so. Please check the config value of 'hbase.wal.dir' and ensure it points to
> a FileSystem mount that has suitable capabilities for output streams.
> 2018-11-26 20:59:18,415 WARN
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2]
> wal.AbstractProtobufLogWriter: WALTrailer is null. Continuing with default.
> 2018-11-26 20:59:18,467 ERROR
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.WALSplitter:
> Got while writing log entry to log
> java.io.IOException: cannot get log writer
>   at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:96)
>   at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:61)
>   at org.apache.hadoop.hbase.wal.WALFactory.createRecoveredEditsWriter(WALFactory.java:370)
>   at org.apache.hadoop.hbase.wal.WALSplitter.createWriter(WALSplitter.java:804)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.createWAP(WALSplitter.java:1530)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.getWriterAndPath(WALSplitter.java:1501)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1584)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1566)
>   at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1090)
>   at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1082)
>   at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1052)
> Caused by: org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: hflush
>   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.initOutput(ProtobufLogWriter.java:99)
>   at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:165)
>   at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:77)
>   ... 10 more{noformat}
> This is the sanity check added by HBASE-18784, failing on creating the writer
> for the recovered.edits file.
> The odd-ball here is that our recovered.edits writer is just a WAL writer
> class. The WAL writer class assumes it always needs hflush support;
> however, we don't _actually_ need that for writing out the recovered.edits
> files. If {{close()}} on the recovered.edits file fails, we trash any
> intermediate data in the filesystem and rerun the whole process.
> It's my understanding that this check is overbearing and we should not make
> it when the ProtobufLogWriter is being used for a recovered.edits
> file.
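> To make the proposed distinction concrete, here is a minimal, self-contained
> Java sketch (not actual HBase code; the interface and class names below are
> hypothetical stand-ins for Hadoop's StreamCapabilities and a WASB/ABFS-style
> stream) showing a writer factory that enforces the hflush capability check
> only for the real WAL path and skips it for recovered.edits:

```java
import java.io.IOException;

public class CapabilityCheckSketch {

  // Minimal stand-in for org.apache.hadoop.fs.StreamCapabilities.
  interface StreamCapabilities {
    boolean hasCapability(String capability);
  }

  // Models a WASB/ABFS-style output stream that cannot hflush.
  static class NonFlushableStream implements StreamCapabilities {
    @Override
    public boolean hasCapability(String capability) {
      return false; // no hflush/hsync support
    }
  }

  // Only the real WAL needs durable hflush; recovered.edits can be
  // rewritten from scratch if close() fails, so the check is skipped.
  static void createWriter(StreamCapabilities out, boolean forRecoveredEdits)
      throws IOException {
    if (!forRecoveredEdits && !out.hasCapability("hflush")) {
      throw new IOException("cannot get log writer: stream lacks hflush");
    }
    // ... proceed to initialize the writer on this stream
  }

  public static void main(String[] args) throws IOException {
    StreamCapabilities s = new NonFlushableStream();

    // Recovered-edits path: succeeds even though hflush is unsupported.
    createWriter(s, true);

    // WAL path: the capability check still fails, as it should.
    boolean walFailed = false;
    try {
      createWriter(s, false);
    } catch (IOException e) {
      walFailed = true;
    }
    System.out.println(walFailed); // prints "true"
  }
}
```

> This mirrors the split the issue asks for: keep HBASE-18784's sanity check
> for durable WAL writes, but relax it for the rerunnable recovered.edits path.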
> [~zyork], [~busbey] fyi
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)