[ https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707696#comment-16707696 ]

Sean Busbey commented on HBASE-21544:
-------------------------------------

bq. FSDataOutputStream (assuming that's what you meant by FileSystem.close()) 
doesn't say anything in terms of Javadoc, but the implementation is such that 
close() makes the same guarantees as hflush().

Does it only do that if the underlying FileSystem supports hflush?
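
(For what it's worth, a minimal sketch of that probe, assuming Hadoop 2.9+ where
FSDataOutputStream implements the StreamCapabilities interface; I believe
CommonFSUtils.hasCapability does essentially this, via reflection so that older
Hadoop releases still link:)

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.StreamCapabilities;

public class HflushProbe {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path(args[0]);
    try (FSDataOutputStream out = path.getFileSystem(conf).create(path)) {
      // FSDataOutputStream delegates hasCapability to the wrapped stream;
      // streams that predate StreamCapabilities simply report false.
      boolean ok = out.hasCapability(StreamCapabilities.HFLUSH);
      System.out.println(path + " supports hflush: " + ok);
    }
  }
}
{code}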

{quote}
bq. I thought recovered edits now go to the same FileSystem as the WAL? 
wouldn't that imply that hflush should be present?

Ah, this didn't land on 2.0.x. Yes, that would have precluded the need for such 
a change.

Semantics are that it would be good to make sure that we aren't over-requiring 
from our filesystem, but you are correct in that this is less of a concern in 
newer versions since the durability required of the FS by WALs is more than 
that for recovered.edits 
{quote}

Sure. I just worry about too many configuration knobs. Could we just backport 
the fix for HBASE-20734 to branch-2.0 and call it a day?

> WAL writer for recovered.edits file in WalSplitting should not require hflush 
> from filesystem
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-21544
>                 URL: https://issues.apache.org/jira/browse/HBASE-21544
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
>
> Been talking through this with a bunch of folks. [~enis] brought me back from 
> the cliff of despair though.
> Context: running HBase on top of a filesystem that doesn't have hflush for 
> hfiles. In our case, on top of Azure's Hadoop-compatible filesystems (WASB, 
> ABFS).
> When a RS fails and we have an SCP running for it, you'll see log splitting 
> get into an "infinite" loop where the master keeps resubmitting and the RS 
> which takes the action deterministically fails with the following:
> {noformat}
> 2018-11-26 20:59:18,415 ERROR 
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] 
> wal.FSHLogProvider: The RegionServer write ahead log provider for FileSystem 
> implementations relies on the ability to call hflush for proper operation 
> during component failures, but the current FileSystem does not support doing 
> so. Please check the config value of 'hbase.wal.dir' and ensure it points to 
> a FileSystem mount that has suitable capabilities for output streams.
> 2018-11-26 20:59:18,415 WARN  
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] 
> wal.AbstractProtobufLogWriter: WALTrailer is null. Continuing with default.
> 2018-11-26 20:59:18,467 ERROR 
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.WALSplitter: 
> Got while writing log entry to log
> java.io.IOException: cannot get log writer
>         at 
> org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:96)
>         at 
> org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:61)
>         at 
> org.apache.hadoop.hbase.wal.WALFactory.createRecoveredEditsWriter(WALFactory.java:370)
>         at 
> org.apache.hadoop.hbase.wal.WALSplitter.createWriter(WALSplitter.java:804)
>         at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.createWAP(WALSplitter.java:1530)
>         at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.getWriterAndPath(WALSplitter.java:1501)
>         at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1584)
>         at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1566)
>         at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1090)
>         at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1082)
>         at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1052)
> Caused by: 
> org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: 
> hflush
>         at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.initOutput(ProtobufLogWriter.java:99)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:165)
>         at 
> org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:77)
>         ... 10 more{noformat}
> This is the sanity check added by HBASE-18784, failing on creating the writer 
> for the recovered.edits file.
> The odd-ball here is that our recovered.edits writer is just a WAL writer 
> class. The WAL writer class thinks it always should have hflush support; 
> however, we don't _actually_ need that for writing out the recovered.edits 
> files. If {{close()}} on the recovered.edits file fails, we trash any
> intermediate data in the filesystem and rerun the whole process.
> It's my understanding that this check is overbearing and we should not make
> the check when the ProtobufLogWriter is being used for the recovered.edits 
> file.
> [~zyork], [~busbey] fyi
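
To make the description's point concrete, a hypothetical sketch of relaxing the
check (the class name and {{requireHflush}} parameter here are illustrative
only, not the actual patch): only a live WAL writer would insist on the
capability, while a recovered.edits writer would settle for {{close()}}
semantics.

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.StreamCapabilities;

class RecoveredEditsWriterSketch {
  // requireHflush would be true for live WALs, false for recovered.edits.
  void initOutput(FSDataOutputStream output, boolean requireHflush)
      throws IOException {
    // A live WAL needs hflush so acknowledged edits survive a crash.
    // A recovered.edits file only needs close() to succeed: if the split
    // fails, the partial file is discarded and the whole split is rerun.
    if (requireHflush && !output.hasCapability(StreamCapabilities.HFLUSH)) {
      throw new IOException("stream lacks hflush capability");
    }
    // ... continue with the usual header setup.
  }
}
{code}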


