[ https://issues.apache.org/jira/browse/HBASE-21548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser updated HBASE-21548:
-------------------------------
    Description: 
We ran into a problem running HBase on top of Azure filesystems, as described in
HBASE-21544.

The quick solution was to backport HBASE-20734 to branch-2.0 to solve this
issue. However, it is incorrect for HBase to have the recovered.edits writer
assert more stringent requirements (hflush) than it actually needs.

This issue tracks fixing up the writers so that we do not require more than we
actually need.

  was:
Been talking through this with a bunch of folks. [~enis] brought me back from 
the cliff of despair though.

Context: running HBase on top of a filesystem that doesn't have hflush for 
hfiles. In our case, on top of Azure's Hadoop-compatible filesystems (WASB, 
ABFS).

When an RS fails and we have an SCP running for it, you'll see log splitting get
into an "infinite" loop: the master keeps resubmitting the task, and the RS that
takes the action deterministically fails with the following:
{noformat}
2018-11-26 20:59:18,415 ERROR [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.FSHLogProvider: The RegionServer write ahead log provider for FileSystem implementations relies on the ability to call hflush for proper operation during component failures, but the current FileSystem does not support doing so. Please check the config value of 'hbase.wal.dir' and ensure it points to a FileSystem mount that has suitable capabilities for output streams.
2018-11-26 20:59:18,415 WARN  [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.AbstractProtobufLogWriter: WALTrailer is null. Continuing with default.
2018-11-26 20:59:18,467 ERROR [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.WALSplitter: Got while writing log entry to log
java.io.IOException: cannot get log writer
        at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:96)
        at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:61)
        at org.apache.hadoop.hbase.wal.WALFactory.createRecoveredEditsWriter(WALFactory.java:370)
        at org.apache.hadoop.hbase.wal.WALSplitter.createWriter(WALSplitter.java:804)
        at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.createWAP(WALSplitter.java:1530)
        at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.getWriterAndPath(WALSplitter.java:1501)
        at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1584)
        at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1566)
        at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1090)
        at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1082)
        at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1052)
Caused by: org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: hflush
        at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.initOutput(ProtobufLogWriter.java:99)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:165)
        at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:77)
        ... 10 more{noformat}
This is the sanity check added by HBASE-18784, failing on creating the writer 
for the recovered.edits file.

The odd ball here is that our recovered.edits writer is just the WAL writer
class. The WAL writer class assumes it must always have hflush support; however,
we don't _actually_ need that when writing out the recovered.edits files. If
{{close()}} on a recovered.edits file fails, we trash any intermediate data in
the filesystem and rerun the whole split process.

My understanding is that this check is overbearing, and we should not make it
when the ProtobufLogWriter is being used for a recovered.edits file.
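The shape of the fix could be sketched roughly as below. This is a minimal illustration, not HBase's actual code: the {{Stream}} interface, {{initOutput}} signature, and the {{forRecoveredEdits}} flag are hypothetical stand-ins for the real ProtobufLogWriter/CommonFSUtils plumbing. The point is just that the capability check becomes conditional on what the writer is being used for.

```java
// Hypothetical sketch: enforce the hflush capability check only for a
// durable WAL. A writer opened for recovered.edits tolerates a stream
// that cannot hflush, because a failed close() just re-runs the split.
public class CapabilityCheckSketch {

    /** Stand-in for org.apache.hadoop.fs.StreamCapabilities. */
    interface Stream {
        boolean hasCapability(String capability);
    }

    static class StreamLacksCapabilityException extends Exception {
        StreamLacksCapabilityException(String capability) {
            super(capability);
        }
    }

    /**
     * Durable WAL output must fail fast without hflush; recovered.edits
     * output is rewritten from scratch on failure, so it skips the check.
     */
    static void initOutput(Stream out, boolean forRecoveredEdits)
            throws StreamLacksCapabilityException {
        if (!forRecoveredEdits && !out.hasCapability("hflush")) {
            throw new StreamLacksCapabilityException("hflush");
        }
        // ... proceed to write the WAL header, etc.
    }

    public static void main(String[] args) {
        Stream noHflush = cap -> false; // e.g. a WASB/ABFS output stream
        Stream hdfsLike = cap -> true;

        boolean walFailed = false;
        try {
            initOutput(noHflush, false);  // durable WAL: must fail
        } catch (StreamLacksCapabilityException e) {
            walFailed = true;
        }

        boolean editsFailed = false;
        try {
            initOutput(noHflush, true);   // recovered.edits: allowed
            initOutput(hdfsLike, false);  // real WAL on HDFS: allowed
        } catch (StreamLacksCapabilityException e) {
            editsFailed = true;
        }

        System.out.println(walFailed && !editsFailed ? "ok" : "broken");
    }
}
```

With something like this, the FSHLogProvider path for real WALs keeps the HBASE-18784 sanity check intact, while the WALSplitter's recovered.edits path stops demanding a capability it never uses.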

[~zyork], [~busbey] fyi


> WAL writer for recovered.edits file in WalSplitting should not require hflush 
> from filesystem
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-21548
>                 URL: https://issues.apache.org/jira/browse/HBASE-21548
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Major
>             Fix For: 2.2.0
>
>
> We ran into a problem running HBase on top of Azure filesystems, as described 
> in HBASE-21544.
> The quick solution was to backport HBASE-20734 to branch-2.0 to solve this 
> issue. However, it is incorrect for HBase to have the recovered.edits writer 
> assert more stringent requirements (hflush) than it actually needs.
> This issue tracks fixing up the writers so that we do not require more than 
> we actually need.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
