[
https://issues.apache.org/jira/browse/HBASE-27231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
chenglei updated HBASE-27231:
-----------------------------
Description:
As noted in HBASE-27223, if a {{WAL}} write to HDFS fails, we cannot know
whether the data has been persisted. {{AsyncFSWAL}} handles this by opening a
new writer and writing the WAL entries again, with logic in WAL split and
replay to deal with the resulting duplicate entries. {{FSHLog}} has no such
logic: when {{ProtobufLogWriter.append}} or {{ProtobufLogWriter.sync}} fails,
{{FSHLog.sync}} immediately throws the exception raised by
{{ProtobufLogWriter.append}} or {{ProtobufLogWriter.sync}}. We should
implement the same retry logic as {{AsyncFSWAL}}, so that {{WAL.sync}} can
only throw {{TimeoutIOException}} and we can uniformly abort the RegionServer
when {{WAL.sync}} fails.
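For illustration only, the retry behaviour described above might be sketched
roughly as follows. This is not the HBase code: {{RetryingWalSketch}}, its
nested {{Writer}} interface, the writer factory and {{maxAttempts}} are all
hypothetical names, and a plain {{IOException}} stands in for
{{TimeoutIOException}}.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch of "open a new writer and write the entries again". */
final class RetryingWalSketch {

  /** Stand-in for a WAL writer; NOT the real ProtobufLogWriter API. */
  interface Writer {
    void append(String entry) throws IOException;

    void sync() throws IOException;
  }

  private final Supplier<Writer> writerFactory;
  private final int maxAttempts;
  // Entries appended by callers but not yet confirmed by a successful sync.
  private final List<String> unacked = new ArrayList<>();
  private Writer writer;

  RetryingWalSketch(Supplier<Writer> writerFactory, int maxAttempts) {
    this.writerFactory = writerFactory;
    this.maxAttempts = maxAttempts;
    this.writer = writerFactory.get();
  }

  /** Buffers an entry; nothing reaches the writer until sync() is called. */
  void append(String entry) {
    unacked.add(entry);
  }

  /**
   * Writes every unacked entry and syncs. On failure, rolls to a new writer
   * and writes all unacked entries again. The failed writer may already have
   * persisted some of them, so split/replay must tolerate duplicates.
   *
   * @return the number of attempts that were needed
   */
  int sync() throws IOException {
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        for (String entry : unacked) {
          writer.append(entry);
        }
        writer.sync();
        unacked.clear();
        return attempt;
      } catch (IOException e) {
        last = e;
        writer = writerFactory.get(); // roll to a fresh writer and retry
      }
    }
    // Callers see one uniform failure type and can abort on it.
    throw new IOException("WAL sync failed after " + maxAttempts + " attempts", last);
  }
}
```

With such a wrapper, callers only ever see the final failure rather than the
individual append or sync errors, which is what would allow aborting the
RegionServer uniformly.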
The basic idea is
> FSHLog should retry writing WAL entries when syncs to HDFS failed.
> ------------------------------------------------------------------
>
> Key: HBASE-27231
> URL: https://issues.apache.org/jira/browse/HBASE-27231
> Project: HBase
> Issue Type: Improvement
> Components: wal
> Affects Versions: 3.0.0-alpha-4
> Reporter: chenglei
> Assignee: chenglei
> Priority: Major
>
> As noted in HBASE-27223, if a {{WAL}} write to HDFS fails, we cannot know
> whether the data has been persisted. {{AsyncFSWAL}} handles this by opening
> a new writer and writing the WAL entries again, with logic in WAL split and
> replay to deal with the resulting duplicate entries. {{FSHLog}} has no such
> logic: when {{ProtobufLogWriter.append}} or {{ProtobufLogWriter.sync}}
> fails, {{FSHLog.sync}} immediately throws the exception raised by
> {{ProtobufLogWriter.append}} or {{ProtobufLogWriter.sync}}. We should
> implement the same retry logic as {{AsyncFSWAL}}, so that {{WAL.sync}} can
> only throw {{TimeoutIOException}} and we can uniformly abort the
> RegionServer when {{WAL.sync}} fails.
> The basic idea is