[jira] [Updated] (HBASE-27231) FSHLog should retry writing WAL entries when syncs to HDFS failed.

chenglei (Jira) Tue, 23 Aug 2022 03:09:33 -0700


     [ https://issues.apache.org/jira/browse/HBASE-27231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


chenglei updated HBASE-27231:
-----------------------------
    Description: 
As HBASE-27223 points out, if a {{WAL}} write to HDFS fails, we do not know 
whether the data has been persisted. {{AsyncFSWAL}} handles this by opening a 
new writer and writing the WAL entries again, and by adding logic in WAL split 
and replay to deal with the resulting duplicate entries. {{FSHLog}} lacks this 
logic: when {{ProtobufLogWriter.append}} or {{ProtobufLogWriter.sync}} fails, 
{{FSHLog.sync}} immediately rethrows the exception from the writer. We should 
implement the same retry logic as {{AsyncFSWAL}}, so that {{WAL.sync}} can only 
throw {{TimeoutIOException}} and we can uniformly abort the RegionServer 
whenever {{WAL.sync}} fails.

The basic idea is  

  was:As HBASE-27223 points out, if a {{WAL}} write to HDFS fails, we do not 
know whether the data has been persisted. {{AsyncFSWAL}} handles this by 
opening a new writer and writing the WAL entries again, and by adding logic in 
WAL split and replay to deal with the resulting duplicate entries. {{FSHLog}} 
lacks this logic: when {{ProtobufLogWriter.append}} or 
{{ProtobufLogWriter.sync}} fails, {{FSHLog.sync}} immediately rethrows the 
exception from the writer. We should implement the same retry logic as 
{{AsyncFSWAL}}, so that {{WAL.sync}} can only throw {{TimeoutIOException}} and 
we can uniformly abort the RegionServer whenever {{WAL.sync}} fails.


> FSHLog should retry writing WAL entries when syncs to HDFS failed.
> ------------------------------------------------------------------
>
>                 Key: HBASE-27231
>                 URL: https://issues.apache.org/jira/browse/HBASE-27231
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>    Affects Versions: 3.0.0-alpha-4
>            Reporter: chenglei
>            Assignee: chenglei
>            Priority: Major
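The retry behaviour requested in the description can be illustrated with a minimal Java sketch. It rests on simplifying assumptions: a single-threaded WAL, entries buffered until a sync covering them succeeds, and a fixed attempt count (`maxAttempts`) standing in for a time-based sync timeout. `EntryWriter`, `Entry`, `RetryingWal` and `dedupForReplay` are hypothetical names, not HBase APIs, and `TimeoutIOException` here is a local stand-in for the HBase exception of the same name. Keeping one entry per sequence id on replay is an illustrative assumption, not necessarily how HBase's WAL split and replay handle duplicates.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

public class WalRetrySketch {

    /** Local stand-in for HBase's TimeoutIOException, so the sketch compiles on its own. */
    static class TimeoutIOException extends IOException {
        TimeoutIOException(String msg) { super(msg); }
    }

    /** Hypothetical writer with an append/sync split similar to ProtobufLogWriter's. */
    interface EntryWriter {
        void append(long seqId, String data) throws IOException;
        void sync() throws IOException;
    }

    /** Hypothetical WAL entry, identified by its sequence id. */
    static final class Entry {
        final long seqId;
        final String data;
        Entry(long seqId, String data) { this.seqId = seqId; this.data = data; }
    }

    /** Hypothetical WAL that rolls to a new writer and rewrites unacked entries on failure. */
    static final class RetryingWal {
        private final Supplier<EntryWriter> writerFactory;
        private final int maxAttempts; // assumption: stands in for a time-based sync timeout
        private final List<Entry> unacked = new ArrayList<>();
        private EntryWriter writer;
        int rolls = 0;

        RetryingWal(Supplier<EntryWriter> writerFactory, int maxAttempts) {
            this.writerFactory = writerFactory;
            this.maxAttempts = maxAttempts;
            this.writer = writerFactory.get();
        }

        void append(long seqId, String data) {
            // Keep the entry until a sync covering it succeeds.
            unacked.add(new Entry(seqId, data));
        }

        /** Only ever fails with TimeoutIOException, so callers can handle a single case. */
        void sync() throws TimeoutIOException {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    for (Entry e : unacked) {
                        writer.append(e.seqId, e.data);
                    }
                    writer.sync();
                    unacked.clear();
                    return;
                } catch (IOException ioe) {
                    // We cannot tell whether the failed write persisted, so abandon this
                    // writer and rewrite every unacked entry to a fresh one next attempt.
                    writer = writerFactory.get();
                    rolls++;
                }
            }
            throw new TimeoutIOException("WAL sync failed after " + maxAttempts + " attempts");
        }
    }

    /** Hypothetical replay-side filter: a retried entry may be in two files, keep one copy. */
    static List<Entry> dedupForReplay(List<Entry> entries) {
        Set<Long> seen = new HashSet<>();
        List<Entry> out = new ArrayList<>();
        for (Entry e : entries) {
            if (seen.add(e.seqId)) {
                out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        List<Entry> onDisk = new ArrayList<>(); // everything any writer appended
        int[] failuresLeft = {2};               // the first two syncs fail
        RetryingWal wal = new RetryingWal(() -> new EntryWriter() {
            public void append(long seqId, String data) { onDisk.add(new Entry(seqId, data)); }
            public void sync() throws IOException {
                if (failuresLeft[0]-- > 0) throw new IOException("simulated HDFS failure");
            }
        }, 5);
        wal.append(1, "put row1");
        wal.append(2, "put row2");
        wal.sync();
        // prints "2 rolls, 6 entries written, 2 after dedup"
        System.out.println(wal.rolls + " rolls, " + onDisk.size() + " entries written, "
                + dedupForReplay(onDisk).size() + " after dedup");
    }
}
```

In this sketch a failed write is never assumed to have either succeeded or failed; everything not yet acknowledged is rewritten to a new writer, and the cost of possible duplicates is pushed onto replay, which mirrors the trade-off the description attributes to {{AsyncFSWAL}}.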



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
