[jira] [Commented] (HBASE-27231) FSHLog should retry writing WAL entries when syncs to HDFS failed.

Hudson (Jira) Fri, 14 Jul 2023 12:41:04 -0700


    [ https://issues.apache.org/jira/browse/HBASE-27231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17743267#comment-17743267 ]


Hudson commented on HBASE-27231:
--------------------------------

Results for branch branch-3
        [build #18 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/18/]: 
(/) *{color:green}+1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/18/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/18/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/18/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> FSHLog should retry writing WAL entries when syncs to HDFS failed.
> ------------------------------------------------------------------
>
>                 Key: HBASE-27231
>                 URL: https://issues.apache.org/jira/browse/HBASE-27231
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>    Affects Versions: 3.0.0-alpha-4
>            Reporter: chenglei
>            Assignee: chenglei
>            Priority: Major
>             Fix For: 3.0.0-beta-1
>
>
> As HBASE-27223 noted, if a {{WAL}} write to HDFS fails, we cannot know 
> whether the data has been persisted. {{AsyncFSWAL}} handles this by opening 
> a new writer and writing the WAL entries again, with logic added in WAL 
> split and replay to deal with duplicate entries. {{FSHLog}} has no such 
> logic: when {{ProtobufLogWriter.append}} or {{ProtobufLogWriter.sync}} 
> fails, {{FSHLog.sync}} immediately rethrows that exception. We should 
> implement the same retry logic as {{AsyncFSWAL}}, so that {{WAL.sync}} can 
> only throw {{TimeoutIOException}} and we can uniformly abort the 
> RegionServer when {{WAL.sync}} fails.
> The basic idea: because both {{FSHLog.RingBufferEventHandler}} and 
> {{AsyncFSWAL.consumeExecutor}} are single-threaded, we can reuse the logic 
> in {{AsyncFSWAL}}, move most of its code up to {{AbstractFSWAL}}, and adapt 
> the {{SyncRunner}} in {{FSHLog}} to the logic in {{AsyncWriter.sync}}. Once 
> that is done, most of the logic in {{AsyncFSWAL}} and {{FSHLog}} is 
> unified; only how the {{writer}} is synced differs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
