[
https://issues.apache.org/jira/browse/HBASE-28665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865219#comment-17865219
]
Hudson commented on HBASE-28665:
--------------------------------
Results for branch branch-2.5
[build #565 on
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/565/]:
(x) *{color:red}-1 overall{color}*
----
details (if available):
(/) {color:green}+1 general checks{color}
-- For more information [see general
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/565/General_20Nightly_20Build_20Report/]
(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2)
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/565/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]
(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3)
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/565/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/565/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk17 hadoop3 checks{color}
-- For more information [see jdk17
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/565/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 source release artifact{color}
-- See build output for details.
(/) {color:green}+1 client integration test{color}
> WALs not marked closed when there are errors in closing WALs
> ------------------------------------------------------------
>
> Key: HBASE-28665
> URL: https://issues.apache.org/jira/browse/HBASE-28665
> Project: HBase
> Issue Type: Bug
> Components: wal
> Affects Versions: 2.5.8
> Reporter: Kiran Kumar Maturi
> Assignee: Kiran Kumar Maturi
> Priority: Minor
> Labels: pull-request-available
> Fix For: 2.7.0, 2.6.1, 2.5.10
>
>
> In our production clusters we have observed that when a WAL close fails, the
> oldWAL files are not marked as closed, which prevents them from being cleaned up.
> When a WAL close fails in closeWriter, it increments the error count.
> {code:java}
> Span span = Span.current();
> try {
>   span.addEvent("closing writer");
>   writer.close();
>   span.addEvent("writer closed");
> } catch (IOException ioe) {
>   int errors = closeErrorCount.incrementAndGet();
>   boolean hasUnflushedEntries = isUnflushedEntries();
>   if (syncCloseCall && (hasUnflushedEntries || (errors > this.closeErrorsTolerated))) {
>     LOG.error("Close of WAL " + path + " failed. Cause=\"" + ioe.getMessage() + "\", errors="
>       + errors + ", hasUnflushedEntries=" + hasUnflushedEntries);
>     throw ioe;
>   }
>   LOG.warn("Riding over failed WAL close of " + path
>     + "; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK", ioe);
> }
> {code}
> When closing the WAL has failed only twice, doReplaceWALWriter enters
> this code block:
> {code:java}
> if (isUnflushedEntries() || closeErrorCount.get() >= this.closeErrorsTolerated) {
>   try {
>     closeWriter(this.writer, oldPath, true);
>   } finally {
>     inflightWALClosures.remove(oldPath.getName());
>   }
> }
> {code}
> In this block the WALs are not marked closed, unlike in the following one:
>
> {code:java}
> Writer localWriter = this.writer;
> closeExecutor.execute(() -> {
>   try {
>     closeWriter(localWriter, oldPath, false);
>   } catch (IOException e) {
>     LOG.warn("close old writer failed", e);
>   } finally {
>     // call this even if the above close fails, as there is no other chance we can set
>     // closed to true, it will not cause big problems.
>     markClosedAndClean(oldPath);
>     inflightWALClosures.remove(oldPath.getName());
>   }
> });
> {code}
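
The difference between the two paths can be illustrated with a small, self-contained toy model. This is only a sketch of the try/finally pattern described above, not HBase code and not necessarily the change that was committed for this issue. The class `WalCloseSketch`, the `Writer` interface, and the methods `closeSync`, `isMarkedClosed`, and `isInflight` are hypothetical names made up for the example; `markClosedAndClean` and `inflightWALClosures` only borrow their names from the snippets above and are reduced to simple set bookkeeping.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of a synchronous WAL close path; hypothetical, not HBase code. */
final class WalCloseSketch {
  /** Hypothetical stand-in for the WAL writer. */
  interface Writer {
    void close() throws IOException;
  }

  private final Set<String> inflightWALClosures = ConcurrentHashMap.newKeySet();
  private final Set<String> closedWALs = ConcurrentHashMap.newKeySet();

  /** Stand-in for markClosedAndClean: here it only records the WAL as closed. */
  void markClosedAndClean(String walName) {
    closedWALs.add(walName);
  }

  boolean isMarkedClosed(String walName) {
    return closedWALs.contains(walName);
  }

  boolean isInflight(String walName) {
    return inflightWALClosures.contains(walName);
  }

  /** Synchronous close that does its bookkeeping in finally, like the async path. */
  void closeSync(Writer writer, String walName) throws IOException {
    inflightWALClosures.add(walName);
    try {
      writer.close();
    } finally {
      // Runs whether or not close() threw, so the WAL is still marked closed.
      markClosedAndClean(walName);
      inflightWALClosures.remove(walName);
    }
  }

  public static void main(String[] args) {
    WalCloseSketch wal = new WalCloseSketch();
    Writer failing = () -> {
      throw new IOException("simulated close failure");
    };
    try {
      wal.closeSync(failing, "wal.1");
    } catch (IOException e) {
      System.out.println("close failed: " + e.getMessage());
    }
    System.out.println("marked closed: " + wal.isMarkedClosed("wal.1"));
  }
}
```

In this model the exception from `close()` still reaches the caller, but the WAL is recorded as closed and removed from the in-flight set either way, which is the behaviour the asynchronous path already has.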
--
This message was sent by Atlassian Jira
(v8.20.10#820010)