[ https://issues.apache.org/jira/browse/HBASE-25932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355905#comment-17355905 ]
Hudson commented on HBASE-25932:
--------------------------------

Results for branch branch-2
	[build #267 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/267/]: (/) *{color:green}+1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/267/General_20Nightly_20Build_20Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/267/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/267/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/267/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}


> TestWALEntryStream#testCleanClosedWALs test is failing.
> -------------------------------------------------------
>
>                 Key: HBASE-25932
>                 URL: https://issues.apache.org/jira/browse/HBASE-25932
>             Project: HBase
>          Issue Type: Bug
>          Components: metrics, Replication, wal
>    Affects Versions: 3.0.0-alpha-1, 2.5.0, 2.3.6, 2.4.4
>            Reporter: Rushabh Shah
>            Assignee: Bharath Vissapragada
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.4
>
>         Attachments: HBASE-25932-test-approach.patch
>
>
> We are seeing the following test failure:
> TestWALEntryStream#testCleanClosedWALs
> This test was added in HBASE-25924. I don't think the test failure has
> anything to do with the patch in HBASE-25924.
> Before HBASE-25924, we were *not* monitoring the _uncleanlyClosedWAL_
> metric. In all the branches, we were not parsing the WAL trailer when we
> closed the WAL reader inside the ReplicationSourceWALReader thread. The root
> cause: when we added an active WAL to ReplicationSourceWALReader, we cached
> the file size while the WAL was still being written. Once the WAL was closed,
> replicated, and removed from WALEntryStream, we reset the ProtobufLogReader
> object but did not update the WAL length. With the stale file length the
> reader could not find the WALTrailer, which caused EOF errors.
> The fix applied cleanly to branch-1, which uses the FSHLog implementation;
> FSHLog closes the WAL file synchronously.
> But branch-2 and master use the _AsyncFSWAL_ implementation, which (as the
> name suggests) closes the WAL file asynchronously. This is causing the test
> to fail. Below is the test:
> {code:java}
> @Test
> public void testCleanClosedWALs() throws Exception {
>   try (WALEntryStream entryStream = new WALEntryStream(
>       logQueue, CONF, 0, log, null, logQueue.getMetrics(), fakeWalGroupId)) {
>     assertEquals(0, logQueue.getMetrics().getUncleanlyClosedWALs());
>     appendToLogAndSync();
>     assertNotNull(entryStream.next());
>     log.rollWriter(); // This closes the old WAL asynchronously.
>     appendToLogAndSync();
>     assertNotNull(entryStream.next());
>     assertEquals(0, logQueue.getMetrics().getUncleanlyClosedWALs());
>   }
> }
> {code}
> In the code above, rolling the writer does not close the old WAL file
> immediately, so the replication reader thread cannot get the updated WAL
> file size and throws EOF errors.
> If I add a sleep of a few milliseconds (1 ms in my local environment)
> between the rollWriter and appendToLogAndSync statements, the test passes,
> but this is *not* a proper fix: it only works around the race between
> ReplicationSourceWALReaderThread and the closing of the WAL file.
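The two problems described above (a WAL length cached while the file was still active, and a close that happens on another thread) can be illustrated with a small, self-contained Java model. This is a hypothetical sketch, not HBase code: FakeWal, trailerFoundAt and rollAsync are invented names, a byte buffer stands in for the WAL file, and a four-byte marker stands in for the WALTrailer. A CountDownLatch holds the asynchronous close back so that the race window is deterministic rather than timing-dependent.

```java
import java.io.ByteArrayOutputStream;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical model of the bug; none of these classes or methods are HBase APIs.
public class StaleWalLengthSketch {
    // Stand-in for the WALTrailer written when a WAL is closed.
    static final byte[] TRAILER = {'T', 'R', 'L', 'R'};

    /** A fake WAL: entries are appended, and a trailer is written on close. */
    static final class FakeWal {
        private final ByteArrayOutputStream data = new ByteArrayOutputStream();

        synchronized void append(byte[] entry) {
            data.write(entry, 0, entry.length);
        }

        synchronized void writeTrailerAndClose() {
            data.write(TRAILER, 0, TRAILER.length);
        }

        /** Current length of the file; a reader should re-read this, not trust a cached value. */
        synchronized long length() {
            return data.size();
        }

        synchronized byte[] bytes() {
            return data.toByteArray();
        }
    }

    /** True if a reader that believes the file is {@code length} bytes long finds the trailer at its end. */
    static boolean trailerFoundAt(FakeWal wal, long length) {
        byte[] b = wal.bytes();
        if (length < TRAILER.length || length > b.length) {
            return false; // a real reader would hit an EOF-style error here
        }
        int start = (int) length - TRAILER.length;
        for (int i = 0; i < TRAILER.length; i++) {
            if (b[start + i] != TRAILER[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Asynchronous roll: the close runs on {@code pool} and only proceeds once
     * {@code gate} is released, which makes the race window deterministic.
     */
    static CompletableFuture<Void> rollAsync(FakeWal wal, CountDownLatch gate, ExecutorService pool) {
        return CompletableFuture.runAsync(() -> {
            try {
                gate.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            wal.writeTrailerAndClose();
        }, pool);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            FakeWal wal = new FakeWal();
            wal.append(new byte[] {1, 2, 3, 4, 5});
            long cachedLength = wal.length(); // cached while the WAL is still active

            CountDownLatch gate = new CountDownLatch(1);
            CompletableFuture<Void> closed = rollAsync(wal, gate, pool);

            // Race window: the close has not run yet, so even a fresh length has no trailer.
            System.out.println("before close, fresh length: " + trailerFoundAt(wal, wal.length()));

            gate.countDown();
            closed.join(); // wait for the close to complete instead of sleeping

            System.out.println("after close, cached length: " + trailerFoundAt(wal, cachedLength));
            System.out.println("after close, fresh length: " + trailerFoundAt(wal, wal.length()));
        } finally {
            pool.shutdown();
        }
    }
}
```

Running main prints false, false, then true. The first line corresponds to the AsyncFSWAL race (reading before the close has finished), the second to the stale cached length, and only a length re-read after the close has completed locates the trailer. Waiting on the close future, rather than sleeping, is one way to remove the race in this toy model; it is not necessarily how the HBase fix is structured.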
-- This message was sent by Atlassian Jira (v8.3.4#803005)