[
https://issues.apache.org/jira/browse/HDDS-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849903#comment-17849903
]
Ivan Andika edited comment on HDDS-10788 at 5/28/24 8:29 AM:
-------------------------------------------------------------
[~ibrusentsev] Could you help take a look? I think this is related to HDDS-9826.
I'm not sure why this assertion was placed inside testWatchForCommitForRetryfailure:
{code:java}
// client should not attempt to watch with
// MAJORITY_COMMITTED replication level, except the grpc IO issue
if (!logCapturer.getOutput().contains("Connection refused")) {
  assertThat(e.getMessage()).doesNotContain("Watch-MAJORITY_COMMITTED");
}
{code}
I also saw the following SCM close-pipeline call:
{code:java}
// emulate closing pipeline when SCM detects DEAD datanodes
cluster.getStorageContainerManager()
    .getPipelineManager().closePipeline(pipeline, false);
{code}
Was this added because the test wanted to ensure that GroupMismatchException is
thrown for the watch request? I don't think this call triggers pipeline closure
(Ratis group removal) immediately, since we need to wait until the pipeline
scrubber detects the closed pipeline and sends the close pipeline command. We
might need to use TestHelper.waitForPipelineClose to ensure that the Ratis group
has already been removed from all the DNs (e.g. ClosePipelineCommandHandler has
received the close pipeline command from SCM).
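To illustrate the kind of wait I mean, here is a rough sketch in plain Java. It is not Ozone code: the class PipelineCloseWait, the methods waitFor and waitForGroupRemoved, and the hasGroup predicate are hypothetical names I made up for illustration. In the real test the check would have to ask each datanode whether it still serves the pipeline's Ratis group, or simply be replaced by TestHelper.waitForPipelineClose.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;
import java.util.function.Predicate;

// Hypothetical helper, not part of Ozone or Ratis.
final class PipelineCloseWait {
  private PipelineCloseWait() { }

  /** Polls the check until it returns true or the timeout elapses. */
  static void waitFor(BooleanSupplier check, Duration interval,
      Duration timeout) throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (!check.getAsBoolean()) {
      if (System.nanoTime() >= deadline) {
        throw new TimeoutException("condition not met within " + timeout);
      }
      Thread.sleep(interval.toMillis());
    }
  }

  /**
   * Blocks until no datanode reports the Ratis group any more.
   * hasGroup is an assumed callback that returns true while the given
   * datanode still has the group.
   */
  static <D> void waitForGroupRemoved(List<D> datanodes,
      Predicate<D> hasGroup, Duration timeout)
      throws TimeoutException, InterruptedException {
    waitFor(() -> datanodes.stream().noneMatch(hasGroup),
        Duration.ofMillis(100), timeout);
  }
}
```

With something like this placed after the closePipeline call and before watchForCommit, the watch request would only be sent once every DN has dropped the group, so the test would no longer depend on how quickly the close pipeline command is delivered.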
I saw that the test was originally set to
testWatchForCommitForGroupMismatchException, but was later moved to
testWatchForCommitForRetryfailure. I don't think that move was intended, since
HDDS-9826 only dealt with GroupMismatchException.
> Intermittent failure in testWatchForCommitForRetryfailure
> ---------------------------------------------------------
>
> Key: HDDS-10788
> URL: https://issues.apache.org/jira/browse/HDDS-10788
> Project: Apache Ozone
> Issue Type: Sub-task
> Components: test
> Reporter: Attila Doroszlai
> Priority: Major
> Attachments:
> org.apache.hadoop.ozone.client.rpc.TestWatchForCommit-output.txt,
> org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.txt
>
>
> {code}
> Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 134.993 s <<<
> FAILURE! - in org.apache.hadoop.ozone.client.rpc.TestWatchForCommit
> org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.testWatchForCommitForRetryfailure
> Time elapsed: 37.042 s <<< FAILURE!
> java.lang.AssertionError:
> Expecting:
> <"org.apache.ratis.protocol.exceptions.RaftRetryFailureException: Failed
> RaftClientRequest:client-35B425B0E6BC->7ca6ef5b-8396-458a-a334-21f1f3211157@group-3A5CD4773BBD,
> cid=51, seq=null, Watch-MAJORITY_COMMITTED(95), null for 3 attempts with
> RequestTypeDependentRetryPolicy{...}">
> not to contain:
> <"Watch-MAJORITY_COMMITTED">
> at
> org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.testWatchForCommitForRetryfailure(TestWatchForCommit.java:295)
> {code}
> {code:title=https://github.com/apache/ozone/blob/a658802d628271efa82824dc3c316f8eebfc75d3/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestWatchForCommit.java#L292-L296}
> // client should not attempt to watch with
> // MAJORITY_COMMITTED replication level, except the grpc IO issue
> if (!logCapturer.getOutput().contains("Connection refused")) {
>   assertThat(e.getMessage()).doesNotContain("Watch-MAJORITY_COMMITTED");
> }
> {code}
> {code:title=log}
> [main] WARN scm.XceiverClientRatis
> (XceiverClientRatis.java:watchForCommit(284)) - 3 way commit failed on
> pipeline ...
> java.util.concurrent.ExecutionException:
> org.apache.ratis.protocol.exceptions.NotReplicatedException: Request with
> call Id 50 and log index 95 is not yet replicated to ALL_COMMITTED
> at
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> at
> org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:279)
> at
> org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.lambda$testWatchForCommitForRetryfailure$0(TestWatchForCommit.java:286)
> at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:53)
> at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:35)
> at org.junit.jupiter.api.Assertions.assertThrows(Assertions.java:3115)
> at
> org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.testWatchForCommitForRetryfailure(TestWatchForCommit.java:285)
> {code}