[ 
https://issues.apache.org/jira/browse/HDDS-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849903#comment-17849903
 ] 

Ivan Andika edited comment on HDDS-10788 at 5/28/24 8:29 AM:
-------------------------------------------------------------

[~ibrusentsev] Could you help take a look? I think this is related to HDDS-9826.

I'm not sure why this assertion was put inside testWatchForCommitForRetryfailure:

 
{code:java}
      // client should not attempt to watch with
      // MAJORITY_COMMITTED replication level, except the grpc IO issue
      if (!logCapturer.getOutput().contains("Connection refused")) {
        assertThat(e.getMessage()).doesNotContain("Watch-MAJORITY_COMMITTED");
      }
{code}
 

I also saw the following SCM close pipeline call:

 
{code:java}
// emulate closing pipeline when SCM detects DEAD datanodes
cluster.getStorageContainerManager()
    .getPipelineManager().closePipeline(pipeline, false);
{code}
Was this added because the test wanted to ensure that GroupMismatchException 
is thrown for the watch request? I don't think this call triggers pipeline 
closure (Ratis group removal) immediately, since we need to wait until the 
pipeline scrubber detects the closed pipeline and sends the close pipeline 
command. The test might need to use TestHelper.waitForPipelineClose to ensure 
that the Ratis group has already been removed from all the DNs (e.g. 
ClosePipelineCommandHandler has received the close pipeline command from SCM).
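
To illustrate the kind of wait I mean, here is a minimal, self-contained sketch of polling until every datanode has dropped the Ratis group. It is not the actual TestHelper.waitForPipelineClose implementation: FakeDatanode, hasRaftGroup, waitFor and waitForGroupRemoval are hypothetical names used only for illustration, and the real test would query the MiniOzoneCluster datanodes instead.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

final class PipelineCloseWait {

  /** Hypothetical stand-in for a datanode; NOT an Ozone or Ratis class. */
  interface FakeDatanode {
    boolean hasRaftGroup(String pipelineId);
  }

  /** Polls the check until it returns true or the timeout elapses. */
  static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }

  /**
   * Blocks until no datanode still hosts the Raft group of the pipeline.
   * Closing the pipeline in SCM is asynchronous, so the test has to wait
   * for this before it can rely on the group being gone.
   */
  static void waitForGroupRemoval(List<FakeDatanode> datanodes,
      String pipelineId, long timeoutMs)
      throws TimeoutException, InterruptedException {
    waitFor(
        () -> datanodes.stream().noneMatch(dn -> dn.hasRaftGroup(pipelineId)),
        10, timeoutMs);
  }
}
```

With a wait like this placed between closePipeline and watchForCommit, the watch request would only be sent after the group has been removed on all DNs, so the exception the test sees should no longer depend on timing.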

 

I saw that the test was originally set in 
testWatchForCommitForGroupMismatchException, but was moved to 
testWatchForCommitForRetryfailure. I don't think this was intended, since 
HDDS-9826 only dealt with GroupMismatchException.


> Intermittent failure in testWatchForCommitForRetryfailure
> ---------------------------------------------------------
>
>                 Key: HDDS-10788
>                 URL: https://issues.apache.org/jira/browse/HDDS-10788
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: test
>            Reporter: Attila Doroszlai
>            Priority: Major
>         Attachments: 
> org.apache.hadoop.ozone.client.rpc.TestWatchForCommit-output.txt, 
> org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.txt
>
>
> {code}
> Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 134.993 s <<< FAILURE! - in org.apache.hadoop.ozone.client.rpc.TestWatchForCommit
> org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.testWatchForCommitForRetryfailure  Time elapsed: 37.042 s  <<< FAILURE!
> java.lang.AssertionError: 
> Expecting:
>  <"org.apache.ratis.protocol.exceptions.RaftRetryFailureException: Failed RaftClientRequest:client-35B425B0E6BC->7ca6ef5b-8396-458a-a334-21f1f3211157@group-3A5CD4773BBD, cid=51, seq=null, Watch-MAJORITY_COMMITTED(95), null for 3 attempts with RequestTypeDependentRetryPolicy{...}">
> not to contain:
>  <"Watch-MAJORITY_COMMITTED">
>       at org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.testWatchForCommitForRetryfailure(TestWatchForCommit.java:295)
> {code}
> {code:title=https://github.com/apache/ozone/blob/a658802d628271efa82824dc3c316f8eebfc75d3/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestWatchForCommit.java#L292-L296}
>       // client should not attempt to watch with
>       // MAJORITY_COMMITTED replication level, except the grpc IO issue
>       if (!logCapturer.getOutput().contains("Connection refused")) {
>         assertThat(e.getMessage()).doesNotContain("Watch-MAJORITY_COMMITTED");
>       }
> {code}
> {code:title=log}
> [main] WARN  scm.XceiverClientRatis (XceiverClientRatis.java:watchForCommit(284)) - 3 way commit failed on pipeline ...
> java.util.concurrent.ExecutionException: org.apache.ratis.protocol.exceptions.NotReplicatedException: Request with call Id 50 and log index 95 is not yet replicated to ALL_COMMITTED
>       at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>       at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>       at org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:279)
>       at org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.lambda$testWatchForCommitForRetryfailure$0(TestWatchForCommit.java:286)
>       at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:53)
>       at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:35)
>       at org.junit.jupiter.api.Assertions.assertThrows(Assertions.java:3115)
>       at org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.testWatchForCommitForRetryfailure(TestWatchForCommit.java:285)
> {code}


