[
https://issues.apache.org/jira/browse/HDDS-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853600#comment-17853600
]
Raju Balpande edited comment on HDDS-10788 at 6/10/24 9:55 AM:
---------------------------------------------------------------
I see a failure ratio of 48/1000, i.e. 4.8%, in
[https://github.com/raju-balpande/apache_ozone/actions/runs/9398762434] with
the following error:
{noformat}
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestWatchForCommit
Error: Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed:
46.541 s <<< FAILURE! - in org.apache.hadoop.ozone.client.rpc.TestWatchForCommit
Error:
org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.testWatchForCommitForRetryfailure
Time elapsed: 46.509 s <<< FAILURE!
java.lang.AssertionError:
Expecting actual:
"org.apache.ratis.protocol.exceptions.RaftRetryFailureException: Failed
RaftClientRequest:client-54F8BAFFDCC6->caf3bd91-ec62-492c-910c-c9010abb2965@group-6E03D52A6FF6,
cid=16, seq=null, Watch-MAJORITY_COMMITTED(36), null for 3 attempts with
RequestTypeDependentRetryPolicy{WRITE->ExceptionDependentRetry(maxAttempts=2147483647;
defaultPolicy=MultipleLinearRandomRetry[5x5s, 5x10s, 5x15s, 5x20s, 5x25s,
10x60s];
map={org.apache.ratis.protocol.exceptions.GroupMismatchException->NoRetry,
org.apache.ratis.protocol.exceptions.NotReplicatedException->NoRetry,
org.apache.ratis.protocol.exceptions.ResourceUnavailableException->org.apache.ratis.retry.ExponentialBackoffRetry@64f32a1f,
org.apache.ratis.protocol.exceptions.StateMachineException->NoRetry,
org.apache.ratis.protocol.exceptions.TimeoutIOException->org.apache.ratis.retry.ExponentialBackoffRetry@64f32a1f}),
WATCH->ExceptionDependentRetry(maxAttempts=2147483647;
defaultPolicy=MultipleLinearRandomRetry[5x5s, 5x10s, 5x15s, 5x20s, 5x25s,
10x60s];
map={org.apache.ratis.protocol.exceptions.GroupMismatchException->NoRetry,
org.apache.ratis.protocol.exceptions.NotReplicatedException->NoRetry,
org.apache.ratis.protocol.exceptions.ResourceUnavailableException->org.apache.ratis.retry.ExponentialBackoffRetry@64f32a1f,
org.apache.ratis.protocol.exceptions.StateMachineException->NoRetry,
org.apache.ratis.protocol.exceptions.TimeoutIOException->NoRetry})}"
not to contain:
"Watch-MAJORITY_COMMITTED"
at
org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.testWatchForCommitForRetryfailure(TestWatchForCommit.java:296)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.util.ArrayList.forEach(ArrayList.java:1259)
at java.util.ArrayList.forEach(ArrayList.java:1259){noformat}
With this fix, the test passes 100% of the time, as in
[https://github.com/raju-balpande/apache_ozone/actions/runs/9427808268]
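As a rough sanity check on the numbers above, the sketch below computes the observed failure rate (48 failures in 1000 runs) and the probability that an unfixed test would pass every run of a follow-up batch purely by chance. The class and method names are hypothetical, and the batch size of 1000 for the fixed run is an assumption (the comment does not state it); runs are also assumed to be independent.

```java
// Hypothetical helper, not part of Ozone: arithmetic on flaky-test run counts.
class FlakyRunEstimate {

    /** Observed failure rate: failures divided by total runs. */
    static double failureRate(int failures, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        return (double) failures / runs;
    }

    /**
     * Probability that `runs` independent runs all pass when each run
     * fails with probability `rate` (i.e. the test is still flaky).
     */
    static double probabilityAllPass(double rate, int runs) {
        return Math.pow(1.0 - rate, runs);
    }

    public static void main(String[] args) {
        // 48 failures out of 1000 runs, as reported in the comment.
        double rate = failureRate(48, 1000);
        System.out.printf("failure rate: %.1f%%%n", rate * 100);

        // ASSUMPTION: the run with the fix also used 1000 iterations.
        int assumedRuns = 1000;
        System.out.printf("P(all pass by chance): %.3e%n",
            probabilityAllPass(rate, assumedRuns));
    }
}
```

Under these assumptions the chance of a clean batch without a real fix is vanishingly small, so an all-green run is reasonable evidence that the fix works.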
> Intermittent failure in testWatchForCommitForRetryfailure
> ---------------------------------------------------------
>
> Key: HDDS-10788
> URL: https://issues.apache.org/jira/browse/HDDS-10788
> Project: Apache Ozone
> Issue Type: Sub-task
> Components: test
> Reporter: Attila Doroszlai
> Assignee: Raju Balpande
> Priority: Major
> Attachments:
> org.apache.hadoop.ozone.client.rpc.TestWatchForCommit-output.txt,
> org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.txt
>
>
> {code}
> Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 134.993 s <<<
> FAILURE! - in org.apache.hadoop.ozone.client.rpc.TestWatchForCommit
> org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.testWatchForCommitForRetryfailure
> Time elapsed: 37.042 s <<< FAILURE!
> java.lang.AssertionError:
> Expecting:
> <"org.apache.ratis.protocol.exceptions.RaftRetryFailureException: Failed
> RaftClientRequest:client-35B425B0E6BC->7ca6ef5b-8396-458a-a334-21f1f3211157@group-3A5CD4773BBD,
> cid=51, seq=null, Watch-MAJORITY_COMMITTED(95), null for 3 attempts with
> RequestTypeDependentRetryPolicy{...}">
> not to contain:
> <"Watch-MAJORITY_COMMITTED">
> at
> org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.testWatchForCommitForRetryfailure(TestWatchForCommit.java:295)
> {code}
> {code:title=https://github.com/apache/ozone/blob/a658802d628271efa82824dc3c316f8eebfc75d3/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestWatchForCommit.java#L292-L296}
> // client should not attempt to watch with
> // MAJORITY_COMMITTED replication level, except the grpc IO issue
> if (!logCapturer.getOutput().contains("Connection refused")) {
> assertThat(e.getMessage()).doesNotContain("Watch-MAJORITY_COMMITTED");
> }
> {code}
> {code:title=log}
> [main] WARN scm.XceiverClientRatis
> (XceiverClientRatis.java:watchForCommit(284)) - 3 way commit failed on
> pipeline ...
> java.util.concurrent.ExecutionException:
> org.apache.ratis.protocol.exceptions.NotReplicatedException: Request with
> call Id 50 and log index 95 is not yet replicated to ALL_COMMITTED
> at
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> at
> org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:279)
> at
> org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.lambda$testWatchForCommitForRetryfailure$0(TestWatchForCommit.java:286)
> at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:53)
> at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:35)
> at org.junit.jupiter.api.Assertions.assertThrows(Assertions.java:3115)
> at
> org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.testWatchForCommitForRetryfailure(TestWatchForCommit.java:285)
> {code}
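The conditional check quoted in the description can be modelled in plain Java to show which combination of exception message and captured log makes the test fail. This is only a sketch: the class name, method name, and constants are hypothetical, and it uses string checks instead of the AssertJ call and log capturer used in the real test.

```java
// Hypothetical model of the quoted check in TestWatchForCommit; not the actual test code.
class ConditionalWatchAssertion {

    static final String FORBIDDEN = "Watch-MAJORITY_COMMITTED";
    static final String GRPC_IO_MARKER = "Connection refused";

    /**
     * Returns true when the quoted check would pass: the assertion is skipped
     * if the captured log shows a gRPC IO issue; otherwise the exception
     * message must not mention a MAJORITY_COMMITTED watch.
     */
    static boolean checkPasses(String exceptionMessage, String capturedLog) {
        if (capturedLog.contains(GRPC_IO_MARKER)) {
            return true; // assertion is not evaluated in this case
        }
        return !exceptionMessage.contains(FORBIDDEN);
    }

    public static void main(String[] args) {
        String failingMessage =
            "RaftRetryFailureException: ... Watch-MAJORITY_COMMITTED(36) ...";

        // Failing case from the report: MAJORITY_COMMITTED watch, no gRPC IO issue logged.
        System.out.println(checkPasses(failingMessage, "3 way commit failed on pipeline"));

        // Same message, but the log contains the gRPC IO marker, so the check is skipped.
        System.out.println(checkPasses(failingMessage, "Connection refused"));
    }
}
```

In this model the intermittent failure corresponds to the first case: the client falls back to a MAJORITY_COMMITTED watch while the captured log contains no "Connection refused" entry.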