[ 
https://issues.apache.org/jira/browse/FLINK-33278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776185#comment-17776185
 ] 

Matthias Pohl edited comment on FLINK-33278 at 10/17/23 12:24 PM:
------------------------------------------------------------------

That's a suspicious one. I'm not able to reproduce it locally with over 25000 
repetitions. It's hard to tell what the cause of this issue is because we lack 
debug logs. I'm wondering whether we should enable debug logging for the 
pekko module using {{LogLevelExtension}} to get a bit more out of future test 
runs. Enabling it for all modules (even if the output went to separate log 
files, as we do for the ZooKeeper dependency) might be overkill considering 
how many RPC calls are sent.
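
To illustrate the idea of raising the log level for a single module only, 
here is a minimal, self-contained sketch. It is not Flink's 
{{LogLevelExtension}} and does not use its API: it uses plain 
{{java.util.logging}} as a stand-in, and the class name {{ScopedLogLevel}} as 
well as the logger names are assumptions made purely for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical helper (not part of Flink): raises the level of one logger
 * namespace and restores the previous level when closed.
 */
public final class ScopedLogLevel implements AutoCloseable {
    // Strong reference so the configured logger is not garbage-collected.
    private final Logger logger;
    // May be null, which means "inherit the level from the parent logger".
    private final Level previous;

    public ScopedLogLevel(String loggerName, Level level) {
        this.logger = Logger.getLogger(loggerName);
        this.previous = logger.getLevel();
        logger.setLevel(level);
    }

    @Override
    public void close() {
        logger.setLevel(previous);
    }

    public static void main(String[] args) {
        // Assumed logger names, chosen only for this example.
        Logger rpc = Logger.getLogger("org.apache.pekko");
        Logger other = Logger.getLogger("org.apache.zookeeper");

        try (ScopedLogLevel ignored = new ScopedLogLevel("org.apache.pekko", Level.FINE)) {
            // Only the targeted namespace accepts debug-level (FINE) records here.
            System.out.println("rpc debug enabled: " + rpc.isLoggable(Level.FINE));
            System.out.println("other debug enabled: " + other.isLoggable(Level.FINE));
        }
        // The previous level is restored once the scope ends.
        System.out.println("rpc debug after scope: " + rpc.isLoggable(Level.FINE));
    }
}
```

Scoping the change to one namespace keeps the extra output limited to the 
module under suspicion, which is the trade-off described above.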

I would assume that this is not only a 1.19.0 issue but that it also affects 
1.18.0, since that release already includes the Pekko migration.

[~chesnay] do you have any other suggestions on that one? I couldn't find any 
similar error in Jira (I also searched with Akka in place of Pekko as a 
substring).

> RemotePekkoRpcActorTest.failsRpcResultImmediatelyIfRemoteRpcServiceIsNotAvailable
>  fails on AZP
> ----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-33278
>                 URL: https://issues.apache.org/jira/browse/FLINK-33278
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / RPC
>    Affects Versions: 1.19.0
>            Reporter: Sergey Nuyanzin
>            Priority: Critical
>              Labels: test-stability
>
> This build 
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53740&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7c1d86e3-35bd-5fd5-3b7c-30c126a78702&l=6563]
> fails as
> {noformat}
> Oct 15 01:02:20 Multiple Failures (1 failure)
> Oct 15 01:02:20 -- failure 1 --
> Oct 15 01:02:20 [Any cause is instance of class 'class 
> org.apache.flink.runtime.rpc.exceptions.RecipientUnreachableException'] 
> Oct 15 01:02:20 Expecting any element of:
> Oct 15 01:02:20   [java.util.concurrent.CompletionException: 
> java.util.concurrent.TimeoutException: Invocation of 
> [RemoteRpcInvocation(SerializedValueRespondingGateway.getSerializedValue())] 
> at recipient 
> [pekko.tcp://flink@localhost:38231/user/rpc/8c211f34-41e5-4efe-93bd-8eca6c590a7f]
>  timed out. This is usually caused by: 1) Pekko failed sending the message 
> silently, due to problems like oversized payload or serialization failures. 
> In that case, you should find detailed error information in the logs. 2) The 
> recipient needs more time for responding, due to problems like slow machines 
> or network jitters. In that case, you can try to increase pekko.ask.timeout.
> Oct 15 01:02:20       at 
> java.util.concurrent.CompletableFuture.reportJoin(CompletableFuture.java:375)
> Oct 15 01:02:20       at 
> java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
> Oct 15 01:02:20       at 
> org.apache.flink.runtime.rpc.pekko.RemotePekkoRpcActorTest.lambda$failsRpcResultImmediatelyIfRemoteRpcServiceIsNotAvailable$1(RemotePekkoRpcActorTest.java:168)
> Oct 15 01:02:20       ...(63 remaining lines not displayed - this can be 
> changed with Assertions.setMaxStackTraceElementsDisplayed),
> ...
> {noformat}
