[ 
https://issues.apache.org/jira/browse/FLINK-33278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778708#comment-17778708
 ] 

Matthias Pohl commented on FLINK-33278:
---------------------------------------

Thanks for looking into this, [~jiabao.sun]. I am not able to follow what 
you're doing. Stopping the code execution at the lines that you suggest in your 
screenshots doesn't make the test fail for me. Generally speaking, if you stop 
the execution at the "right" place in the code, it becomes quite likely that 
you generate a timeout. 

That's most likely also what we observed in the logs, where the machine 
didn't continue processing for some time (based on the logged timestamps).
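
To illustrate the general point, here is a minimal sketch. It is not the Flink 
test code: the class StallTimeoutSketch, the method failureKind and the 
exception UnreachableException are made up for illustration, and orTimeout 
needs Java 9 or later. It only shows how a stall that outlasts the timeout 
makes the caller see a TimeoutException instead of the failure it expects.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;

class StallTimeoutSketch {

    // Hypothetical stand-in for the "recipient unreachable" failure a test expects.
    static class UnreachableException extends RuntimeException {
        UnreachableException(String message) {
            super(message);
        }
    }

    // Simulates a paused thread or a stalled machine.
    static void stall(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns the simple class name of the failure cause the caller observes.
    static String failureKind(long stallMillis, long timeoutMillis) {
        CompletableFuture<String> reply = CompletableFuture.<String>supplyAsync(() -> {
            stall(stallMillis);
            throw new UnreachableException("recipient not reachable");
        }).orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);

        try {
            reply.join();
            return "none";
        } catch (CompletionException e) {
            // join() wraps the real failure in a CompletionException.
            return e.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // No stall: the expected failure arrives before the timeout.
        System.out.println(failureKind(0, 500));
        // Stall longer than the timeout: the timeout wins.
        System.out.println(failureKind(1000, 100));
    }
}
```

In the second call, the stall is longer than the timeout, so the future is 
completed with the TimeoutException first, which is the same shape as the 
CompletionException/TimeoutException pair in the build log.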

> RemotePekkoRpcActorTest.failsRpcResultImmediatelyIfRemoteRpcServiceIsNotAvailable
>  fails on AZP
> ----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-33278
>                 URL: https://issues.apache.org/jira/browse/FLINK-33278
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / RPC
>    Affects Versions: 1.19.0
>            Reporter: Sergey Nuyanzin
>            Priority: Critical
>              Labels: test-stability
>         Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> This build 
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53740&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7c1d86e3-35bd-5fd5-3b7c-30c126a78702&l=6563]
> fails as
> {noformat}
> Oct 15 01:02:20 Multiple Failures (1 failure)
> Oct 15 01:02:20 -- failure 1 --
> Oct 15 01:02:20 [Any cause is instance of class 'class 
> org.apache.flink.runtime.rpc.exceptions.RecipientUnreachableException'] 
> Oct 15 01:02:20 Expecting any element of:
> Oct 15 01:02:20   [java.util.concurrent.CompletionException: 
> java.util.concurrent.TimeoutException: Invocation of 
> [RemoteRpcInvocation(SerializedValueRespondingGateway.getSerializedValue())] 
> at recipient 
> [pekko.tcp://flink@localhost:38231/user/rpc/8c211f34-41e5-4efe-93bd-8eca6c590a7f]
>  timed out. This is usually caused by: 1) Pekko failed sending the message 
> silently, due to problems like oversized payload or serialization failures. 
> In that case, you should find detailed error information in the logs. 2) The 
> recipient needs more time for responding, due to problems like slow machines 
> or network jitters. In that case, you can try to increase pekko.ask.timeout.
> Oct 15 01:02:20       at 
> java.util.concurrent.CompletableFuture.reportJoin(CompletableFuture.java:375)
> Oct 15 01:02:20       at 
> java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
> Oct 15 01:02:20       at 
> org.apache.flink.runtime.rpc.pekko.RemotePekkoRpcActorTest.lambda$failsRpcResultImmediatelyIfRemoteRpcServiceIsNotAvailable$1(RemotePekkoRpcActorTest.java:168)
> Oct 15 01:02:20       ...(63 remaining lines not displayed - this can be 
> changed with Assertions.setMaxStackTraceElementsDisplayed),
> ...
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
