[ 
https://issues.apache.org/jira/browse/FLINK-26616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506176#comment-17506176
 ] 

Chesnay Schepler commented on FLINK-26616:
------------------------------------------

I couldn't reproduce it so far. The logs show that the tasks are deployed just 
fine (with a number of failed checkpoint triggers because not everything was 
running yet).

But then, strangely, once the first checkpoint is triggered:
{code}
15:34:07,406 [    Checkpoint Timer] INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering 
checkpoint 1 (type=CheckpointType{name='Checkpoint', 
sharingFilesStrategy=FORWARD_BACKWARD}) @ 1647012847406 for job 
339650c79b74dd74a1f27117a41a6dbf.
{code}
nothing happens for an entire minute, and then the test fails with the timeout.

It is a bit too consistent for it to be one of those "VM just stopped doing 
stuff for a bit" issues, as it fails exactly when the timeout triggers.
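
To illustrate why the failure lines up exactly with the timeout: a polling wait 
helper throws as soon as its deadline passes if the condition never becomes 
true, so a stuck job and a 60 s limit produce a failure at almost exactly 60 s. 
The following is a minimal sketch under that assumption, not the actual 
CommonTestUtils implementation; the class name WaitUntilSketch and the 
timeout values are hypothetical.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical illustration of a poll-until-deadline helper.
class WaitUntilSketch {

    // Polls the condition until it holds, or throws once the deadline has passed.
    static void waitUntilCondition(
            BooleanSupplier condition, Duration timeout, Duration retryInterval)
            throws TimeoutException, InterruptedException {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("Condition was not met in given timeout.");
            }
            Thread.sleep(retryInterval.toMillis());
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        try {
            // A condition that never becomes true stands in for the stuck job.
            waitUntilCondition(() -> false, Duration.ofMillis(200), Duration.ofMillis(10));
        } catch (TimeoutException e) {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("Timed out after ~" + elapsedMs + " ms");
        }
    }
}
```

With a condition that never holds, the elapsed time is always just above the 
configured timeout, which matches the pattern in the failed run.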


> AdaptiveSchedulerITCase.testExceptionHistoryIsRetrievableFromTheRestAPI 
> failed with a timeout
> ---------------------------------------------------------------------------------------------
>
>                 Key: FLINK-26616
>                 URL: https://issues.apache.org/jira/browse/FLINK-26616
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.0
>            Reporter: Matthias Pohl
>            Assignee: Chesnay Schepler
>            Priority: Critical
>             Fix For: 1.15.0
>
>
> {{AdaptiveSchedulerITCase.}} failed in [this 
> build|https://dev.azure.com/mapohl/flink/_build/results?buildId=855&view=logs&j=0a15d512-44ac-5ba5-97ab-13a5d066c22c&t=9a028d19-6c4b-5a4e-d378-03fca149d0b1&l=5778]
>  due to a timeout.
> {code}
> Mar 11 14:41:36 [ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 76.177 s <<< FAILURE! - in 
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase
> Mar 11 14:41:36 [ERROR] 
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase.testExceptionHistoryIsRetrievableFromTheRestAPI
>   Time elapsed: 60.146 s  <<< ERROR!
> Mar 11 14:41:36 java.util.concurrent.TimeoutException: Condition was not met 
> in given timeout.
> Mar 11 14:41:36       at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:167)
> Mar 11 14:41:36       at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145)
> Mar 11 14:41:36       at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:137)
> Mar 11 14:41:36       at 
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase.testExceptionHistoryIsRetrievableFromTheRestAPI(AdaptiveSchedulerITCase.java:268)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
