[
https://issues.apache.org/jira/browse/FLINK-26616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506176#comment-17506176
]
Chesnay Schepler commented on FLINK-26616:
------------------------------------------
I couldn't reproduce it so far. The logs show that the tasks are deployed just
fine (with a number of failed checkpoint triggers because not everything was
running yet).
But then, strangely, once the first checkpoint is triggered:
{code}
15:34:07,406 [ Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering checkpoint 1 (type=CheckpointType{name='Checkpoint', sharingFilesStrategy=FORWARD_BACKWARD}) @ 1647012847406 for job 339650c79b74dd74a1f27117a41a6dbf.
{code}
nothing happens for an entire minute, and then the test fails with the timeout.
It is a bit too consistent for it to be one of those "VM just stopped doing
stuff for a bit" issues, as it fails exactly when the timeout triggers.
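For illustration only, here is a minimal sketch of a polling wait with a deadline, assuming the test blocks on a condition that never becomes true after the checkpoint is triggered. WaitSketch and waitUntil are hypothetical names, not Flink's CommonTestUtils.waitUntilCondition, and the intervals are made up. It shows why a stuck condition makes the failure land right at the timeout rather than at some arbitrary point.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitSketch {

    /**
     * Hypothetical helper: polls the condition until it holds or the deadline passes.
     * This is an assumption about the general shape of such a helper, not Flink's code.
     */
    public static void waitUntil(
            BooleanSupplier condition, Duration timeout, Duration pollInterval)
            throws TimeoutException, InterruptedException {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("Condition was not met in given timeout.");
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }

    public static void main(String[] args) throws Exception {
        final long start = System.nanoTime();
        try {
            // A condition that never becomes true stands in for the stalled job.
            waitUntil(() -> false, Duration.ofMillis(200), Duration.ofMillis(10));
        } catch (TimeoutException e) {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("Timed out after roughly " + elapsedMs + " ms");
        }
    }
}
```

With a condition that is permanently false, the elapsed time tracks the configured timeout plus at most about one poll interval, which matches the "fails exactly when the timeout triggers" pattern. A paused VM would more likely produce irregular gaps.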
> AdaptiveSchedulerITCase.testExceptionHistoryIsRetrievableFromTheRestAPI
> failed with a timeout
> ---------------------------------------------------------------------------------------------
>
> Key: FLINK-26616
> URL: https://issues.apache.org/jira/browse/FLINK-26616
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.15.0
> Reporter: Matthias Pohl
> Assignee: Chesnay Schepler
> Priority: Critical
> Fix For: 1.15.0
>
>
> {{AdaptiveSchedulerITCase.}} failed in [this
> build|https://dev.azure.com/mapohl/flink/_build/results?buildId=855&view=logs&j=0a15d512-44ac-5ba5-97ab-13a5d066c22c&t=9a028d19-6c4b-5a4e-d378-03fca149d0b1&l=5778]
> due to a timeout.
> {code}
> Mar 11 14:41:36 [ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 0,
> Time elapsed: 76.177 s <<< FAILURE! - in
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase
> Mar 11 14:41:36 [ERROR]
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase.testExceptionHistoryIsRetrievableFromTheRestAPI
> Time elapsed: 60.146 s <<< ERROR!
> Mar 11 14:41:36 java.util.concurrent.TimeoutException: Condition was not met
> in given timeout.
> Mar 11 14:41:36 at
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:167)
> Mar 11 14:41:36 at
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145)
> Mar 11 14:41:36 at
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:137)
> Mar 11 14:41:36 at
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase.testExceptionHistoryIsRetrievableFromTheRestAPI(AdaptiveSchedulerITCase.java:268)
> {code}