[
https://issues.apache.org/jira/browse/FLINK-22003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313069#comment-17313069
]
Yun Gao commented on FLINK-22003:
---------------------------------
I checked the case of tasks are all running when computes tasks to trigger, but
tasks are failover during actually trigger, the current implementation has
considered the case, but there might be some corner case that the checkpoint
could still get actually triggered:
[FLINK-22088|https://issues.apache.org/jira/browse/FLINK-22088].
However, even if we fixed the above two issues, there might still be cases that
if failover happens during checkpoint triggering, the trigger still get success
but the execution get gone before trigger success. For example, the failover
might happen right before the checkpoint coordinator going to trigger the
source tasks, and when the trigger happen the tasks get gone. In this case, the
checkpoint would still have to wait for timeout, which might cause the test
fail. Therefore, I think we may still need to set timeout explicitly for this
test.
> UnalignedCheckpointITCase fail
> ------------------------------
>
> Key: FLINK-22003
> URL: https://issues.apache.org/jira/browse/FLINK-22003
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.13.0
> Reporter: Guowei Ma
> Priority: Major
> Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15601&view=logs&j=119bbba7-f5e3-5e08-e72d-09f1529665de&t=7dc1f5a9-54e1-502e-8b02-c7df69073cfc&l=4142
> {code:java}
> [ERROR] execute[parallel pipeline with remote channels, p =
> 5](org.apache.flink.test.checkpointing.UnalignedCheckpointITCase) Time
> elapsed: 60.018 s <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 60000
> milliseconds
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> at
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> at
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
> at
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> at
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1859)
> at
> org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:69)
> at
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1839)
> at
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1822)
> at
> org.apache.flink.test.checkpointing.UnalignedCheckpointTestBase.execute(UnalignedCheckpointTestBase.java:138)
> at
> org.apache.flink.test.checkpointing.UnalignedCheckpointITCase.execute(UnalignedCheckpointITCase.java:184)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> at
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:748)
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)