[ 
https://issues.apache.org/jira/browse/FLINK-22003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313069#comment-17313069
 ] 

Yun Gao commented on FLINK-22003:
---------------------------------

I checked the case of tasks are all running when computes tasks to trigger, but 
tasks are failover during actually trigger, the current implementation has 
considered the case, but there might be some corner case that the checkpoint 
could still get actually triggered: 
[FLINK-22088|https://issues.apache.org/jira/browse/FLINK-22088].

However, even if we fixed the above two issues, there might still be cases that 
if failover happens during checkpoint triggering, the trigger still get success 
but the execution get gone before trigger success. For example, the failover 
might happen right before the checkpoint coordinator going to trigger the 
source tasks, and when the trigger happen the tasks get gone. In this case, the 
checkpoint would still have to wait for timeout, which might cause the test 
fail.  Therefore, I think we may still need to set timeout explicitly for this 
test. 

> UnalignedCheckpointITCase fail
> ------------------------------
>
>                 Key: FLINK-22003
>                 URL: https://issues.apache.org/jira/browse/FLINK-22003
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.13.0
>            Reporter: Guowei Ma
>            Priority: Major
>              Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15601&view=logs&j=119bbba7-f5e3-5e08-e72d-09f1529665de&t=7dc1f5a9-54e1-502e-8b02-c7df69073cfc&l=4142
> {code:java}
> [ERROR] execute[parallel pipeline with remote channels, p = 
> 5](org.apache.flink.test.checkpointing.UnalignedCheckpointITCase)  Time 
> elapsed: 60.018 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 60000 
> milliseconds
>       at sun.misc.Unsafe.park(Native Method)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>       at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
>       at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>       at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
>       at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>       at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1859)
>       at 
> org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:69)
>       at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1839)
>       at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1822)
>       at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointTestBase.execute(UnalignedCheckpointTestBase.java:138)
>       at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointITCase.execute(UnalignedCheckpointITCase.java:184)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>       at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to