[
https://issues.apache.org/jira/browse/FLINK-24162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17410725#comment-17410725
]
Roman Khachatryan commented on FLINK-24162:
-------------------------------------------
Thanks for looking into it, [~gaoyunhaii].
I can confirm that the task transitions to FINISHED twice, once before and once
after a failure:
{code:java}
23:16:57,760 INFO org.apache.flink.runtime.taskmanager.Task [] - Source: Custom
Source -> Timestamps/Watermarks -> transform-1-forward -> Sink: Unnamed (1/4)#0
(4a3e2cd18c9e79b42dc8d6624fcbcde8) switched from RUNNING to FINISHED.
...
23:16:57,837 [ Checkpoint Timer] INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering
checkpoint 3 (type=CHECKPOINT) @ 1630711017835 for job
3d9486075a07c60f7d6927cff31ab0db.
23:16:57,840 [jobmanager-io-thread-18] INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed
checkpoint 3 for job 3d9486075a07c60f7d6927cff31ab0db (0 bytes,
checkpointDuration=5 ms, finalizationTime=0 ms).
23:16:57,849 [Source: Custom Source -> Timestamps/Watermarks ->
transform-1-forward -> Sink: Unnamed (3/4)#0] WARN
org.apache.flink.runtime.taskmanager.Task [] - Source:
Custom Source -> Timestamps/Watermarks -> transform-1-forward -> Sink: Unnamed
(3/4)#0 (f8498b498d21de0ce1edd1175a20e5a6) switched from RUNNING to FAILED
with failure cause: java.lang.RuntimeException: requested to fail
at
org.apache.flink.runtime.operators.lifecycle.graph.TestEventSource.run(TestEventSource.java:82)
at
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:116)
at
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:73)
at
org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:323)
...
23:16:58,357 INFO org.apache.flink.runtime.taskmanager.Task [] - Source:
Custom Source -> Timestamps/Watermarks -> transform-1-forward -> Sink: Unnamed
(1/4)#1 (4c131c07267e65d0365a4f2db71f41dc) switched from RUNNING to FINISHED.
{code}
Checkpoint 3 is completed after subtask 1/4 finishes for the first time, and it
is the checkpoint used for recovery.
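To check the double transition against a complete log, something like the following could be used. It is only a minimal sketch in plain Java, not part of the Flink test code: the class and method names are hypothetical, and it assumes each log record has been joined onto a single line (the excerpt above is wrapped). It counts RUNNING to FINISHED transitions per subtask index, ignoring the attempt number after the `#`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for inspecting task state transitions in a log.
class FinishedTransitions {
    // Matches e.g. "(1/4)#0 (4a3e...) switched from RUNNING to FINISHED".
    // Group 1 is the subtask index ("1/4"), group 2 the attempt number.
    private static final Pattern FINISHED = Pattern.compile(
            "\\((\\d+/\\d+)\\)#(\\d+) \\([0-9a-f]+\\) switched from RUNNING to FINISHED");

    // Counts FINISHED transitions per subtask index across all attempts.
    static Map<String, Integer> countFinished(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = FINISHED.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Abbreviated records based on the excerpt above, one record per line.
        List<String> lines = Arrays.asList(
                "Sink: Unnamed (1/4)#0 (4a3e2cd18c9e79b42dc8d6624fcbcde8) switched from RUNNING to FINISHED.",
                "Sink: Unnamed (3/4)#0 (f8498b498d21de0ce1edd1175a20e5a6) switched from RUNNING to FAILED",
                "Sink: Unnamed (1/4)#1 (4c131c07267e65d0365a4f2db71f41dc) switched from RUNNING to FINISHED.");
        System.out.println(countFinished(lines)); // prints {1/4=2}
    }
}
```

A count greater than one for a subtask index means that subtask finished in more than one attempt, which is the pattern seen for 1/4 here.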
You're right that the whole job is restarted. However, shouldn't that always be
the case, given that TestJobBuilders#prepareEnv sets:
{code:java}
configuration.set(EXECUTION_FAILOVER_STRATEGY, "full");
{code}
> PartiallyFinishedSourcesITCase fails due to assertion error in
> DrainingValidator.validateOperatorLifecycle
> ----------------------------------------------------------------------------------------------------------
>
> Key: FLINK-24162
> URL: https://issues.apache.org/jira/browse/FLINK-24162
> Project: Flink
> Issue Type: Bug
> Components: API / DataStream
> Affects Versions: 1.14.0, 1.15.0
> Reporter: Xintong Song
> Priority: Blocker
> Labels: test-stability
> Fix For: 1.14.0, 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=23526&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=4639
> {code}
> Sep 03 23:17:11 [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0,
> Time elapsed: 19.233 s <<< FAILURE! - in
> org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase
> Sep 03 23:17:11 [ERROR] test[simple graph SINGLE_SUBTASK, failover: true]
> Time elapsed: 2.27 s <<< FAILURE!
> Sep 03 23:17:11 java.lang.AssertionError
> Sep 03 23:17:11 at org.junit.Assert.fail(Assert.java:87)
> Sep 03 23:17:11 at org.junit.Assert.assertTrue(Assert.java:42)
> Sep 03 23:17:11 at org.junit.Assert.assertFalse(Assert.java:65)
> Sep 03 23:17:11 at org.junit.Assert.assertFalse(Assert.java:75)
> Sep 03 23:17:11 at
> org.apache.flink.runtime.operators.lifecycle.validation.DrainingValidator.validateOperatorLifecycle(DrainingValidator.java:56)
> Sep 03 23:17:11 at
> org.apache.flink.runtime.operators.lifecycle.validation.TestOperatorLifecycleValidator.lambda$checkOperatorsLifecycle$1(TestOperatorLifecycleValidator.java:52)
> Sep 03 23:17:11 at java.util.HashMap.forEach(HashMap.java:1289)
> Sep 03 23:17:11 at
> org.apache.flink.runtime.operators.lifecycle.validation.TestOperatorLifecycleValidator.checkOperatorsLifecycle(TestOperatorLifecycleValidator.java:47)
> Sep 03 23:17:11 at
> org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase.test(PartiallyFinishedSourcesITCase.java:94)
> {code}