[https://issues.apache.org/jira/browse/FLINK-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908534#comment-16908534]
Nico Kruber edited comment on FLINK-13020 at 8/15/19 11:00 PM:
---------------------------------------------------------------
Actually, I just encountered this error in a branch of mine that is based on [latest master|https://github.com/apache/flink/commit/428ce1b938813fba287a51bf86e6c52ef54453cb].
So either there has been a regression, the fix does not work in all cases, or this is not a duplicate after all:
{code}
17:30:18.083 [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 14.113 s <<< FAILURE! - in org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest
17:30:18.083 [ERROR] testMigrationAndRestore[Migrate Savepoint: 1.8](org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest) Time elapsed: 0.268 s <<< ERROR!
java.util.concurrent.ExecutionException: java.util.concurrent.CompletionException: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs
{code}
https://api.travis-ci.com/v3/job/225588484/log.txt
{code}
17:30:17,408 INFO org.apache.flink.streaming.runtime.tasks.StreamTask - Configuring application-defined state backend with job/cluster config
17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (2/4) (ffb5e756d6acddab9cab76e2a0a32904) switched from DEPLOYING to RUNNING.
17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Map (4/4) (79fcf333d4d11eae297b65e52e397658) switched from DEPLOYING to RUNNING.
17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Map (2/4) (aedaa4a61e74a3b766fafbef46e6aea6) switched from DEPLOYING to RUNNING.
17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (4/4) (a1f07e2714e73b2533291a322961ea67) switched from DEPLOYING to RUNNING.
17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (3/4) (6073be38d7be0ee571558f1dc865837a) switched from DEPLOYING to RUNNING.
17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Map (1/4) (e4bc84d8137769b513d1a5107027500d) switched from DEPLOYING to RUNNING.
17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Map (3/4) (6834950d9742da9c6a784ecc5ee892df) switched from DEPLOYING to RUNNING.
17:30:17,409 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source (1/4) of job 075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. Aborting checkpoint.
17:30:17,413 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source (1/4) of job 075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. Aborting checkpoint.
17:30:17,414 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source (1/4) of job 075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. Aborting checkpoint.
17:30:17,416 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source (1/4) of job 075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. Aborting checkpoint.
17:30:17,417 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source (1/4) of job 075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. Aborting checkpoint.
17:30:17,423 INFO org.apache.flink.runtime.taskmanager.Task - Source: Custom Source (1/4) (8b302fefb0c10b7fd0b66f4fdb253632) switched from DEPLOYING to RUNNING.
17:30:17,423 INFO org.apache.flink.streaming.runtime.tasks.StreamTask - Using application-defined state backend: MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: UNDEFINED, maxStateSize: 5242880)
17:30:17,423 INFO org.apache.flink.streaming.runtime.tasks.StreamTask - Configuring application-defined state backend with job/cluster config
17:30:17,424 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (1/4) (8b302fefb0c10b7fd0b66f4fdb253632) switched from DEPLOYING to RUNNING.
17:30:17,425 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 54 @ 1565890217425 for job 075cea7da1d0690f96c879ae07b058c0.
17:30:17,442 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline checkpoint 54 by task 6834950d9742da9c6a784ecc5ee892df of job 075cea7da1d0690f96c879ae07b058c0 at b57a17ba-32e1-42ad-991d-abf402ea07fa @ localhost (dataPort=-1).
17:30:17,442 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Discarding checkpoint 54 of job 075cea7da1d0690f96c879ae07b058c0.
org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs
	at org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyAbortOnCancellationBarrier(BarrierBuffer.java:428)
	at org.apache.flink.streaming.runtime.io.BarrierBuffer.processCancellationBarrier(BarrierBuffer.java:327)
	at org.apache.flink.streaming.runtime.io.BarrierBuffer.pollNext(BarrierBuffer.java:208)
	at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.pollNextNullable(StreamTaskNetworkInput.java:102)
	at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.pollNextNullable(StreamTaskNetworkInput.java:47)
	at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:134)
	at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.performDefaultAction(OneInputStreamTask.java:102)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.run(StreamTask.java:268)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:376)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:690)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:520)
	at java.lang.Thread.run(Thread.java:748)
17:30:17,444 WARN org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Received late message for now expired checkpoint attempt 54 from task 8b302fefb0c10b7fd0b66f4fdb253632 of job 075cea7da1d0690f96c879ae07b058c0 at b57a17ba-32e1-42ad-991d-abf402ea07fa @ localhost (dataPort=-1).
17:30:17,445 ERROR org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest -
{code}
> UT Failure: ChainLengthDecreaseTest
> -----------------------------------
>
> Key: FLINK-13020
> URL: https://issues.apache.org/jira/browse/FLINK-13020
> Project: Flink
> Issue Type: Improvement
> Reporter: Bowen Li
> Priority: Major
>
> {code:java}
> 05:47:24.893 [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 19.836 s <<< FAILURE! - in org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest
> 05:47:24.895 [ERROR] testMigrationAndRestore[Migrate Savepoint: 1.3](org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest) Time elapsed: 1.501 s <<< ERROR!
> java.util.concurrent.ExecutionException: java.util.concurrent.CompletionException: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs
> Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs
> ...
> 05:48:27.736 [ERROR] Errors:
> 05:48:27.736 [ERROR] ChainLengthDecreaseTest>AbstractOperatorRestoreTestBase.testMigrationAndRestore:102->AbstractOperatorRestoreTestBase.migrateJob:138 » Execution
> 05:48:27.736 [INFO]
> {code}
> https://travis-ci.org/apache/flink/jobs/551053821
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)