[
https://issues.apache.org/jira/browse/FLINK-20124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231425#comment-17231425
]
Robert Metzger commented on FLINK-20124:
----------------------------------------
I ran a complicated, misbehaving job and found this problem with unaligned
checkpoints: https://issues.apache.org/jira/browse/FLINK-20145
Apart from that, the job has been running fine for a few hours.
During cancellation, I came across the log message below. I don't think we can
do anything about it: the warning just prints the task thread's current stack,
and both times the thread happened to be in Flink code:
{code}
2020-11-13 13:29:57,000 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Un-registering task and sending final execution state CANCELED to JobManager for task Flat Map (3/4)#0 079036e2a439aac3bed0063f3f8f6a2c.
2020-11-13 13:30:26,602 WARN org.apache.flink.runtime.taskmanager.Task [] - Task 'Source: control events generator (3/4)#0' did not react to cancelling signal for 30 seconds, but is stuck in method:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailboxImpl.take(TaskMailboxImpl.java:146)
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:299)
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:184)
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:577)
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:541)
org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:722)
org.apache.flink.runtime.taskmanager.Task.run(Task.java:547)
java.lang.Thread.run(Thread.java:748)
2020-11-13 13:30:56,608 WARN org.apache.flink.runtime.taskmanager.Task [] - Task 'Source: control events generator (3/4)#0' did not react to cancelling signal for 30 seconds, but is stuck in method:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1934)
org.apache.flink.streaming.runtime.tasks.StreamTask.cleanUpInvoke(StreamTask.java:620)
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:554)
org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:722)
org.apache.flink.runtime.taskmanager.Task.run(Task.java:547)
java.lang.Thread.run(Thread.java:748)
{code}
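To illustrate why the two warnings show different stacks, here is a minimal
sketch of a cancellation watchdog. It is not Flink's implementation: the class
CancelWatchdogSketch and the methods watch and formatStack are hypothetical
names made up for this example. The sketch interrupts a task thread and then,
once per interval while the thread is still alive, records whatever stack the
thread happens to be in at that moment. The 30-second spacing in the log above
appears to correspond to Flink's task.cancellation.interval setting; the sketch
uses a much shorter interval.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class CancelWatchdogSketch {

    /** Formats a stack one frame per line, without an "at" prefix, as in the log above. */
    static String formatStack(StackTraceElement[] frames) {
        StringBuilder sb = new StringBuilder();
        for (StackTraceElement frame : frames) {
            sb.append(frame).append('\n');
        }
        return sb.toString();
    }

    /**
     * Interrupts the target once, then samples its stack once per interval for
     * as long as it stays alive, at most maxSamples times. Each sample is only
     * a snapshot of where the thread is at that instant.
     * intervalMillis must be positive, because join(0) would wait forever.
     */
    static List<String> watch(Thread target, long intervalMillis, int maxSamples)
            throws InterruptedException {
        List<String> snapshots = new ArrayList<>();
        target.interrupt();
        for (int i = 0; i < maxSamples; i++) {
            target.join(intervalMillis); // wait up to one interval for the thread to exit
            if (!target.isAlive()) {
                break; // the task reacted to the cancel signal
            }
            snapshots.add(formatStack(target.getStackTrace()));
        }
        return snapshots;
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch never = new CountDownLatch(1);
        // A task that swallows interrupts and keeps blocking, to simulate a stuck task.
        Thread task = new Thread(() -> {
            while (true) {
                try {
                    never.await();
                } catch (InterruptedException ignored) {
                    // ignore the cancel signal and block again
                }
            }
        }, "stuck-task");
        task.setDaemon(true); // let the JVM exit although the task never finishes
        task.start();

        for (String snapshot : watch(task, 200, 2)) {
            System.out.println("Task did not react to cancelling signal, stuck in method:\n"
                    + snapshot);
        }
    }
}
```

Because each warning is just a snapshot taken at the end of an interval, two
consecutive warnings can show entirely different frames, as they do above
(first TaskMailboxImpl.take, then StreamTask.cleanUpInvoke), even though the
thread was making progress between them.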
> Test pipelined region scheduler
> -------------------------------
>
> Key: FLINK-20124
> URL: https://issues.apache.org/jira/browse/FLINK-20124
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Affects Versions: 1.12.0
> Reporter: Robert Metzger
> Assignee: Robert Metzger
> Priority: Critical
> Fix For: 1.12.0
>
>