[ https://issues.apache.org/jira/browse/FLINK-31910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhipeng Zhang updated FLINK-31910: ---------------------------------- Description: Using BroadcastUtils#withBroadcast in iteration perround mode got stuck. From the thread dump, it seems that the head operator and criteria node are stuck and waiting for a mail. {code:java} "output-head-Parallel Collection Source -> Sink: Unnamed (4/4)#0" #228 prio=5 os_prio=31 tid=0x00007f9e1d083800 nid=0x19b07 waiting on condition [0x0000700013db6000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0000000747a83270> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163) at org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailboxImpl.take(TaskMailboxImpl.java:149) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:335) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201) at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753) at org.apache.flink.runtime.taskmanager.Task$$Lambda$1383/280145505.run(Unknown Source) at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948) at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) at java.lang.Thread.run(Thread.java:748) {code} The demo for this bug could be found here: https://github.com/zhipeng93/flink-ml/tree/FLINK-31910-demo-case was: Using BroadcastUtils#withBroadcast in iteration perround mode got stuck. From the thread dump, it seems that the head operator and criteria node are stuck and waiting for a mail. {code:java} "output-head-Parallel Collection Source -> Sink: Unnamed (4/4)#0" #228 prio=5 os_prio=31 tid=0x00007f9e1d083800 nid=0x19b07 waiting on condition [0x0000700013db6000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0000000747a83270> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163) at org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailboxImpl.take(TaskMailboxImpl.java:149) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:335) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201) at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753) at org.apache.flink.runtime.taskmanager.Task$$Lambda$1383/280145505.run(Unknown Source) at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948) at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) at java.lang.Thread.run(Thread.java:748) {code} The demo for this bug could be found here: > Using BroadcastUtils#withBroadcast in iteration perround mode got stuck > ----------------------------------------------------------------------- > > Key: FLINK-31910 > URL: https://issues.apache.org/jira/browse/FLINK-31910 > Project: Flink > Issue Type: Bug > Components: Library / Machine Learning > Affects Versions: ml-2.3.0 > Reporter: Zhipeng Zhang > Priority: Major > > Using BroadcastUtils#withBroadcast in iteration perround mode got stuck. > >From the thread dump, it seems that the head operator and criteria node are > stuck and waiting for a mail. > > {code:java} > "output-head-Parallel Collection Source -> Sink: Unnamed (4/4)#0" #228 prio=5 > os_prio=31 tid=0x00007f9e1d083800 nid=0x19b07 waiting on condition > [0x0000700013db6000] > java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0000000747a83270> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163) > at > org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailboxImpl.take(TaskMailboxImpl.java:149) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:335) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753) > at > org.apache.flink.runtime.taskmanager.Task$$Lambda$1383/280145505.run(Unknown > Source) > at > org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948) > at > org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) > at java.lang.Thread.run(Thread.java:748) {code} > > The demo for this bug could be found here: > https://github.com/zhipeng93/flink-ml/tree/FLINK-31910-demo-case -- This message was sent by Atlassian Jira (v8.20.10#820010)