[ https://issues.apache.org/jira/browse/FLINK-21992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320653#comment-17320653 ]
Guowei Ma commented on FLINK-21992:
-----------------------------------
We also see the same stack trace in our internal tests, with unaligned
checkpoints not enabled.
{code:java}
[stat_date, cate_id, user_id, biz, MIN(visit_time_int) FILTER $f5 AS min$0, MIN(visit_time_int) FILTER $f6 AS min$1, MIN(visit_time_int) FILTER $f7 AS min$2, MIN(visit_time_int) FILTER $f8 AS min$3, MIN(visit_time_int) FILTER $f9 AS min$4]))) (68/256)#0" Id=101 WAITING on java.util.concurrent.CompletableFuture$Signaller@2e0b208c
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.CompletableFuture$Signaller@2e0b208c
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189)
at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegmentBlocking(LocalBufferPool.java:319)
at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBuilderBlocking(LocalBufferPool.java:291)
...
{code}
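For readers less familiar with this kind of trace: the top frames show a
thread parked inside CompletableFuture.get(), waiting for a future that has
not been completed. The following is only a minimal sketch of that pattern,
not Flink code. ToyPool, requestBlocking, notifyAvailable and awaitState are
hypothetical names made up for illustration, and the sketch does not claim to
reproduce how LocalBufferPool actually manages its availability future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class BlockingRequestSketch {

    /** Hypothetical stand-in for a buffer pool; this is NOT Flink's LocalBufferPool. */
    static final class ToyPool {
        private final CompletableFuture<Void> available = new CompletableFuture<>();

        /** Blocks until availability is signalled, similar to the top frames of the trace. */
        String requestBlocking() throws Exception {
            available.get(); // parks the calling thread while the future is incomplete
            return "segment";
        }

        /** The notification. If this call never happens, requestBlocking() never returns. */
        void notifyAvailable() {
            available.complete(null);
        }
    }

    /** Polls until the thread reaches the expected state or the timeout expires. */
    static boolean awaitState(Thread t, Thread.State expected, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() < deadline) {
            if (t.getState() == expected) {
                return true;
            }
            Thread.sleep(5);
        }
        return t.getState() == expected;
    }

    public static void main(String[] args) throws Exception {
        ToyPool pool = new ToyPool();
        Thread requester = new Thread(() -> {
            try {
                pool.requestBlocking();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }, "toy-requester");
        requester.setDaemon(true); // do not keep the JVM alive if the thread stays parked
        requester.start();

        // While the future is incomplete, the requester should show up as WAITING.
        System.out.println("parked: " + awaitState(requester, Thread.State.WAITING, 2000));

        // Delivering the notification lets the requester finish.
        pool.notifyAvailable();
        requester.join(2000);
        System.out.println("finished: " + !requester.isAlive());
    }
}
```

In this toy version, skipping the notifyAvailable() call leaves the requester
parked indefinitely, which is the kind of symptom a lost availability
notification would produce.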
> Fix availability notification in UnionInputGate
> -----------------------------------------------
>
> Key: FLINK-21992
> URL: https://issues.apache.org/jira/browse/FLINK-21992
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.12.2, 1.13.0
> Reporter: Arvid Heise
> Assignee: Arvid Heise
> Priority: Blocker
> Labels: pull-request-available
>
> A user on the mailing list reported that his job gets stuck with unaligned
> checkpoints enabled.
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Source-Operators-Stuck-in-the-requestBufferBuilderBlocking-tt42530.html
> We received two similar reports in the past, but the users didn't follow up,
> so they were not as easy to diagnose as this one, where the initial report
> already contains many relevant data points.
> Besides a buffer leak, there could also be an issue with priority notification.