[
https://issues.apache.org/jira/browse/FLINK-29298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702031#comment-17702031
]
Lee chen commented on FLINK-29298:
----------------------------------
We also ran into this problem. How can it be reproduced reliably?
Thank you.
> LocalBufferPool request buffer from NetworkBufferPool hanging
> -------------------------------------------------------------
>
> Key: FLINK-29298
> URL: https://issues.apache.org/jira/browse/FLINK-29298
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.16.0
> Reporter: Weijie Guo
> Assignee: Weijie Guo
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.17.0, 1.16.1
>
> Attachments: image-2022-09-14-10-52-15-259.png,
> image-2022-09-14-10-58-45-987.png, image-2022-09-14-11-00-47-309.png
>
>
> When buffer contention is fierce, a task hang can sometimes be observed. The
> thread dump shows that the task thread is blocked in
> requestMemorySegmentBlocking forever. After investigating the heap dump, I
> found that the NetworkBufferPool actually has many buffers, but the
> LocalBufferPool is still unavailable and has not obtained any buffer.
> From the code, I am sure this is a thread race bug: when the task thread
> polls the last buffer from the LocalBufferPool and triggers the
> onGlobalPoolAvailable callback itself, the notification is skipped (because
> the LocalBufferPool is still available at that moment). As a result, the
> BufferPool eventually becomes unavailable and never registers a callback with
> the NetworkBufferPool.
> The conditions for triggering the problem are relatively strict, but I have
> found a stable way to reproduce it. I will try to fix and verify this problem.
> !image-2022-09-14-10-52-15-259.png|width=1021,height=219!
> !image-2022-09-14-10-58-45-987.png|width=997,height=315!
> !image-2022-09-14-11-00-47-309.png|width=453,height=121!
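
The lost notification described in the quoted report can be illustrated with a
small, single-threaded toy model. This is only a sketch under stated
assumptions: ToyGlobalPool, ToyLocalPool, fetchOrRegister and the
recheckWhenExhausted switch are hypothetical names invented for illustration,
not Flink classes or APIs, and the interleaving is forced by call order rather
than by real threads. The re-check is just one conceivable remedy and is not
necessarily what the actual fix does.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class LostNotificationSketch {

    /** Toy global pool: a free-buffer counter plus one one-shot availability listener. */
    static final class ToyGlobalPool {
        int free = 0;
        Runnable listener;

        void recycle() {
            free++;
            Runnable l = listener;
            listener = null; // the listener is one-shot
            if (l != null) {
                l.run();
            }
        }
    }

    /** Toy local pool that skips global notifications while it still holds a buffer. */
    static final class ToyLocalPool {
        final ToyGlobalPool global;
        final boolean recheckWhenExhausted; // hypothetical remedy switch
        final Deque<Object> segments = new ArrayDeque<>();

        ToyLocalPool(ToyGlobalPool global, boolean recheckWhenExhausted) {
            this.global = global;
            this.recheckWhenExhausted = recheckWhenExhausted;
            segments.add(new Object());                    // start with one local buffer
            global.listener = this::onGlobalPoolAvailable; // and one registered callback
        }

        boolean isAvailable() {
            return !segments.isEmpty();
        }

        void onGlobalPoolAvailable() {
            if (isAvailable()) {
                return; // notification skipped; the one-shot listener is now gone
            }
            fetchOrRegister();
        }

        /** Takes a buffer from the global pool, or registers a callback if it is empty. */
        void fetchOrRegister() {
            if (global.free > 0) {
                global.free--;
                segments.add(new Object());
            } else {
                global.listener = this::onGlobalPoolAvailable;
            }
        }

        /** Non-blocking request; returns null where a blocking request would wait. */
        Object request() {
            Object segment = segments.poll();
            if (segments.isEmpty() && recheckWhenExhausted) {
                fetchOrRegister();
            }
            return segment;
        }
    }

    /** Runs the interleaving and reports whether a second request would hang. */
    static boolean secondRequestHangs(boolean recheckWhenExhausted) {
        ToyGlobalPool global = new ToyGlobalPool();
        ToyLocalPool local = new ToyLocalPool(global, recheckWhenExhausted);
        global.recycle(); // notification arrives while the local pool is still available
        local.request();  // the task takes the last local buffer
        return local.request() == null && global.free > 0 && global.listener == null;
    }

    public static void main(String[] args) {
        System.out.println("without re-check, hangs: " + secondRequestHangs(false));
        System.out.println("with re-check, hangs:    " + secondRequestHangs(true));
    }
}
```

In the model without the re-check, the global pool ends up holding a free
buffer while the local pool is empty and no callback is registered, which
mirrors the state seen in the heap dump: a blocking request would wait forever.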