[
https://issues.apache.org/jira/browse/FLINK-33954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jiang Xin updated FLINK-33954:
------------------------------
Description:
In some cases, the job may hang when there are not enough buffers in the local
buffer pool. For instance, when the parallelism is 10, the HashBufferAccumulator
is used, and the size of the local buffer pool is parallelism + 1.
1. The local buffer pool size can be very small when the parallelism is small.
So when a large record arrives that needs more buffers than the buffer pool
has, the job hangs.
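
The arithmetic behind this condition can be sketched as follows. This is only an
illustrative model, not Flink code: the class and method names are hypothetical,
the 32 KiB buffer size and the record sizes are assumed values, and the sketch
ignores any buffers the accumulator may already be holding for other
subpartitions. The only figure taken from the description is the pool size of
parallelism + 1.

```java
// Hypothetical helper, not part of Flink: models when one record needs
// more buffers than a local buffer pool of size parallelism + 1 can supply.
public final class BufferDemand {

    // Number of fixed-size buffers needed to hold a record of recordBytes
    // bytes (ceiling division).
    static int buffersNeeded(long recordBytes, int bufferBytes) {
        if (recordBytes < 0 || bufferBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        return (int) ((recordBytes + bufferBytes - 1) / bufferBytes);
    }

    // Pool size as stated in the issue description: parallelism + 1.
    static int localPoolSize(int parallelism) {
        return parallelism + 1;
    }

    // True when a single record needs more buffers than the pool has,
    // which is the situation the description says leads to a hang.
    static boolean exceedsPool(long recordBytes, int bufferBytes, int parallelism) {
        return buffersNeeded(recordBytes, bufferBytes) > localPoolSize(parallelism);
    }

    public static void main(String[] args) {
        int bufferBytes = 32 * 1024;    // assumed buffer size
        int parallelism = 10;           // example from the description
        long largeRecord = 1024 * 1024; // assumed 1 MiB record

        System.out.println("pool size      = " + localPoolSize(parallelism));
        System.out.println("buffers needed = " + buffersNeeded(largeRecord, bufferBytes));
        System.out.println("exceeds pool   = " + exceedsPool(largeRecord, bufferBytes, parallelism));
    }
}
```

Under these assumed numbers, a 1 MiB record needs 32 buffers while the pool
holds only 11, so the request can never be satisfied from the local pool alone.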
was: The local buffer pool size can be very small when the parallelism is
small. So when a large record comes and it needs more buffers than the buffer
pool has, a hang would happen.
> Large record may cause the hybrid shuffle hang
> ----------------------------------------------
>
> Key: FLINK-33954
> URL: https://issues.apache.org/jira/browse/FLINK-33954
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Reporter: Jiang Xin
> Priority: Major
>
> In some cases, the job may hang when there are not enough buffers in the
> local buffer pool. For instance, when the parallelism is 10, the
> HashBufferAccumulator is used, and the size of the local buffer pool is
> parallelism + 1.
> 1. The local buffer pool size can be very small when the parallelism is
> small. So when a large record arrives that needs more buffers than the buffer
> pool has, the job hangs.