[ 
https://issues.apache.org/jira/browse/FLINK-33954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiang Xin updated FLINK-33954:
------------------------------
    Description: In some cases, the job may hang when there are not enough 
buffers in the local buffer pool. For instance, when the parallelism is 4, the 
HashBufferAccumulator is used. The size of the local buffer pool can be 5, and 
at some point 3 of those buffers are held by 3 subpartitions and are not yet 
finished, so only 2 buffers are left. If a record larger than 2 buffers 
arrives, the job hangs while requesting buffers.  (was: In some cases, the 
job may hang when there are not enough buffers in the local buffer pool. For 
instance, the parallelism is 10, so the HashBufferAccumulator is used. The size 
of local buffer pool is parallelism + 1

1. The local buffer pool size can be very small when the parallelism is small. 
So when a large record comes and it needs more buffers than the buffer pool 
has, a hang would happen.)

> Large record may cause the hybrid shuffle hang
> ----------------------------------------------
>
>                 Key: FLINK-33954
>                 URL: https://issues.apache.org/jira/browse/FLINK-33954
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>            Reporter: Jiang Xin
>            Priority: Major
>
> In some cases, the job may hang when there are not enough buffers in the 
> local buffer pool. For instance, when the parallelism is 4, the 
> HashBufferAccumulator is used. The size of the local buffer pool can be 5, 
> and at some point 3 of those buffers are held by 3 subpartitions and are 
> not yet finished, so only 2 buffers are left. If a record larger than 2 
> buffers arrives, the job hangs while requesting buffers.
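
For illustration only, the arithmetic of the scenario above can be modelled
with a toy pool. This is a minimal sketch, not Flink code: the class
BufferPoolHangSketch, its methods, and the 32 KiB buffer size are all
assumptions made for the example, and a java.util.concurrent.Semaphore stands
in for the local buffer pool. It uses a non-blocking tryAcquire() so that the
example terminates; a blocking request at the same point would wait forever
if the 3 held buffers are never returned.

```java
import java.util.concurrent.Semaphore;

/** Toy model of a fixed-size buffer pool (hypothetical, not Flink's implementation). */
public class BufferPoolHangSketch {

    /** Number of buffers needed for a record, using ceiling division. */
    public static int buffersNeeded(int recordBytes, int bufferBytes) {
        return (recordBytes + bufferBytes - 1) / bufferBytes;
    }

    /**
     * Tries to take `needed` buffers one at a time without blocking.
     * Returns true if all were obtained. Returns false if the pool ran dry,
     * which is the point where a blocking request would be stuck.
     */
    public static boolean tryWriteRecord(Semaphore pool, int needed) {
        int acquired = 0;
        while (acquired < needed) {
            if (!pool.tryAcquire()) {
                // Give back what was taken so the model stays consistent.
                pool.release(acquired);
                return false;
            }
            acquired++;
        }
        return true;
    }

    public static void main(String[] args) {
        int bufferBytes = 32 * 1024;        // assumed buffer size, for illustration
        Semaphore pool = new Semaphore(5);  // pool size 5, as in the example above

        // 3 buffers are held by 3 subpartitions and are not yet finished.
        pool.tryAcquire(3);

        // A record spanning 3 buffers arrives, but only 2 buffers are free.
        int needed = buffersNeeded(3 * bufferBytes, bufferBytes);
        boolean written = tryWriteRecord(pool, needed);

        System.out.println("buffers needed: " + needed);
        System.out.println("buffers free:   " + pool.availablePermits());
        System.out.println("record written: " + written);
    }
}
```

In this model the request can only succeed once some of the 3 held buffers
are returned to the pool. The sketch does not model how or when that happens
in the hybrid shuffle.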



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
