[ https://issues.apache.org/jira/browse/FLINK-33879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiang Xin updated FLINK-33879:
------------------------------
    Description: 
Currently, Hybrid Shuffle can use the memory tier and the disk tier together.
However, in the following scenario the result partition stops working.

Suppose a shuffle task has 2 sub-partitions and its LocalBufferPool has 15
buffers. According to `TieredStorageMemoryManagerImpl#getMaxNonReclaimableBuffers`,
the memory tier can use at most 15-(2*(2+1)+1) = 8 buffers. If the memory tier
uses up all 8 buffers and the input channel does not consume them for some
reason, the disk tier can still work with its 1 reserved buffer. However, if a
redistribution happens at this point and the pool size is decreased to fewer
than 8 buffers, the BufferAccumulator can no longer request buffers, and thus
the result partition stops working as well.
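The buffer arithmetic in this scenario can be sketched as a toy model. This is
not Flink code: the class `ToyRedistribution` and its methods are hypothetical,
the reserved-buffer expression is copied from the issue without modeling what
each term means, and the disk tier's own buffer is ignored. It only shows why a
request that succeeds at pool size 15 can never succeed once the pool shrinks
below the 8 buffers the memory tier already holds.

```java
/**
 * Hypothetical toy model of the buffer arithmetic described in FLINK-33879.
 * Not Flink code; names and simplifications are illustrative only.
 */
public class ToyRedistribution {
    // Expression copied verbatim from the issue (2 sub-partitions); the meaning
    // of each individual term is not modeled here.
    static final int RESERVED = 2 * (2 + 1) + 1; // = 7

    /** Upper bound on buffers the memory tier may hold for a given pool size. */
    static int maxMemoryTierBuffers(int poolSize) {
        return Math.max(0, poolSize - RESERVED);
    }

    /** Whether one more buffer can be handed out while inUse buffers are still held. */
    static boolean canRequestBuffer(int poolSize, int inUse) {
        return inUse < poolSize;
    }

    public static void main(String[] args) {
        int poolSize = 15;
        // The memory tier fills up to its cap and downstream never consumes it.
        int heldByMemoryTier = maxMemoryTierBuffers(poolSize); // 8
        System.out.println("memory tier cap: " + heldByMemoryTier);
        System.out.println("request ok at pool size 15: "
                + canRequestBuffer(poolSize, heldByMemoryTier));

        // Redistribution shrinks the pool below what the memory tier still holds,
        // so every further request is refused and the partition stalls.
        poolSize = 7;
        System.out.println("request ok at pool size 7: "
                + canRequestBuffer(poolSize, heldByMemoryTier));
    }
}
```

Running `main` prints a cap of 8, then `true` for pool size 15 and `false` for
pool size 7.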

The goal is to keep the result partition working with the disk tier, writing
the shuffle data to disk, so that once the input channel recovers, the data on
disk can be consumed immediately.
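One possible routing policy matching this goal is sketched below. It is a
hypothetical illustration, not the fix merged for this issue: `ToyTierFallback`,
`Tier`, and `chooseTier` are invented names, and how the disk tier obtains its
own buffer is not modeled. The idea is that when the memory tier has no room,
either because it reached its cap or because the pool shrank below what it
holds, data is routed to the disk tier instead of blocking on a buffer request
that cannot be satisfied.

```java
/**
 * Hypothetical tier-selection policy illustrating the goal of FLINK-33879.
 * Not Flink's implementation; names are illustrative only.
 */
public class ToyTierFallback {
    enum Tier { MEMORY, DISK }

    /**
     * Chooses the tier for the next buffer.
     *
     * @param poolSize        current pool size (may shrink on redistribution)
     * @param memoryTierInUse buffers held by the memory tier, not yet consumed
     * @param memoryTierCap   cap on memory-tier buffers for this pool size
     */
    static Tier chooseTier(int poolSize, int memoryTierInUse, int memoryTierCap) {
        boolean memoryHasRoom =
                memoryTierInUse < memoryTierCap && memoryTierInUse < poolSize;
        // No room in the memory tier: route to disk rather than wait forever.
        return memoryHasRoom ? Tier.MEMORY : Tier.DISK;
    }

    public static void main(String[] args) {
        System.out.println(chooseTier(15, 3, 8)); // memory tier has room
        System.out.println(chooseTier(15, 8, 8)); // memory tier is at its cap
        System.out.println(chooseTier(7, 8, 0));  // pool shrank below what memory holds
    }
}
```

The three calls in `main` print `MEMORY`, `DISK`, and `DISK`.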

  was:
Currently, Hybrid Shuffle can use the memory tier and the disk tier together.
However, in the following scenario the result partition stops working.

Suppose a shuffle task has 2 sub-partitions and its LocalBufferPool has 15
buffers. According to `TieredStorageMemoryManagerImpl#getMaxNonReclaimableBuffers`,
the memory tier can use at most 15-(2*(2+1)+1) = 8 buffers. If the memory tier
uses up all 8 buffers and the input channel does not consume them for some
reason, the disk tier can still work with its 1 reserved buffer. However, if a
redistribution happens at this point and the pool size is decreased to fewer
than 8 buffers, the BufferAccumulator can no longer request buffers, and thus
the result partition stops working as well.


> Hybrid Shuffle may hang during redistribution
> ---------------------------------------------
>
>                 Key: FLINK-33879
>                 URL: https://issues.apache.org/jira/browse/FLINK-33879
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>            Reporter: Jiang Xin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.19.0
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
