[ https://issues.apache.org/jira/browse/FLINK-29923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630244#comment-17630244 ]

Weijie Guo commented on FLINK-29923:
------------------------------------

Through an offline discussion with [~AlexXXX], we confirmed that the tasks are
indeed stuck forever. The root cause is most likely the same as in the
previously reported FLINK-29298: a bug in `LocalBufferPool`. Hybrid shuffle
increases contention for network buffers. As a result, the bug is hard to
reproduce under blocking shuffle, but it reproduces almost every time with this
specific query under hybrid shuffle. I think it should be treated as a very
serious bug.

> Hybrid Shuffle may face deadlock when running a task need to execute big size 
> data
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-29923
>                 URL: https://issues.apache.org/jira/browse/FLINK-29923
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.16.0
>            Reporter: AlexHu
>            Priority: Major
>         Attachments: 性能差距.png (performance gap), 死锁2-select.png (deadlock 2 - select), 死锁检测.png (deadlock detection)
>
>
> Flink 1.16 offers hybrid shuffle, which combines the advantages of blocking
> shuffle and pipelined shuffle. However, when I tested this new feature, I ran
> into a problem: the job may deadlock while it is running.
> It runs well at the beginning. However, after a certain point it may fail
> because of the buffer size, and if I set a larger size, it keeps running
> without processing any data, as shown in the attached picture. I would like
> to know the cause of this problem and how to solve it.
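A minimal sketch of the settings involved, for anyone trying to reproduce this
in `flink-conf.yaml`. It assumes that the "buffer size" the reporter enlarged is
the TaskManager network memory. The report does not name the option, and the
values below are illustrative, not the reporter's. According to the report,
enlarging it turned the failure into a stall, so this is not a workaround for
the `LocalBufferPool` bug.

```yaml
# Batch execution, where the batch shuffle mode (and thus hybrid shuffle) applies.
execution.runtime-mode: BATCH
# Hybrid shuffle for all exchanges (Flink 1.16+); ALL_EXCHANGES_HYBRID_SELECTIVE is the other hybrid mode.
execution.batch-shuffle-mode: ALL_EXCHANGES_HYBRID_FULL
# Assumption: "buffer size" refers to network memory. Illustrative values only.
taskmanager.memory.network.fraction: 0.2
taskmanager.memory.network.max: 2gb
```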



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
