[ 
https://issues.apache.org/jira/browse/FLINK-28889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo updated FLINK-28889:
-------------------------------
    Description: Hybrid shuffle does not support multiple consumers for a
single subpartition's data. This causes several limitations, such as the
inability to support partition reuse and speculative execution. In particular,
it cannot support broadcast optimization: hybrid shuffle writes multiple
copies of broadcast data, which wastes memory and disk space and degrades the
performance of the shuffle write phase. Ideally, for the full spilling
strategy, any broadcast data (record or event) should be written only once in
memory, and likewise only once on disk.  (was: Hybrid shuffle
writes multiple copies of broadcast data. This wastes memory and disk space
and degrades the performance of the shuffle write phase. Ideally, for the full
spilling strategy, any broadcast data (record or event) should be written only
once in memory, and likewise only once on disk. For the selective spilling
strategy, if a broadcast edge is encountered, we should consider converting it
directly into a HYBRID_FULL edge, or introducing a configuration option to
decide whether to make this switch.)

> Hybrid shuffle writes multiple copies of broadcast data
> -------------------------------------------------------
>
>                 Key: FLINK-28889
>                 URL: https://issues.apache.org/jira/browse/FLINK-28889
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>    Affects Versions: 1.16.0
>            Reporter: Weijie Guo
>            Assignee: Weijie Guo
>            Priority: Critical
>             Fix For: 1.17.0
>
>
> Hybrid shuffle does not support multiple consumers for a single
> subpartition's data. This causes several limitations, such as the inability
> to support partition reuse and speculative execution. In particular, it
> cannot support broadcast optimization: hybrid shuffle writes multiple copies
> of broadcast data, which wastes memory and disk space and degrades the
> performance of the shuffle write phase. Ideally, for the full spilling
> strategy, any broadcast data (record or event) should be written only once in
> memory, and likewise only once on disk.
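
To illustrate the "write once, read by many consumers" idea in the description, here is a minimal sketch. It is not Flink code: the class SharedBroadcastBuffer and its methods are hypothetical names invented for this example, and records are modelled as plain strings. It only shows how one stored copy of each broadcast record can serve several consumers when each consumer keeps its own read position.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names are Flink classes. */
public class SharedBroadcastBuffer {
    // Each broadcast record is stored exactly once, regardless of consumer count.
    private final List<String> records = new ArrayList<>();
    // One independent read position per consumer.
    private final int[] readPositions;

    public SharedBroadcastBuffer(int numConsumers) {
        this.readPositions = new int[numConsumers];
    }

    /** Appends a broadcast record (or event) a single time. */
    public void write(String record) {
        records.add(record);
    }

    /** Returns the next unread record for this consumer, or null if it has caught up. */
    public String poll(int consumerId) {
        int pos = readPositions[consumerId];
        if (pos >= records.size()) {
            return null;
        }
        readPositions[consumerId] = pos + 1;
        return records.get(pos);
    }

    /** Number of records physically stored; independent of the number of consumers. */
    public int storedCopies() {
        return records.size();
    }

    public static void main(String[] args) {
        SharedBroadcastBuffer buffer = new SharedBroadcastBuffer(3);
        buffer.write("event-1");
        buffer.write("record-2");
        for (int c = 0; c < 3; c++) {
            System.out.println("consumer " + c + ": " + buffer.poll(c) + ", " + buffer.poll(c));
        }
        System.out.println("stored records: " + buffer.storedCopies());
    }
}
```

The sketch leaves out everything a real implementation would need, such as releasing a record once all consumers have read it, spilling to disk, and thread safety.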



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
