[jira] [Updated] (FLINK-28561) Merge subpartition shuffle data read request for better sequential IO

Yingjie Cao (Jira) Thu, 03 Nov 2022 00:49:06 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-28561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Yingjie Cao updated FLINK-28561:
--------------------------------
    Fix Version/s:     (was: 1.17.0)

> Merge subpartition shuffle data read request for better sequential IO
> ---------------------------------------------------------------------
>
>                 Key: FLINK-28561
>                 URL: https://issues.apache.org/jira/browse/FLINK-28561
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>            Reporter: Yingjie Cao
>            Priority: Major
>
> Currently, the shuffle data of each subpartition for blocking shuffle is read 
> separately. To achieve better performance and reduce IOPS, we can merge 
> consecutive data requests for the same file and serve them in one 
> IO request. More specifically:
> 1) If multiple data requests read the same data (for example, 
> broadcast data), the reader reads the data only once and sends the same 
> piece of data to multiple downstream consumers.
> 2) If multiple data requests read consecutive data in one file, we 
> merge them into one large request and read a 
> larger chunk of data sequentially, which is good for file IO performance.
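
The two cases above can be illustrated with a small sketch. This is not Flink's
implementation; the names ReadRequest, MergedRead and merge_requests, and the
consumer ids, are hypothetical and exist only for illustration. It assumes each
request is a byte range (offset, length) within a single shuffle data file, and
it also merges partially overlapping ranges, which the issue does not mention.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReadRequest:
    offset: int    # start byte within the shuffle data file (assumed layout)
    length: int    # number of bytes requested
    consumer: str  # hypothetical id of the downstream consumer


@dataclass
class MergedRead:
    offset: int
    length: int
    requests: List[ReadRequest] = field(default_factory=list)


def merge_requests(requests: List[ReadRequest]) -> List[MergedRead]:
    """Coalesce identical, overlapping, or back-to-back byte ranges."""
    merged: List[MergedRead] = []
    for req in sorted(requests, key=lambda r: (r.offset, r.length)):
        last = merged[-1] if merged else None
        if last is not None and req.offset <= last.offset + last.length:
            # Same data (case 1) or consecutive data (case 2):
            # extend the current read and remember who wants a slice of it.
            end = max(last.offset + last.length, req.offset + req.length)
            last.length = end - last.offset
            last.requests.append(req)
        else:
            # A gap in the file: start a new IO request.
            merged.append(MergedRead(req.offset, req.length, [req]))
    return merged


requests = [
    ReadRequest(0, 100, "c0"),    # c0 and c1 read the same (broadcast) data
    ReadRequest(0, 100, "c1"),
    ReadRequest(100, 50, "c2"),   # directly follows the range above
    ReadRequest(400, 20, "c3"),   # separated by a gap, stays its own read
]
reads = merge_requests(requests)
```

Here four requests collapse into two IO requests: one read of bytes 0-149 that
serves c0, c1 and c2, and a separate read for c3. Each consumer would then be
handed the slice of the merged buffer that matches its original range.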



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
