[jira] [Updated] (FLINK-28373) Read a full buffer of data per file IO read request for sort-shuffle

Yingjie Cao (Jira) Thu, 14 Jul 2022 23:02:04 -0700


     [ https://issues.apache.org/jira/browse/FLINK-28373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Yingjie Cao updated FLINK-28373:
--------------------------------
    Summary: Read a full buffer of data per file IO read request for 
sort-shuffle  (was: Read larger size of data sequentially for sort-shuffle)

> Read a full buffer of data per file IO read request for sort-shuffle
> --------------------------------------------------------------------
>
>                 Key: FLINK-28373
>                 URL: https://issues.apache.org/jira/browse/FLINK-28373
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Network
>            Reporter: Yingjie Cao
>            Priority: Major
>             Fix For: 1.16.0
>
>
> Currently, for sort blocking shuffle, the corresponding data readers read 
> shuffle data at buffer granularity. Before compression, each buffer is 32K by 
> default; after compression the size becomes smaller (possibly less than 10K). 
> For file IO, this is quite small. To achieve better performance and reduce 
> IOPS, we can merge consecutive data requests of the same file together and 
> serve them in one IO request. More specifically,
> 1) if multiple data requests are reading the same data, for example, reading 
> broadcast data, the reader will read the data only once and send the same 
> piece of data to multiple downstream consumers.
> 2) if multiple data requests are reading consecutive data in one file, we 
> will merge those data requests into one large request and read a larger 
> amount of data sequentially, which is good for file IO performance.
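
The two cases above can be illustrated with a minimal sketch. This is not
Flink's implementation; ReadRequest, merge_requests and max_read_size are
hypothetical names chosen for illustration, and the sketch assumes each request
is described by a file offset and a length. Requests for the same region
(case 1) and requests for adjacent regions (case 2) collapse into a single
sequential read, as long as the merged read fits into one read buffer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadRequest:
    consumer: int  # hypothetical downstream consumer id
    offset: int    # start offset of the requested data in the shuffle file
    length: int    # number of bytes requested


def merge_requests(requests, max_read_size):
    """Coalesce requests into a list of (offset, length, requests) reads.

    A request is folded into the previous read when it starts inside or
    directly at the end of that read and the merged read does not exceed
    max_read_size (an assumed cap, e.g. the size of one read buffer).
    """
    merged = []
    for req in sorted(requests, key=lambda r: (r.offset, r.length)):
        if merged:
            start, length, members = merged[-1]
            end = start + length
            new_end = max(end, req.offset + req.length)
            if req.offset <= end and new_end - start <= max_read_size:
                merged[-1] = (start, new_end - start, members + [req])
                continue
        merged.append((req.offset, req.length, [req]))
    return merged


requests = [
    ReadRequest(consumer=0, offset=0, length=10),
    ReadRequest(consumer=1, offset=0, length=10),   # same data (broadcast)
    ReadRequest(consumer=2, offset=10, length=10),  # consecutive data
    ReadRequest(consumer=3, offset=40, length=10),  # not adjacent
]
reads = merge_requests(requests, max_read_size=32 * 1024)
for offset, length, members in reads:
    print(offset, length, [r.consumer for r in members])
```

In this example four data requests are served by two file reads: one read of
20 bytes at offset 0 shared by consumers 0, 1 and 2, and one read of 10 bytes
at offset 40 for consumer 3.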



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
