[
https://issues.apache.org/jira/browse/FLINK-28373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yingjie Cao updated FLINK-28373:
--------------------------------
Summary: Read a full buffer of data per file IO read request for
sort-shuffle (was: Read larger size of data sequentially for sort-shuffle)
> Read a full buffer of data per file IO read request for sort-shuffle
> --------------------------------------------------------------------
>
> Key: FLINK-28373
> URL: https://issues.apache.org/jira/browse/FLINK-28373
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Network
> Reporter: Yingjie Cao
> Priority: Major
> Fix For: 1.16.0
>
>
> Currently, for sort blocking shuffle, the corresponding data readers read
> shuffle data at buffer granularity. Before compression, each buffer is 32K by
> default; after compression, the size becomes smaller (possibly less than 10K).
> For file IO, this is rather small. To achieve better performance and reduce
> IOPS, we can merge consecutive data requests of the same file together and
> serve them in one IO request. More specifically,
> 1) if multiple data requests read the same data, for example, when reading
> broadcast data, the reader will read the data only once and send the same
> piece of data to multiple downstream consumers.
> 2) if multiple data requests read consecutive data in one file, we
> will merge those data requests into one large request and read a
> larger amount of data sequentially, which is good for file IO performance.
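
The two cases above can be illustrated with a minimal Java sketch. It is not Flink's actual implementation: the names `ReadRequestMerger`, `Range`, `merge`, and `maxReadBytes` are hypothetical and exist only for illustration. The sketch assumes each data request is a byte range in one shuffle file, and that a merged read should not exceed the size of one read buffer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration only; not part of Flink.
final class ReadRequestMerger {

    /** Hypothetical byte range in a shuffle data file: [offset, offset + length). */
    static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        long end() {
            return offset + length;
        }

        @Override
        public String toString() {
            return "[" + offset + ", " + end() + ")";
        }
    }

    /**
     * Coalesces requests that target identical, overlapping, or adjacent ranges,
     * so each resulting range can be served by a single file read. A merged
     * range stops growing once it would exceed maxReadBytes (an assumed cap,
     * e.g. the size of one read buffer).
     */
    static List<Range> merge(List<Range> requests, long maxReadBytes) {
        List<Range> sorted = new ArrayList<>(requests);
        sorted.sort(Comparator.comparingLong((Range r) -> r.offset));

        List<Range> merged = new ArrayList<>();
        Range current = null;
        for (Range next : sorted) {
            if (current != null && next.offset <= current.end()) {
                long newEnd = Math.max(current.end(), next.end());
                if (newEnd - current.offset <= maxReadBytes) {
                    // Same data (case 1) or consecutive data (case 2): extend the read.
                    current = new Range(current.offset, newEnd - current.offset);
                    continue;
                }
            }
            if (current != null) {
                merged.add(current);
            }
            current = next;
        }
        if (current != null) {
            merged.add(current);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Range> requests = Arrays.asList(
                new Range(0, 10240),       // broadcast data, consumer A
                new Range(0, 10240),       // same broadcast data, consumer B
                new Range(10240, 8192),    // directly follows the first range
                new Range(18432, 9216),    // directly follows the previous one
                new Range(102400, 10240)); // far away, needs its own read
        System.out.println(merge(requests, 32 * 1024));
    }
}
```

With the five sample requests in `main` and a 32K cap, the sketch produces two reads: one covering bytes 0 to 27648 and one for the distant range. The duplicate broadcast request adds no extra IO, and the three adjacent ranges become one sequential read.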