[
https://issues.apache.org/jira/browse/FLINK-28373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yingjie Cao updated FLINK-28373:
--------------------------------
Description: Currently, for sort blocking shuffle, the corresponding data
readers read shuffle data at buffer granularity. Before compression, each
buffer is 32K by default; after compression the size becomes smaller (possibly
less than 10K). For file IO, this is quite small. To achieve better
performance and reduce IOPS, we can read more data per IO read request and
parse the buffer headers and data in memory. (was: Currently, for sort blocking
shuffle, the corresponding data readers read shuffle data at buffer
granularity. Before compression, each buffer is 32K by default; after
compression the size becomes smaller (possibly less than 10K). For file IO, this
is quite small. To achieve better performance and reduce IOPS, we can merge
consecutive data requests of the same file together and serve them in one IO
request. More specifically,
1) if multiple data requests are reading the same data, for example, reading
broadcast data, the reader will read the data only once and send the same piece
of data to multiple downstream consumers.
2) if multiple data requests are reading consecutive data in one file, we
will merge those data requests into one large request and read a larger
amount of data sequentially, which is good for file IO performance.)
> Read a full buffer of data per file IO read request for sort-shuffle
> --------------------------------------------------------------------
>
> Key: FLINK-28373
> URL: https://issues.apache.org/jira/browse/FLINK-28373
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Network
> Reporter: Yingjie Cao
> Priority: Major
> Fix For: 1.16.0
>
>
> Currently, for sort blocking shuffle, the corresponding data readers read
> shuffle data at buffer granularity. Before compression, each buffer is 32K by
> default; after compression the size becomes smaller (possibly less than 10K).
> For file IO, this is quite small. To achieve better performance and reduce
> IOPS, we can read more data per IO read request and parse the buffer headers
> and data in memory.
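
The idea described above can be sketched roughly as follows. This is only an
illustration, not Flink's actual reader: the class name BatchedBufferReader,
the method readBatch, and the 4-byte length header are all assumptions made
for the example, and the real on-disk header layout may differ. It issues one
read request into a large IO buffer and then slices out the individual
buffers in memory.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not Flink's actual reader or on-disk format. */
public class BatchedBufferReader {

    /** Assumed header: a single 4-byte big-endian payload length. */
    public static final int HEADER_SIZE = 4;

    /**
     * Issues one read request into ioBuffer, then parses every complete
     * [header][payload] record from memory. An incomplete trailing record
     * stays in ioBuffer so the next call can finish it.
     *
     * Assumes ioBuffer is in write mode and larger than any single record.
     */
    public static List<byte[]> readBatch(FileChannel channel, ByteBuffer ioBuffer)
            throws IOException {
        channel.read(ioBuffer); // one IO request for many small buffers
        ioBuffer.flip();        // switch to read mode for in-memory parsing

        List<byte[]> payloads = new ArrayList<>();
        while (ioBuffer.remaining() >= HEADER_SIZE) {
            // Peek at the length without consuming the header yet.
            int length = ioBuffer.getInt(ioBuffer.position());
            if (ioBuffer.remaining() < HEADER_SIZE + length) {
                break; // record is incomplete; wait for the next read
            }
            ioBuffer.position(ioBuffer.position() + HEADER_SIZE);
            byte[] payload = new byte[length];
            ioBuffer.get(payload);
            payloads.add(payload);
        }

        ioBuffer.compact(); // keep any partial record, back to write mode
        return payloads;
    }
}
```

With 10K compressed buffers and, say, a 1M IO buffer, a single call could
return on the order of a hundred buffers for one read request instead of
issuing one request per buffer. The exact sizes here are illustrative only.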