Yingjie Cao created FLINK-28561:
-----------------------------------
Summary: Merge subpartition shuffle data read request for better
sequential IO
Key: FLINK-28561
URL: https://issues.apache.org/jira/browse/FLINK-28561
Project: Flink
Issue Type: Improvement
Components: Runtime / Network
Reporter: Yingjie Cao
Fix For: 1.17.0
Currently, the shuffle data of each subpartition for blocking shuffle is read
separately. To achieve better performance and reduce IOPS, we can merge
consecutive data requests for the same file and serve them in one IO
request. More specifically,
1) if multiple data requests are reading the same data, for example, reading
broadcast data, the reader will read the data only once and send the same piece
of data to multiple downstream consumers.
2) if multiple data requests are reading consecutive data in one file, we
will merge those data requests into one large request and read a larger
amount of data sequentially, which is good for file IO performance.
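
The two cases above can be illustrated with a minimal sketch. This is not
the actual Flink implementation: the class ReadRequestMerger, the Range
type, and the merge method are hypothetical names, and the sketch assumes
each read request can be described as a byte range in a single shuffle data
file. Identical ranges (case 1) and back-to-back ranges (case 2) both
collapse into one larger sequential read; overlapping ranges are merged the
same way.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of request merging; not Flink's actual classes. */
public class ReadRequestMerger {

    /** A byte range [offset, offset + length) in one shuffle data file. */
    public static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        long end() {
            return offset + length;
        }

        @Override
        public String toString() {
            return "[" + offset + ", " + end() + ")";
        }
    }

    /** Merges identical, overlapping, and consecutive ranges into larger reads. */
    public static List<Range> merge(List<Range> requests) {
        // Sort by file offset so that mergeable requests become neighbours.
        List<Range> sorted = new ArrayList<>(requests);
        sorted.sort(Comparator.comparingLong((Range r) -> r.offset));

        List<Range> merged = new ArrayList<>();
        for (Range next : sorted) {
            if (!merged.isEmpty()) {
                Range last = merged.get(merged.size() - 1);
                // Same data (case 1) or directly following data (case 2):
                // extend the previous read instead of issuing a new one.
                if (next.offset <= last.end()) {
                    long end = Math.max(last.end(), next.end());
                    merged.set(merged.size() - 1, new Range(last.offset, end - last.offset));
                    continue;
                }
            }
            merged.add(next);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Range> requests = List.of(
                new Range(0, 100), new Range(0, 100),   // two consumers of the same broadcast data
                new Range(100, 50), new Range(150, 50), // consecutive subpartition data
                new Range(400, 20));                    // separated by a gap, stays its own read
        System.out.println(merge(requests)); // prints [[0, 200), [400, 420)]
    }
}
```

In this sketch five requests turn into two IO requests. A real reader would
additionally need to track which downstream consumers receive which slice
of each merged read, which is left out here.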
--
This message was sent by Atlassian Jira
(v8.20.10#820010)