[jira] [Updated] (FLINK-28373) Read a full buffer of data per file IO read request for sort-shuffle

Yingjie Cao (Jira) Thu, 14 Jul 2022 23:03:04 -0700


     [ https://issues.apache.org/jira/browse/FLINK-28373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Yingjie Cao updated FLINK-28373:
--------------------------------
    Description: Currently, for sort blocking shuffle, the corresponding data 
readers read shuffle data at buffer granularity. Before compression, each 
buffer is 32K by default; after compression it becomes smaller (possibly 
less than 10K). For file IO, this is quite small. To achieve better 
performance and reduce IOPS, we can read more data per IO read request and 
parse the buffer headers and data in memory.  (was: Currently, for sort blocking 
shuffle, the corresponding data readers read shuffle data at buffer 
granularity. Before compression, each buffer is 32K by default; after 
compression it becomes smaller (possibly less than 10K). For file IO, this 
is quite small. To achieve better performance and reduce IOPS, we can merge 
consecutive data requests for the same file and serve them in one IO 
request. More specifically,

1) if multiple data requests are reading the same data, for example, reading 
broadcast data, the reader will read the data only once and send the same piece 
of data to multiple downstream consumers.

2) if multiple data requests are reading consecutive data in one file, we 
will merge those data requests into one large request and read a larger 
amount of data sequentially, which is good for file IO performance.)
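The two merging rules in the superseded description can be illustrated with a minimal Java sketch. It is not Flink's reader code: `Request`, `MergedRead`, `merge` and the `maxReadBytes` cap are hypothetical names chosen for illustration. Requests are sorted by file offset; identical ranges (rule 1, e.g. broadcast data) and overlapping or adjacent ranges (rule 2) are folded into one physical read, which keeps the list of requests it serves so the reader could later slice the data out per consumer (that dispatch step is not shown).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ReadRequestMerger {

    /** Hypothetical read request: a byte range of one data file wanted by one consumer. */
    static final class Request {
        final long offset;
        final int length;
        final int consumerId;

        Request(long offset, int length, int consumerId) {
            this.offset = offset;
            this.length = length;
            this.consumerId = consumerId;
        }

        long end() {
            return offset + length;
        }
    }

    /** One physical IO read covering [offset, end) that serves one or more requests. */
    static final class MergedRead {
        final long offset;
        long end;
        final List<Request> served = new ArrayList<>();

        MergedRead(long offset, long end) {
            this.offset = offset;
            this.end = end;
        }
    }

    /**
     * Sorts requests by offset and folds identical, overlapping or adjacent ranges
     * into a single read, as long as the merged read stays within maxReadBytes.
     */
    static List<MergedRead> merge(List<Request> requests, long maxReadBytes) {
        List<Request> sorted = new ArrayList<>(requests);
        sorted.sort(Comparator.comparingLong((Request r) -> r.offset));

        List<MergedRead> merged = new ArrayList<>();
        MergedRead current = null;
        for (Request r : sorted) {
            boolean touches = current != null && r.offset <= current.end;
            boolean fits = current != null
                    && Math.max(current.end, r.end()) - current.offset <= maxReadBytes;
            if (touches && fits) {
                // Same or contiguous bytes: extend the current read instead of issuing a new one.
                current.end = Math.max(current.end, r.end());
            } else {
                current = new MergedRead(r.offset, r.end());
                merged.add(current);
            }
            current.served.add(r);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Request> requests = Arrays.asList(
                new Request(0, 100, 1),   // broadcast data ...
                new Request(0, 100, 2),   // ... requested by a second consumer
                new Request(100, 50, 3),  // consecutive range in the same file
                new Request(500, 20, 4)); // gap: needs its own read
        for (MergedRead m : merge(requests, 1024)) {
            System.out.println("read [" + m.offset + ", " + m.end + ") serves "
                    + m.served.size() + " request(s)");
        }
    }
}
```

Here four requests collapse into two reads, [0, 150) serving three requests and [500, 520) serving one. The cap bounds how much memory a single merged read may need.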

> Read a full buffer of data per file IO read request for sort-shuffle
> --------------------------------------------------------------------
>
>                 Key: FLINK-28373
>                 URL: https://issues.apache.org/jira/browse/FLINK-28373
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Network
>            Reporter: Yingjie Cao
>            Priority: Major
>             Fix For: 1.16.0
>
>
> Currently, for sort blocking shuffle, the corresponding data readers read 
> shuffle data at buffer granularity. Before compression, each buffer is 32K by 
> default; after compression it becomes smaller (possibly less than 10K). 
> For file IO, this is quite small. To achieve better performance and reduce 
> IOPS, we can read more data per IO read request and parse the buffer headers 
> and data in memory.
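The idea of reading a large chunk per IO request and then parsing buffer headers and data in memory can be sketched in Java. This is an illustration under stated assumptions, not Flink's implementation: the 8-byte header layout (short data type, short compression flag, int payload length) and the names `BufferedRecordParser`, `ParsedBuffer`, `parseAll` and `writeRecord` are hypothetical. `parseAll` extracts every complete [header][payload] record from one read buffer and leaves the buffer positioned at the first incomplete record, so a caller could `compact()` it and append the next file read. This assumes the read buffer is at least as large as the biggest record; otherwise no progress is possible.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class BufferedRecordParser {
    /** Assumed header layout: short data type, short compression flag, int payload length. */
    static final int HEADER_LENGTH = 8;

    /** One shuffle buffer recovered from the read buffer (hypothetical type). */
    static final class ParsedBuffer {
        final short dataType;
        final boolean compressed;
        final byte[] payload;

        ParsedBuffer(short dataType, boolean compressed, byte[] payload) {
            this.dataType = dataType;
            this.compressed = compressed;
            this.payload = payload;
        }
    }

    /**
     * Parses every complete [header][payload] record between position and limit.
     * On return, position points at the first incomplete record (if any).
     */
    static List<ParsedBuffer> parseAll(ByteBuffer readBuffer) {
        List<ParsedBuffer> out = new ArrayList<>();
        while (readBuffer.remaining() >= HEADER_LENGTH) {
            int start = readBuffer.position();
            short dataType = readBuffer.getShort();
            boolean compressed = readBuffer.getShort() != 0;
            int length = readBuffer.getInt();
            if (readBuffer.remaining() < length) {
                // Payload continues in the next file read: rewind to the header and stop.
                readBuffer.position(start);
                break;
            }
            byte[] payload = new byte[length];
            readBuffer.get(payload);
            out.add(new ParsedBuffer(dataType, compressed, payload));
        }
        return out;
    }

    /** Writes one record in the assumed layout (used here to simulate file contents). */
    static void writeRecord(ByteBuffer dst, short dataType, boolean compressed, byte[] payload) {
        dst.putShort(dataType).putShort((short) (compressed ? 1 : 0)).putInt(payload.length).put(payload);
    }

    public static void main(String[] args) {
        // Simulate one large read holding two whole buffers plus the start of a third.
        ByteBuffer readBuffer = ByteBuffer.allocate(64);
        writeRecord(readBuffer, (short) 0, true, new byte[] {1, 2, 3});
        writeRecord(readBuffer, (short) 0, false, new byte[] {4, 5});
        readBuffer.putShort((short) 0).putShort((short) 1).putInt(10).put((byte) 9); // truncated record
        readBuffer.flip();

        List<ParsedBuffer> parsed = parseAll(readBuffer);
        // prints "2 buffers parsed, 9 bytes carried over"
        System.out.println(parsed.size() + " buffers parsed, "
                + readBuffer.remaining() + " bytes carried over");
    }
}
```

One read of, say, 64K could then replace many reads of under 10K each, which is the IOPS reduction the description aims for.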



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
