[jira] [Commented] (FLINK-19938) Implement shuffle data read scheduling for sort-merge blocking shuffle

Yingjie Cao (Jira) Sun, 14 Mar 2021 05:33:11 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-19938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301139#comment-17301139
 ]


Yingjie Cao commented on FLINK-19938:
-------------------------------------

[~dahaishuantuoba] Thanks for your interest. In my tests, the improvement gives 
a 2-8x performance gain (I will update the PR soon, and you are welcome to test 
it with high-parallelism batch jobs). I think there are several reasons for the 
gain:
 # Flink reads data buffer by buffer and memory is limited. If we did not care 
about memory consumption, this might not be a problem;
 # Scheduling IO on top of the OS IO scheduler makes the IO requests more 
predictable and can thus improve the cache hit ratio (read-ahead);
 # The OS IO scheduler is influenced by several factors, for example 
fairness, latency and performance trade-offs.
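
To illustrate the second point, here is a minimal sketch of offset-ordered read scheduling. It is not the code from the PR: the names ReadSchedulingSketch, ReadRequest, scheduleByOffset and totalSeekDistance are hypothetical, and the model ignores read lengths and buffer limits. It only shows that serving pending reads in file-offset order makes the access pattern closer to sequential.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative only; not Flink's actual implementation. */
public class ReadSchedulingSketch {

    /** Hypothetical pending read: which reader wants data and where it starts in the file. */
    static final class ReadRequest {
        final int readerId;
        final long fileOffset;

        ReadRequest(int readerId, long fileOffset) {
            this.readerId = readerId;
            this.fileOffset = fileOffset;
        }
    }

    /** Returns a copy of the pending requests ordered by ascending file offset. */
    static List<ReadRequest> scheduleByOffset(List<ReadRequest> pending) {
        List<ReadRequest> sorted = new ArrayList<>(pending);
        sorted.sort(Comparator.comparingLong((ReadRequest r) -> r.fileOffset));
        return sorted;
    }

    /** Sums how far the file position moves when requests are served in the given order, starting at 0. */
    static long totalSeekDistance(List<ReadRequest> order) {
        long position = 0L;
        long distance = 0L;
        for (ReadRequest r : order) {
            distance += Math.abs(r.fileOffset - position);
            position = r.fileOffset;
        }
        return distance;
    }

    public static void main(String[] args) {
        List<ReadRequest> arrival = Arrays.asList(
                new ReadRequest(0, 300L),
                new ReadRequest(1, 100L),
                new ReadRequest(2, 200L));
        System.out.println("arrival order: " + totalSeekDistance(arrival));
        System.out.println("offset order:  " + totalSeekDistance(scheduleByOffset(arrival)));
    }
}
```

In this toy model, serving the three requests in offset order moves the file position half as far as serving them in arrival order, which is the kind of predictability that helps read-ahead.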

> Implement shuffle data read scheduling for sort-merge blocking shuffle
> ----------------------------------------------------------------------
>
>                 Key: FLINK-19938
>                 URL: https://issues.apache.org/jira/browse/FLINK-19938
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Network
>            Reporter: Yingjie Cao
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.13.0
>
>
> As described in 
> [https://cwiki.apache.org/confluence/display/FLINK/FLIP-148%3A+Introduce+Sort-Merge+Based+Blocking+Shuffle+to+Flink.]
>  shuffle IO scheduling is important for performance. We'd like to introduce 
> it to sort-merge shuffle first.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
