[
https://issues.apache.org/jira/browse/FLINK-19938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301139#comment-17301139
]
Yingjie Cao commented on FLINK-19938:
-------------------------------------
[~dahaishuantuoba] Thanks for your interest. In my tests, the improvement gives
a 2-8x performance gain (I will update the PR soon, and you are welcome to test
it with high-parallelism batch jobs). I think there are several reasons for the
performance gain:
# Flink reads data buffer by buffer with a limited amount of memory; if memory
consumption were not a concern, this might not be a problem.
# Doing IO scheduling on top of the OS IO scheduler makes IO requests more
predictable and can thus improve the cache hit ratio (read-ahead).
# The OS IO scheduler is influenced by several factors, for example the
trade-off between fairness, latency and performance.
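To illustrate the second point, here is a minimal sketch of the idea. It is not the code in the PR, and the names ReadRequest, schedule and countSeeks are hypothetical. It assumes that each reader asks for one buffer-sized region of a shared data file and that serving the pending requests in file-offset order makes the access pattern sequential:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ReadSchedulingSketch {

    /** Hypothetical read request: one buffer-sized region of the shuffle data file. */
    static final class ReadRequest {
        final int readerId;
        final long fileOffset;
        final int length;

        ReadRequest(int readerId, long fileOffset, int length) {
            this.readerId = readerId;
            this.fileOffset = fileOffset;
            this.length = length;
        }
    }

    /** Returns a copy of the pending requests ordered by file offset. */
    static List<ReadRequest> schedule(List<ReadRequest> pending) {
        List<ReadRequest> ordered = new ArrayList<>(pending);
        ordered.sort(Comparator.comparingLong((ReadRequest r) -> r.fileOffset));
        return ordered;
    }

    /** Counts requests that do not start where the previous one ended (a seek). */
    static int countSeeks(List<ReadRequest> requests) {
        int seeks = 0;
        long position = 0;
        for (ReadRequest r : requests) {
            if (r.fileOffset != position) {
                seeks++;
            }
            position = r.fileOffset + r.length;
        }
        return seeks;
    }

    public static void main(String[] args) {
        int bufferSize = 32 * 1024; // assumed buffer size, for illustration only
        // Arrival order: four readers request interleaved regions of the same file.
        List<ReadRequest> arrival = Arrays.asList(
                new ReadRequest(2, 2L * bufferSize, bufferSize),
                new ReadRequest(0, 0L, bufferSize),
                new ReadRequest(3, 3L * bufferSize, bufferSize),
                new ReadRequest(1, 1L * bufferSize, bufferSize));

        System.out.println("seeks in arrival order: " + countSeeks(arrival));
        System.out.println("seeks after scheduling: " + countSeeks(schedule(arrival)));
    }
}
```

In this toy input every request in arrival order needs a seek, while the offset-ordered sequence needs none. Real gains depend on the device, the file layout and the OS read-ahead behaviour.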
> Implement shuffle data read scheduling for sort-merge blocking shuffle
> ----------------------------------------------------------------------
>
> Key: FLINK-19938
> URL: https://issues.apache.org/jira/browse/FLINK-19938
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Network
> Reporter: Yingjie Cao
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.13.0
>
>
> As described in
> [https://cwiki.apache.org/confluence/display/FLINK/FLIP-148%3A+Introduce+Sort-Merge+Based+Blocking+Shuffle+to+Flink.]
> shuffle IO scheduling is important for performance. We'd like to introduce
> it to sort-merge shuffle first.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)