[GitHub] [flink-web] infoverload commented on a change in pull request #476: Add blog post "Sort-Based Blocking Shuffle Implementation in Flink"

GitBox Mon, 25 Oct 2021 08:14:42 -0700


infoverload commented on a change in pull request #476:
URL: https://github.com/apache/flink-web/pull/476#discussion_r735703573




##########
File path: _posts/2021-10-15-sort-shuffle-part2.md
##########
@@ -0,0 +1,154 @@
+---
+layout: post
+title: "Sort-Based Blocking Shuffle Implementation in Flink - Part Two"
+date: 2021-10-15 00:00:00
+authors:
+- Yingjie Cao:
+  name: "Yingjie Cao (Kevin)"
+- Daisy Tsang:
+  name: "Daisy Tsang"
+excerpt: Flink has implemented the sort-based blocking shuffle (FLIP-148) for batch data processing. In this blog post, we will take a close look at the design & implementation details and see what we can gain from it.
+---
+
+The part two of this blog post will describe the [design considerations](#design-considerations) & [implementations](#implementation-details) in detail which can provide more insights and list several [potential improvements](#future-improvements) that can be implemented in the future.
+
+{% toc %}
+
+# Abstract
+
+Like sort-merge shuffle implemented by other distributed data processing frameworks, the whole sort-based shuffle process in Flink consists of several important stages, including collecting data in memory, sorting the collected data in memory, spilling the sorted data to files, and reading the shuffle data from these spilled files. However, Flink’s implementation has some core differences, including the multiple data region file structure, the removal of file merge, and IO scheduling. The following sections describe some core design considerations and implementations of the sort-based blocking shuffle in Flink.
+
+# Design Considerations

Review comment:
       ```suggestion
   # Design considerations
   ```

##########
File path: _posts/2021-10-15-sort-shuffle-part2.md
##########
@@ -0,0 +1,154 @@
+---
+layout: post
+title: "Sort-Based Blocking Shuffle Implementation in Flink - Part Two"
+date: 2021-10-15 00:00:00
+authors:
+- Yingjie Cao:
+  name: "Yingjie Cao (Kevin)"
+- Daisy Tsang:
+  name: "Daisy Tsang"
+excerpt: Flink has implemented the sort-based blocking shuffle (FLIP-148) for batch data processing. In this blog post, we will take a close look at the design & implementation details and see what we can gain from it.
+---
+
+The part two of this blog post will describe the [design considerations](#design-considerations) & [implementations](#implementation-details) in detail which can provide more insights and list several [potential improvements](#future-improvements) that can be implemented in the future.
+
+{% toc %}
+
+# Abstract
+
+Like sort-merge shuffle implemented by other distributed data processing frameworks, the whole sort-based shuffle process in Flink consists of several important stages, including collecting data in memory, sorting the collected data in memory, spilling the sorted data to files, and reading the shuffle data from these spilled files. However, Flink’s implementation has some core differences, including the multiple data region file structure, the removal of file merge, and IO scheduling. The following sections describe some core design considerations and implementations of the sort-based blocking shuffle in Flink.
+
+# Design Considerations
+
+There are several core objectives we want to achieve for the new sort-based blocking shuffle to be implemented Flink:
+
+## Produce Fewer (Small) Files

Review comment:
       ```suggestion
   ## Produce fewer (small) files
   ```




