Yingjie Cao created FLINK-14845:
-----------------------------------

             Summary: Introduce data compression to blocking shuffle.
                 Key: FLINK-14845
                 URL: https://issues.apache.org/jira/browse/FLINK-14845
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Network
            Reporter: Yingjie Cao


Currently, the blocking shuffle writer writes raw output data to disk without 
compression. For IO-bound scenarios, this can be optimized by compressing the 
output data. It would be better to introduce a compression mechanism and offer 
users a config option to decide whether to compress the shuffle data. 
We have already implemented compression in our internal Flink version, and 
here are some key points:

1. Where to compress/decompress?

Compress at the upstream and decompress at the downstream.

2. Which thread does the compression/decompression?

Task threads do the compression/decompression.

3. Data compression granularity.

Per buffer.

4. How to handle the case where the data size becomes even bigger after 
compression?

Give up compression in this case and introduce an extra flag to identify 
whether the data was compressed; that is, the output may be a mixture of 
compressed and uncompressed data.
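
To illustrate points 3 and 4, here is a minimal sketch of per-buffer 
compression with a fallback flag. It is not the implementation from our 
internal version: the class name, the one-byte flag layout, and the use of 
java.util.zip (Deflater/Inflater) as the codec are all assumptions made for 
illustration only.

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch; none of these names are actual Flink classes. */
class BufferCompressionSketch {
    // Assumed framing: one flag byte in front of every buffer.
    static final byte RAW = 0;
    static final byte COMPRESSED = 1;

    /** Upstream side: compress one buffer, or keep it raw if that is not smaller. */
    static byte[] encode(byte[] buffer) {
        Deflater deflater = new Deflater();
        deflater.setInput(buffer);
        deflater.finish();
        // Cap the output at the raw size: if the compressed form does not
        // fit, compression did not help and we give up on it.
        byte[] out = new byte[buffer.length];
        int n = deflater.deflate(out);
        boolean smaller = deflater.finished() && n < buffer.length;
        deflater.end();

        byte[] payload = smaller ? Arrays.copyOf(out, n) : buffer;
        byte[] framed = new byte[payload.length + 1];
        framed[0] = smaller ? COMPRESSED : RAW;
        System.arraycopy(payload, 0, framed, 1, payload.length);
        return framed;
    }

    /** Downstream side: check the flag and decompress only if needed. */
    static byte[] decode(byte[] framed, int maxBufferSize) throws DataFormatException {
        if (framed[0] == RAW) {
            return Arrays.copyOfRange(framed, 1, framed.length);
        }
        Inflater inflater = new Inflater();
        inflater.setInput(framed, 1, framed.length - 1);
        byte[] out = new byte[maxBufferSize];
        int n = 0;
        while (!inflater.finished() && n < out.length) {
            int r = inflater.inflate(out, n, out.length - n);
            if (r == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                break; // truncated or unsupported input; stop instead of spinning
            }
            n += r;
        }
        inflater.end();
        return Arrays.copyOf(out, n);
    }
}
```

Because the flag travels with each buffer, the downstream can read a mixture 
of compressed and uncompressed buffers without any extra coordination. A 
faster codec could be swapped in without changing this framing.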

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
