[ https://issues.apache.org/jira/browse/FLINK-14845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977693#comment-16977693 ]
Stephan Ewen commented on FLINK-14845:
--------------------------------------

[~lzljs3620320] Quick question for clarification: in the example you described, the large table could (and should) be connected to the join task by a pipelined in-memory channel that does not spill, connecting the source and the join, which run co-located in the same slot. I guess that should also be possible for the Blink query engine in 1.10, with FLIP-53?

> Introduce data compression to blocking shuffle.
> -----------------------------------------------
>
>                 Key: FLINK-14845
>                 URL: https://issues.apache.org/jira/browse/FLINK-14845
>             Project: Flink
>          Issue Type: Sub-task
>      Components: Runtime / Network
>            Reporter: Yingjie Cao
>            Priority: Major
>
> Currently, the blocking shuffle writer writes raw output data to disk without compression. For IO-bound scenarios, this can be optimized by compressing the output data. It would be better to introduce a compression mechanism and offer a config option that lets users decide whether to compress the shuffle data. We have already implemented compression in our internal Flink version; here are the key points:
> 1. Where to compress/decompress?
> Compression happens at the upstream side and decompression at the downstream side.
> 2. Which threads do the compression/decompression?
> The task threads.
> 3. What is the compression granularity?
> Per buffer.
> 4. What happens when the data becomes even bigger after compression?
> Compression is given up in that case, and an extra flag identifies whether the data was compressed; that is, the output may be a mixture of compressed and uncompressed data.
>
> We'd like to contribute blocking shuffle data compression to Flink if there is interest.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
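The mechanism in points 1–4 of the description — per-buffer compression with a give-up fallback, plus a one-byte flag so the reader can tell compressed from uncompressed buffers — can be sketched roughly as follows. This is a minimal illustration using `java.util.zip`, not Flink's actual implementation; the class name, frame layout, and the choice of Deflate are assumptions for the sketch.

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/**
 * Hypothetical sketch of per-buffer compression with fallback: if the
 * compressed form would not be smaller than the input, the raw bytes are
 * kept and a flag byte marks the buffer as uncompressed.
 */
public class BufferCompressor {

    private static final byte UNCOMPRESSED = 0;
    private static final byte COMPRESSED = 1;

    /** Frames {@code data} as [flag byte][payload], compressing only if it helps. */
    public static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        // Output buffer the same size as the input: if compression cannot
        // finish within it, the compressed form is at least as large as the
        // raw data and we give up on compression for this buffer.
        byte[] out = new byte[data.length];
        int len = deflater.deflate(out);
        boolean gaveUp = !deflater.finished();
        deflater.end();

        if (gaveUp) {
            byte[] framed = new byte[data.length + 1];
            framed[0] = UNCOMPRESSED;
            System.arraycopy(data, 0, framed, 1, data.length);
            return framed;
        }
        byte[] framed = new byte[len + 1];
        framed[0] = COMPRESSED;
        System.arraycopy(out, 0, framed, 1, len);
        return framed;
    }

    /**
     * Restores the original bytes; {@code maxSize} is the known upper bound
     * on the decompressed size (in Flink this would be the configured
     * network buffer size).
     */
    public static byte[] decompress(byte[] framed, int maxSize) throws DataFormatException {
        if (framed[0] == UNCOMPRESSED) {
            return Arrays.copyOfRange(framed, 1, framed.length);
        }
        Inflater inflater = new Inflater();
        inflater.setInput(framed, 1, framed.length - 1);
        byte[] out = new byte[maxSize];
        int len = inflater.inflate(out);
        inflater.end();
        return Arrays.copyOf(out, len);
    }
}
```

Because each buffer carries its own flag, a downstream reader can consume a stream that mixes compressed and uncompressed buffers, which is exactly the behavior point 4 of the description asks for.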