pnowojski commented on a change in pull request #10375: [FLINK-14845][runtime]
Introduce data compression to reduce disk and network IO of shuffle.
URL: https://github.com/apache/flink/pull/10375#discussion_r354770372
##########
File path:
flink-core/src/main/java/org/apache/flink/configuration/NettyShuffleEnvironmentOptions.java
##########
@@ -54,6 +54,28 @@
.withDescription("Enable SSL support for the taskmanager data transport. This is applicable only when the" +
" global flag for internal SSL (" +
SecurityOptions.SSL_INTERNAL_ENABLED.key() + ") is set to true");
+ /**
+ * Boolean flag indicating whether the shuffle data will be compressed or not.
+ *
+ * <p>Note: Data is compressed per buffer (may be sliced buffer in pipeline mode) and compression can incur extra
+ * CPU overhead so it is more effective for IO bounded scenario when data compression ratio is high.
+ */
+ public static final ConfigOption<Boolean> DATA_COMPRESSION_ENABLED =
+ key("taskmanager.data.compression.enabled")
Review comment:
If that happens, we could always map `LZ4` to some new value.
I talked with @NicoK, and we would both slightly prefer to have fewer
configuration parameters (`taskmanager.network.blocking-shuffle.compression:
NONE/LZ4` and `taskmanager.network.pipelined-shuffle.compression: NONE/LZ4`).
However, it's not a blocker for us if you feel strongly otherwise.
If we moved to `taskmanager.network.blocking-shuffle.compression: NONE/LZ4`
and `taskmanager.network.pipelined-shuffle.compression: NONE/LZ4`, would the
problem be supporting a use case where a user configures different compression
algorithms for `pipelined` and `blocking`? If that cannot be easily supported,
let's just drop the discussion and leave it as it is now.
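A minimal sketch of what the proposed per-mode options could look like when read, assuming plain Java: a `Map<String, String>` stands in for Flink's `Configuration`, and the `CompressionCodec` enum, the `codecFor` helper, and the `NONE` default are all hypothetical names chosen for illustration, not part of this PR or of Flink's API. Only the two keys and the `NONE`/`LZ4` values come from the comment above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ShuffleCompressionConfigSketch {

    /** Hypothetical codec enum holding only the values named in the review comment. */
    enum CompressionCodec { NONE, LZ4 }

    // The two keys proposed in the review comment.
    static final String BLOCKING_KEY = "taskmanager.network.blocking-shuffle.compression";
    static final String PIPELINED_KEY = "taskmanager.network.pipelined-shuffle.compression";

    /**
     * Reads one key and maps it to a codec. Falling back to NONE when the key
     * is absent is an assumption of this sketch, not a documented default.
     */
    static CompressionCodec codecFor(Map<String, String> conf, String key) {
        String raw = conf.get(key);
        if (raw == null) {
            return CompressionCodec.NONE;
        }
        // Enum.valueOf throws IllegalArgumentException for unknown codec names.
        return CompressionCodec.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(BLOCKING_KEY, "lz4");
        // PIPELINED_KEY is left unset, so pipelined shuffle resolves to NONE.

        CompressionCodec blocking = codecFor(conf, BLOCKING_KEY);
        CompressionCodec pipelined = codecFor(conf, PIPELINED_KEY);
        System.out.println("blocking=" + blocking + ", pipelined=" + pipelined);
    }
}
```

Because each shuffle mode has its own key, reading different algorithms for `pipelined` and `blocking` is trivial at the configuration level; whether the runtime can actually apply different codecs per mode is the open question in the comment, and this sketch does not answer it.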