reiabreu commented on code in PR #8707:
URL: https://github.com/apache/storm/pull/8707#discussion_r3307299281
##########
docs/Serialization.md:
##########
@@ -61,6 +61,62 @@ Beware that Java serialization is extremely expensive, both in terms of CPU cost
 You can turn on/off the behavior to fall back on Java serialization by setting the `Config.TOPOLOGY_FALL_BACK_ON_JAVA_SERIALIZATION` config to true/false. The default value is false for security reasons.
+### Tuple compression
+
+For inter-worker (remote) traffic, Storm can optionally compress serialized tuples with [Zstandard](https://facebook.github.io/zstd/) before they are sent over the network. This is intended for one specific scenario: components that emit **large** payloads to a remote worker, where the bytes saved on the wire outweigh the CPU cost of compression. A good example is a spout that emits entire lines of text to a downstream bolt running on a different worker.
+
+Compression is **disabled by default** and follows the serialization lifecycle exactly:
+
+- **Intra-worker (local) traffic** bypasses `KryoTupleSerializer` altogether, so it is never compressed regardless of configuration. You do not pay any CPU cost for tuples that stay inside a worker process.
+- **Inter-worker (remote) traffic** is compressed only when compression is enabled for the source component *and* the serialized tuple is larger than the configured threshold. Small tuples (single words, IDs, etc.) are left uncompressed, since the framing overhead of a compressed payload can exceed the original size.
+
+#### Enabling compression per component

Review Comment:
   Since you ran some benchmarks, what do you think about including them? With the obvious disclaimer that the tests were done in a limited capacity within the PR and should only be used as a guide.
