reiabreu commented on code in PR #8707:
URL: https://github.com/apache/storm/pull/8707#discussion_r3307299281


##########
docs/Serialization.md:
##########
@@ -61,6 +61,62 @@ Beware that Java serialization is extremely expensive, both 
in terms of CPU cost
 
 You can turn on/off the behavior to fall back on Java serialization by setting 
the `Config.TOPOLOGY_FALL_BACK_ON_JAVA_SERIALIZATION` config to true/false. The 
default value is false for security reasons.
 
+### Tuple compression
+
+For inter-worker (remote) traffic, Storm can optionally compress serialized 
tuples with [Zstandard](https://facebook.github.io/zstd/) before they are sent 
over the network. This is intended for one specific scenario: components that 
emit **large** payloads to a remote worker, where the bytes saved on the wire 
outweigh the CPU cost of compression. A good example is a spout that emits 
entire lines of text to a downstream bolt running on a different worker.
+
+Compression is **disabled by default** and follows the serialization lifecycle 
exactly:
+
+- **Intra-worker (local) traffic** bypasses `KryoTupleSerializer` altogether, 
so it is never compressed regardless of configuration. You do not pay any CPU 
cost for tuples that stay inside a worker process.
+- **Inter-worker (remote) traffic** is compressed only when compression is 
enabled for the source component *and* the serialized tuple is larger than the 
configured threshold. Small tuples (single words, IDs, etc.) are left 
uncompressed, since the framing overhead of a compressed payload can exceed the 
original size.
+
+#### Enabling compression per component

Review Comment:
   Since you ran some benchmarks, what do you think about including them? With 
the obvious disclaimer that the tests were done in a limited capacity within 
the PR and should only be used as a guide.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
