divijvaidya commented on code in PR #14322:
URL: https://github.com/apache/kafka/pull/14322#discussion_r1314750361


##########
docs/design.html:
##########
@@ -136,8 +136,10 @@ <h4 class="anchor-heading"><a id="design_compression" 
class="anchor-link"></a><a
     the user can always compress its messages one at a time without any 
support needed from Kafka, but this can lead to very poor compression ratios as 
much of the redundancy is due to repetition between messages of
     the same type (e.g. field names in JSON or user agents in web logs or 
common string values). Efficient compression requires compressing multiple 
messages together rather than compressing each message individually.
     <p>
-    Kafka supports this with an efficient batching format. A batch of messages 
can be clumped together compressed and sent to the server in this form. This 
batch of messages will be written in compressed form and will
-    remain compressed in the log and will only be decompressed by the consumer.
+    Kafka supports this with an efficient batching format. A batch of messages 
can be grouped together, compressed, and sent to the server in this form. The 
broker decompresses the batch in order to validate it. For
+    example, it validates that the number of records in the batch is same as 
what batch header states. The broker may also potentially modify the batch 
(e.g., if the topic is compacted, the broker will filter out 

Review Comment:
   I just realised another thing.
   
   "if the topic is compacted, the broker will filter out records eligible for 
compaction prior to writing to disk"
   
   Are you referring to the fact that records written to a compacted topic must 
have a non-null key, or else they will be rejected? If so, perhaps we need to 
phrase it differently.
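   
   To make the rule I have in mind concrete, here is a minimal sketch. The class and method names are hypothetical and this is not the broker's actual validation code; it only assumes that a compacted topic rejects records whose key is null.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Kafka's actual broker code.
final class CompactedTopicKeyCheck {

    // Assumption: on a compacted topic, a record without a key is rejected.
    // On a non-compacted topic, a null key is allowed.
    static boolean isAcceptable(byte[] key, boolean topicIsCompacted) {
        return !topicIsCompacted || key != null;
    }

    public static void main(String[] args) {
        byte[] key = "user-42".getBytes(StandardCharsets.UTF_8);
        System.out.println(isAcceptable(key, true));   // keyed record, compacted topic
        System.out.println(isAcceptable(null, true));  // null key, compacted topic
        System.out.println(isAcceptable(null, false)); // null key, non-compacted topic
    }
}
```

   If this is the behaviour the doc means, it is a rejection of the record rather than a filtering step.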
   
   Let me get back to you with a suggestion here in a couple of hours.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.