[ https://issues.apache.org/jira/browse/KAFKA-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757549#comment-17757549 ]

Pere Urbon-Bayes commented on KAFKA-4169:
-----------------------------------------

Hi, 

I hope to provide some help around this issue, as I still run into it in my 
day-to-day Kafka work with different organizations, so I'm willing to lend a 
hand and see if we can finally close it.

 

[~jeqo] (nice to see your name sir!)

> I'm wondering if this is still a valid issue.

 

I think this is still a valid issue. It's certainly not critical and doesn't 
affect everyone, but from time to time it pops up on the radar and gives people 
a harder-than-necessary experience. I have seen it happen especially when 
message sizes or load follow unusual patterns, although that is rare. Generally 
speaking, it hits mostly (at least in my bubble) at the beginning of people's 
Kafka journey, which can cause some unnecessary initial headaches.

 

> Validation based on CompressionRatioEstimator value:

 

I like this approach, but I would rather not throw an exception; I would 
probably log warnings instead, since the estimate can be wrong while the 
estimator is still warming up. Still, using CompressionRatioEstimator might be 
our best shot at giving users better feedback.
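
To make it more concrete, below is a rough sketch of the kind of check I have 
in mind. It leans on the internal CompressionRatioEstimator as it exists today 
(so it would need to follow that class if it changes), and the method name 
maybeWarnOnEstimatedSize is just a placeholder, not an existing producer method:

{code:java}
import org.apache.kafka.clients.producer.internals.CompressionRatioEstimator;
import org.apache.kafka.common.record.CompressionType;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class EstimatedSizeCheck {

    private static final Logger log = LoggerFactory.getLogger(EstimatedSizeCheck.class);

    // Placeholder name, sketch only: warn (instead of throwing) when the
    // estimated compressed size of a record is likely to exceed max.request.size.
    static void maybeWarnOnEstimatedSize(String topic,
                                         CompressionType compressionType,
                                         int serializedSize,
                                         int maxRequestSize) {
        // The estimator starts from a default ratio and is refined as batches
        // complete, so early estimates can be off until it has warmed up.
        float ratio = CompressionRatioEstimator.estimation(topic, compressionType);
        int estimatedCompressedSize = (int) (serializedSize * ratio);
        if (estimatedCompressedSize > maxRequestSize) {
            log.warn("Record of {} bytes for topic {} is estimated to remain around {} bytes "
                            + "after {} compression, which may exceed max.request.size ({}).",
                    serializedSize, topic, estimatedCompressedSize, compressionType, maxRequestSize);
        }
    }
}
{code}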

 

[~jjkoshy] 

>  The current config doc dates back to the initial implementation of the 
>producer: {{The maximum size of a request in bytes. This is also effectively a 
>cap on the maximum record size...}}

 

That certainly helps, but in my experience people usually don't read the docs 
very carefully. So I think providing a better feedback loop on the application 
side would be a good idea; based on what Jorge proposed, I think I can put 
together something up and running that could help.
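
For example, purely on the application side (no producer changes needed), a 
send() callback can already turn the generic error into a more actionable hint. 
A minimal sketch, assuming a local broker; the topic name and payload are made 
up to mirror the dd reproduction in the issue description:

{code:java}
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RecordTooLargeException;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class RecordTooLargeFeedback {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("compression.type", "gzip");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Roughly 1 MiB of zero bytes, like the dd file in the description.
            String payload = new String(new char[1024 * 1024]);
            producer.send(new ProducerRecord<>("tester", payload), (metadata, exception) -> {
                if (exception instanceof RecordTooLargeException) {
                    // The record was rejected based on its *uncompressed* serialized
                    // size, before compression.type was ever applied.
                    System.err.println("Record rejected before compression: raise "
                            + "max.request.size or send smaller records.");
                }
            });
        }
    }
}
{code}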

 

>  It really should be checked in the sender but then we may also want to 
>divide up partitions into smaller requests (if there are multiple partitions 
>in the request).

 

I have been taking a look at that, and I tend to agree: doing more checks in 
the sender might give a clearer view, although with the current implementation 
I think it will be complex. Basically, I think we will still end up relying on 
estimators, or we could check whether the existing structures within the 
CompressionType class can be leveraged. I will take a deeper look before 
proposing something here.

 

BTW, I'm not sure I understand your second point about dividing partitions into 
smaller requests. As far as I understand, that code path already kicks in when 
a broker returns an error, right?

 

Looking forward to helping out,

 

-- Pere

> Calculation of message size is too conservative for compressed messages
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-4169
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4169
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer 
>    Affects Versions: 0.10.0.1
>            Reporter: Dustin Cote
>            Assignee: Pere Urbon-Bayes
>            Priority: Major
>
> Currently the producer uses the uncompressed message size to check against 
> {{max.request.size}} even if a {{compression.type}} is defined.  This can be 
> reproduced as follows:
> {code}
> # dd if=/dev/zero of=/tmp/out.dat bs=1024 count=1024
> # cat /tmp/out.dat | bin/kafka-console-producer --broker-list localhost:9092 --topic tester --producer-property compression.type=gzip
> {code}
> The above code creates a file that is the same size as the default for 
> {{max.request.size}} and the added overhead of the message pushes the 
> uncompressed size over the limit.  Compressing the message ahead of time 
> allows the message to go through.  When the message is blocked, the following 
> exception is produced:
> {code}
> [2016-09-14 08:56:19,558] ERROR Error when sending message to topic tester 
> with key: null, value: 1048576 bytes with error: 
> (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> org.apache.kafka.common.errors.RecordTooLargeException: The message is 
> 1048610 bytes when serialized which is larger than the maximum request size 
> you have configured with the max.request.size configuration.
> {code}
> For completeness, I have confirmed that the console producer is setting 
> {{compression.type}} properly by enabling DEBUG so this appears to be a 
> problem in the size estimate of the message itself.  I would suggest we 
> compress before we serialize instead of the other way around to avoid this.



