Hi Apurva,

Yes, it is true that the request size might be much smaller if the batching
is based on uncompressed size. I will let the users know about this. That
said, in practice this is probably fine. For example, at LinkedIn our max
message size is 1 MB, and the compressed size is typically 100 KB or
larger. Given that in most cases there are many partitions, the request
size would not be too small (typically around a few MB).
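The numbers above can be sketched as a quick back-of-the-envelope calculation. The 10x compression ratio and the partition count are illustrative assumptions, not measured LinkedIn values:

```python
# Illustrative estimate of produce-request size when batching on
# uncompressed size; ratio and partition count are assumptions.
max_message_size = 1 * 1024 * 1024   # 1 MB uncompressed batch per partition
compression_ratio = 10               # assumed ~10x, giving ~100 KB compressed
partitions_per_request = 30          # assumed partitions hosted on one broker

compressed_batch = max_message_size // compression_ratio
request_size = compressed_batch * partitions_per_request

print(f"compressed batch ~ {compressed_batch / 1024:.0f} KB")
print(f"request size   ~ {request_size / (1024 * 1024):.1f} MB")
```

So even with heavy compression, a request spanning a few dozen partitions stays in the multi-megabyte range.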

At LinkedIn we do have some topics with varying compression ratios. These
are usually topics shared by different services, so the data may differ a
lot even though the messages are in the same topic and have similar fields.

Thanks,

Jiangjie (Becket) Qin


On Tue, Feb 21, 2017 at 6:17 PM, Apurva Mehta <apu...@confluent.io> wrote:

> Hi Becket, thanks for the KIP.
>
> I think one of the risks here is that when compression estimation is
> disabled, you could have much smaller batches than expected, and throughput
> could be hurt. It would be worth adding this to the documentation of this
> setting.
>
> Also, one of the rejected alternatives states that per topic estimations
> would not work when the compression of individual messages is variable.
> This is true in theory, but in practice one would expect Kafka topics to
> have fairly homogenous data, and hence should compress evenly. I was
> curious if you have data which shows otherwise.
>
> Thanks,
> Apurva
>
> On Tue, Feb 21, 2017 at 12:30 PM, Becket Qin <becket....@gmail.com> wrote:
>
> > Hi folks,
> >
> > I would like to start the discussion thread on KIP-126. The KIP proposes
> > adding a new configuration to KafkaProducer to allow batching based on
> > uncompressed message size.
> >
> > Comments are welcome.
> >
> > The KIP wiki is here:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-126+-+Allow+KafkaProducer+to+batch+based+on+uncompressed+size
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
>
