Hi Alex, thanks for the question!

Put simply, the tool knows nothing about the contents of the messages or 
batches in a log. It would compress the encrypted data and measure the 
resulting size, but the results would likely show little or no reduction in 
data size. In effect, the tool would just burn a bunch of CPU cycles and 
produce no interesting results.
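
To illustrate the point, here is a rough sketch (not taken from the PR) that 
compares how well repetitive plaintext and ciphertext-like bytes compress. It 
assumes that well-encrypted data is close to indistinguishable from random 
bytes, so os.urandom stands in for real ciphertext; the sample message and the 
compression_ratio helper are made up for the example.

```python
import os
import zlib


def compression_ratio(payload: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(payload, 6)) / len(payload)


# Repetitive plaintext, loosely resembling log messages (made-up sample).
plaintext = b'{"user": "alice", "action": "login", "status": "ok"}\n' * 2000

# Stand-in for ciphertext: assumption that encrypted output looks random.
# No real encryption is performed here; os.urandom is only a proxy.
ciphertext_like = os.urandom(len(plaintext))

plain_ratio = compression_ratio(plaintext)
cipher_ratio = compression_ratio(ciphertext_like)

print(f"plaintext ratio:       {plain_ratio:.3f}")
print(f"ciphertext-like ratio: {cipher_ratio:.3f}")
```

The plaintext ratio should come out far below 1, while the ciphertext-like 
ratio should sit at roughly 1, meaning no meaningful size reduction.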

It looks like concerns around compression were raised in the KIP-317 
discussion, with the possibility of compression being disabled when encryption 
is used due to concerns about security (which I think are quite valid). My 
general take on the issue in the context of this KIP is that this tool is 
relatively simple and could be extended if needed. If KIP-317 
were to change the semantics of how compression is applied to encrypted 
messages or whether compression is allowed at all, this tool can match those 
semantics, whatever they may be.

Chris

On 2020/08/24 21:49:29, Alex Wang <alew...@linkedin.com.INVALID> wrote: 
> Hi, how will this work with encrypted data in logs if/when KIP-317 gets 
> merged? Encrypted data will be hard to compress, so the analyzer tool might 
> need to acquire the decryption key somewhere to measure the compression stats.
> 
> On 2020/08/17 20:23:51, "Christopher Beard (BLOOMBERG/ 919 3RD A)" 
> <cbea...@bloomberg.net> wrote: 
> > Hi everyone,
> > 
> > I would like to start a discussion on KIP-640:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-640%3A+Add+log+compression+analysis+tool
> > 
> > This KIP outlines a new CLI tool which helps compare how the various 
> > compression types supported by Kafka reduce the size of a log (and 
> > therefore more broadly, of a topic).
> > 
> > I've put together a PR that might help serve as a starting point for 
> > comments and suggestions.
> > [WIP] PR: https://github.com/apache/kafka/pull/9193
> > 
> > Thanks,
> > Chris Beard
> 
