Hi Kafka Devs, and Maroš,
Please accept my apology for the late reply. When this KIP was first
submitted, it didn't receive much attention, and I was also too busy at my
new position to devote time to the contribution. I only found some time
this weekend, so I'm now going through a pile of backlogged emails.
> So I’m curious — is the intent of this KIP to eventually support a
> broader set of codec-specific settings, or are we intentionally scoping it
> down to just block/window size for now?
To answer Maroš's question: yes, the ultimate intent of KIP-780 is to
support a wide range of codec-specific settings, not only block/window
size. The other options were omitted for a simple reason: block/window size
was originally part of KIP-390, and KIP-780 was spun off from it. So I
think options like gzip strategy or zstd threads should be examined
seriously.
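To make the gzip strategy point concrete, here is a minimal sketch of how
the strategy changes the compressed size. It is not Kafka code and not the
KIP's implementation: it uses java.util.zip.Deflater directly (the engine
behind GZIPOutputStream), and the class name, method name, and sample
payload are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class GzipStrategySketch {

    // Hypothetical helper: deflate the input with the given level and
    // strategy, and return the number of compressed bytes produced.
    static int deflatedSize(byte[] input, int level, int strategy) {
        Deflater deflater = new Deflater(level);
        deflater.setStrategy(strategy);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end(); // release native zlib resources
        return total;
    }

    public static void main(String[] args) {
        // Made-up, highly repetitive payload standing in for record data.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append("{\"id\":").append(i).append(",\"status\":\"ok\"}\n");
        }
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);

        int level = Deflater.DEFAULT_COMPRESSION;
        System.out.println("input bytes:      " + data.length);
        System.out.println("DEFAULT_STRATEGY: "
                + deflatedSize(data, level, Deflater.DEFAULT_STRATEGY));
        System.out.println("FILTERED:         "
                + deflatedSize(data, level, Deflater.FILTERED));
        System.out.println("HUFFMAN_ONLY:     "
                + deflatedSize(data, level, Deflater.HUFFMAN_ONLY));
    }
}
```

How much each strategy helps depends heavily on the payload, which is why
a benchmark on real-world datasets matters before deciding which options
to expose.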
As the original proposer of KIP-780, I have a plan to reboot it (see
below), and I'd like to hear your thoughts on it.
1. Update the KIP documentation
It will include the new candidate options (e.g., gzip strategy and zstd
threads) and also name Maroš as a co-proposer.
2. Open a new PR
After step 1, I will open a new PR that succeeds my and Maroš's old PR. It
should be based on the latest codebase. We will also need an additional
branch, based on a recent release, to compare against existing segment
files and recent releases. If we can significantly reduce the size of the
segments, it could be a great option for users of remote storage.
3. Continue discussion and cooperative work on the new PR
I will also submit a new JMX/real-world dataset benchmark, and then we
can draw conclusions on which options are reasonable to support.
I would greatly appreciate it if you would seriously consider this plan.
Regards,
Dongjin
P.S. The hats off should go to Mickael Maison, who finalized KIP-390, the
basis of KIP-780, not to me.
On Sat, Jul 26, 2025 at 3:12 AM Maroš Orsák <[email protected]>
wrote:
> Hi Dongjin,
>
> Hi Kafka devs,
>
> Thanks a lot for opening this KIP — and hats off for the amount of
> benchmarking and investigation you’ve done! It’s great to see a follow-up
> to KIP-390 that digs deeper into these compression-level options with solid
> data to back it.
>
> One thing I wanted to clarify: what specific compression options are we
> targeting here? From what I saw in the related PR [1], it seems we’re
> mostly exposing block and window sizes. But many codecs expose more than
> that:
>
> - *GZIP* has options like strategy, window size, and buffer size
> - *LZ4* supports block size (64KB–4MB), block mode (independent vs
>   linked), checksums, and dictionaries
> - *Snappy*, as far as I know, doesn’t expose much for tuning
> - *ZSTD* has a huge set: threading, window size, block size, dictionaries,
>   long-distance matching, checksums, etc. It’s a beast in terms of
>   configurability 😄
>
> So I’m curious — is the intent of this KIP to eventually support a broader
> set of codec-specific settings, or are we intentionally scoping it down to
> just block/window size for now?
>
> Also, just to check — are you still interested in implementing this KIP
> (i.e., KIP-780)? If not, would you be open to me taking it over or helping
> move it forward? Of course, only if that works for you — I’d be happy to
> coordinate if there’s still interest in pursuing this.
>
> Looking forward to your thoughts!
>
> Best,
>
> Maros Orsak
>
> [1] - https://github.com/apache/kafka/pull/11388/files
>
--
*Dongjin Lee*
*A hitchhiker in the mathematical world.*
github: https://github.com/dongjinleekr
keybase: https://keybase.io/dongjinleekr
linkedin: https://kr.linkedin.com/in/dongjinleekr
speakerdeck: https://speakerdeck.com/dongjin