Hi Dongjin,

I was thinking of a simple test: Snappy with 1 KB block size vs 32 KB block size. If the compression rate is similar for both, then it seems very wasteful to use 32 KB. I suspect you will see a significant difference though.
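
A rough sketch of that comparison, for anyone who wants to try the idea quickly. It uses Python's standard-library zlib as a stand-in, because Snappy bindings are not part of the standard library, so the absolute numbers will not match Snappy. The helper names (compressed_size, compare_block_sizes) and the sample payload are made up for illustration; a real test should use production-like records and the actual codec.

```python
import zlib


def compressed_size(data: bytes, block_size: int, level: int = 1) -> int:
    """Compress `data` in independent blocks and return the total compressed size."""
    total = 0
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        # Each block is compressed on its own, so no history is shared across blocks.
        total += len(zlib.compress(block, level))
    return total


def compare_block_sizes(data: bytes, block_sizes=(1024, 32 * 1024)) -> dict:
    """Map each block size to its compression ratio (original size / compressed size)."""
    return {size: len(data) / compressed_size(data, size) for size in block_sizes}


if __name__ == "__main__":
    # Hypothetical sample payload (assumption): repetitive JSON-like records.
    sample = b"".join(
        b'{"user_id": %d, "event": "page_view", "path": "/products/%d"}\n' % (i, i % 97)
        for i in range(20000)
    )
    for size, ratio in compare_block_sizes(sample).items():
        print(f"block size {size:>6} B -> ratio {ratio:.2f}")
```

If the two ratios come out close, the larger block buys little; if they differ a lot, the block size matters for this kind of data.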
Ismael

On Tue, Jun 8, 2021 at 8:27 AM Dongjin Lee <dong...@apache.org> wrote:

> Hi Ismael,
>
> I added the linear write benchmark result to the proposal. Like the
> producer benchmark, the least compression level showed the best MB/sec for
> any case. I tested several configurations, but the result was almost the
> same.
>
> If you have any proposals for the benchmark, don't hesitate to give me a
> suggestion. I am a newbie to run the linear write benchmark.
>
> Best,
> Dongjin
>
> On Sun, Jun 6, 2021 at 8:20 AM Dongjin Lee <dong...@apache.org> wrote:
>
> > Hi Ismael,
> >
> > Thanks for the reply.
> >
> > > So you're saying that reducing the buffer size didn't reduce the
> > > compression rate for codecs like lz4?
> >
> > Of course, there were some improvements in compressed size when I tried
> > the 'buffer.size' option, but the gain was not significant. I tried several
> > datasets, but the result was the same. It made me so skeptical about adding
> > this option, which seemed to make the configuration option complex only.
> >
> > In contrast, 'compression.level' showed its effectiveness immediately. It
> > is why I decided to focus on the 'compression.level' in this rework.
> >
> > As you can see in the update KIP with the benchmark, IMHO, the true value
> > of supporting the compression option may not be the compressed size or
> > rate, but speed. By tweaking the compression level slightly, it showed
> > great produce performance gain.
> >
> > Thanks,
> > Dongjin
> >
> > On Sun, Jun 6, 2021 at 6:48 AM Ismael Juma <ism...@juma.me.uk> wrote:
> >
> >> Thanks Dongjin. So you're saying that reducing the buffer size didn't
> >> reduce the compression rate for codecs like lz4? If so, that would suggest
> >> reducing the default value, but that seems odd.
> >>
> >> Ismael
> >>
> >> On Sat, Jun 5, 2021, 9:25 AM Dongjin Lee <dong...@apache.org> wrote:
> >>
> >> > Hello Kafka dev,
> >> >
> >> > I hope to reboot the discussion of KIP-390: Support Compression Level
> >> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Support+Compression+Level>.
> >> > It proposes to add a new option, 'compression.level', that controls the
> >> > compression level.
> >> >
> >> > This KIP has been submitted more than one year ago, but had been neglected
> >> > for a long time. Recently I reworked it from scratch with the following
> >> > differences:
> >> >
> >> > 1. Tested how it works with a real-world dataset. As you can see in the
> >> > updated KIP, *this feature can improve the producer's message/second rate
> >> > by more than 50%*, such a significant enhancement.
> >> > 2. Dropped 'compression.buffer.size' option that was in the initial work.
> >> > With the repeated benchmarks, I could not find any evidence this option
> >> > results in meaningful differences. So I removed it.
> >> >
> >> > All feedback will be highly appreciated.
> >> >
> >> > Best,
> >> > Dongjin
> >> >
> >> > --
> >> > *Dongjin Lee*
> >> >
> >> > *A hitchhiker in the mathematical world.*
> >> >
> >> > *github: github.com/dongjinleekr <https://github.com/dongjinleekr>
> >> > keybase: https://keybase.io/dongjinleekr
> >> > linkedin: kr.linkedin.com/in/dongjinleekr <https://kr.linkedin.com/in/dongjinleekr>
> >> > speakerdeck: speakerdeck.com/dongjin <https://speakerdeck.com/dongjin>*