*My concern is that this minor improvement (based on the benchmark) over
LZ4 does not warrant the work of adding support for a new compression codec
to the broker, official clients and horde of 3rd party clients, including
upgrade paths, transformations, tests, additional dependencies, etc.*


As someone who primarily works with non-Java clients, I couldn't +1 this
enough.

If there's truly a benefit (and probably there is...the Facebook post
announcing ZStandard had some fairly compelling benchmarks) then yes, let's
do this. But otherwise, please no.

On Tue, Jan 31, 2017 at 1:47 AM, Magnus Edenhill <mag...@edenhill.se> wrote:

> Hi Dongjin and good work on the KIP,
>
> I understand that ZStandard is generally considered an improvement over
> LZ4, but the
> benchmark you provided on the KIP-110 wiki doesn't really reflect that, and
> even
> makes a note that they are comparable:
> *> As you can see above, ZStandard shows outstanding performance in both of
> compression rate and speed, especially working with the speed-first setting
> (level 1). To the extent that only LZ4 can be compared to ZStandard.*
>
> My concern is that this minor improvement (based on the benchmark) over LZ4
> does not warrant the work
> of adding support for a new compression codec to the broker, official
> clients and horde of 3rd party clients, including
> upgrade paths, transformations, tests, additional dependencies, etc.
>
> Is it possible to produce more convincing comparisons?
>
> Thanks,
> Magnus
>
>
>
>
>
> 2017-01-31 10:28 GMT+01:00 Dongjin Lee <dong...@apache.org>:
>
> > Ismael & All,
> >
> > After Inspecting the related code & commits, I concluded following:
> >
> > 1. We have to update the masking value which is used to retrieve the used
> > codec id from the messages, to enable the retrieval of the 3rd bit of
> > compression type field of the message.
> > 2. The above task is already done; so, we need nothing.
> >
> > Here is why.
> >
> > Let's start with the first one, with the scenario Juma proposed. In the
> > case of receiving the message of unsupported compression type, the
> receiver
> > (= broker or consumer) raises IllegalArgumentException[^1][^2]. The key
> > element in this operation is Record#COMPRESSION_CODEC_MASK, which is used
> > to extract the codec id. We have to update this value from 2-bits
> extractor
> > (=0x03) to 3-bits extractor (=0x07).
> >
> > But in fact, this task is already done, so its current value is 0x07. We
> > don't have to update it.
> >
> > The reason why this task is already done has some story; From the first
> > time Record.java file was added to the project[^3], the
> > COMPRESSION_CODEC_MASK was already 2-bits extractor, that is, 0x03. At
> that
> > time, Kafka supported only two compression types - GZipCompression and
> > SnappyCompression.[^4] After that, KAFKA-1456 introduced two additional
> > codecs of LZ4 and LZ4C[^5]. This update modified COMPRESSION_CODEC_MASK
> > into 3 bits extractor, 0x07, in the aim of supporting four compression
> > codecs.
> >
> > Although its following work, KAFKA-1493, removed the support of LZ4C
> > codec[^6], this mask was not reverted into 2-bits extractor - by this
> > reason, we don't need to care about the message format.
> >
> > Attached screenshot is my test on Juma's scenario. I created topic & sent
> > some messages using the snapshot version with ZStandard compression and
> > received the message with the latest version. As you can see, it works
> > perfectly as expected.
> >
> > If you have more opinion to this issue, don't hesitate to send me as a
> > message.
> >
> > Best,
> > Dongjin
> >
> > [^1]: see: Record#compressionType.
> > [^2]: There is similar routine in Message.scala. But after KAFKA-4390,
> > that routine is not being used anymore - more precisely, Message class is
> > now used in ConsoleConsumer only. I think this class should be replaced
> but
> > since it is a separated topic, I will send another message for this
> issue.
> > [^3]: commit 642da2f (2011.8.2).
> > [^4]: commit c51b940.
> > [^5]: commit 547cced.
> > [^6]: commit 37356bf.
> >
> > On Thu, Jan 26, 2017 at 12:35 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> >> So far the discussion was around the performance characteristics of the
> >> new
> >> compression algorithm. Another area that is important and is not covered
> >> in
> >> the KIP is the compatibility implications. For example, what happens if
> a
> >> consumer that doesn't support zstd tries to consume a topic compressed
> >> with
> >> it? Or if a broker that doesn't support receives data compressed with
> it?
> >> If we go through that exercise, then more changes may be required (like
> >> bumping the version of produce/fetch protocols).
> >>
> >> Ismael
> >>
> >> On Wed, Jan 25, 2017 at 3:22 PM, Ben Stopford <b...@confluent.io> wrote:
> >>
> >> > Is there more discussion to be had on this KIP, or should it be taken
> >> to a
> >> > vote?
> >> >
> >> > On Mon, Jan 16, 2017 at 6:37 AM Dongjin Lee <dong...@apache.org>
> wrote:
> >> >
> >> > > I updated KIP-110 with JMH-measured benchmark results. Please have a
> >> > review
> >> > > when you are free. (The overall result is not different yet.)
> >> > >
> >> > > Regards,
> >> > > Dongjin
> >> > >
> >> > > +1. Could anyone assign KAFKA-4514 to me?
> >> > >
> >> > > On Thu, Jan 12, 2017 at 11:39 AM, Dongjin Lee <dong...@apache.org>
> >> > wrote:
> >> > >
> >> > > > Okay, I will have a try.
> >> > > > Thanks Ewen for the guidance!!
> >> > > >
> >> > > > Best,
> >> > > > Dongjin
> >> > > >
> >> > > > On Thu, Jan 12, 2017 at 6:44 AM, Ismael Juma <ism...@juma.me.uk>
> >> > wrote:
> >> > > >
> >> > > >> That's a good point Ewen. Dongjin, you could use the branch that
> >> Ewen
> >> > > >> linked for the performance testing. It would also help validate
> the
> >> > PR.
> >> > > >>
> >> > > >> Ismael
> >> > > >>
> >> > > >> On Wed, Jan 11, 2017 at 9:38 PM, Ewen Cheslack-Postava <
> >> > > e...@confluent.io
> >> > > >> >
> >> > > >> wrote:
> >> > > >>
> >> > > >> > FYI, there's an outstanding patch for getting some JMH
> >> benchmarking
> >> > > >> setup:
> >> > > >> > https://github.com/apache/kafka/pull/1712 I haven't found time
> >> to
> >> > > >> review
> >> > > >> > it
> >> > > >> > (and don't really know JMH well anyway) but it might be worth
> >> > getting
> >> > > >> that
> >> > > >> > landed so we can use it for this as well.
> >> > > >> >
> >> > > >> > -Ewen
> >> > > >> >
> >> > > >> > On Wed, Jan 11, 2017 at 6:35 AM, Dongjin Lee <
> dong...@apache.org
> >> >
> >> > > >> wrote:
> >> > > >> >
> >> > > >> > > Hi Ismael,
> >> > > >> > >
> >> > > >> > > 1. In the case of compression output, yes, lz4 is producing
> the
> >> > > >> smaller
> >> > > >> > > output than gzip. In fact, my benchmark was inspired
> >> > > >> > > by MessageCompressionTest#testCompressSize unit test and the
> >> > result
> >> > > >> is
> >> > > >> > > same - 396 bytes for gzip and 387 bytes for lz4.
> >> > > >> > > 2. I agree that my (former) approach can result in unreliable
> >> > > output.
> >> > > >> > > However, I am experiencing difficulties on how to acquire the
> >> > > >> benchmark
> >> > > >> > > metrics from Kafka. For you recommended JMH, I just started
> to
> >> > > google
> >> > > >> for
> >> > > >> > > it. If possible, could you give any example on how to use JMH
> >> > > against
> >> > > >> > > Kafka? If it is the case, it will be a great help.
> >> > > >> > > Regards,Dongjin
> >> > > >> > >
> >> > > >> > >                 _____________________________
> >> > > >> > > From: Ismael Juma <ism...@juma.me.uk>
> >> > > >> > > Sent: Wednesday, January 11, 2017 7:33 PM
> >> > > >> > > Subject: Re: [DISCUSS] KIP-110: Add Codec for ZStandard
> >> > Compression
> >> > > >> > > To:  <dev@kafka.apache.org>
> >> > > >> > >
> >> > > >> > >
> >> > > >> > > Thanks Dongjin. I highly recommend using JMH for the
> benchmark,
> >> > the
> >> > > >> > > existing one has a few problems that could result in
> unreliable
> >> > > >> results.
> >> > > >> > > Also, it's a bit surprising that LZ4 is producing smaller
> >> output
> >> > > than
> >> > > >> > gzip.
> >> > > >> > > Is that right?
> >> > > >> > >
> >> > > >> > > Ismael
> >> > > >> > >
> >> > > >> > > On Wed, Jan 11, 2017 at 10:20 AM, Dongjin Lee <
> >> dong...@apache.org
> >> > >
> >> > > >> > wrote:
> >> > > >> > >
> >> > > >> > > > Ismael,
> >> > > >> > > >
> >> > > >> > > > I pushed the benchmark code I used, with some updates
> >> > (iteration:
> >> > > >> 20 ->
> >> > > >> > > > 1000). I also updated the KIP page with the updated
> benchmark
> >> > > >> results.
> >> > > >> > > > Please take a review when you are free. The attached
> >> screenshot
> >> > > >> shows
> >> > > >> > how
> >> > > >> > > > to run the benchmarker.
> >> > > >> > > >
> >> > > >> > > > Thanks,
> >> > > >> > > > Dongjin
> >> > > >> > > >
> >> > > >> > > > On Tue, Jan 10, 2017 at 8:03 PM, Dongjin Lee <
> >> > dong...@apache.org>
> >> > > >> > wrote:
> >> > > >> > > >
> >> > > >> > > >> Ismael,
> >> > > >> > > >>
> >> > > >> > > >> I see. Then, I will share the benchmark code I used by
> >> > tomorrow.
> >> > > >> > Thanks
> >> > > >> > > >> for your guidance.
> >> > > >> > > >>
> >> > > >> > > >> Best,
> >> > > >> > > >> Dongjin
> >> > > >> > > >>
> >> > > >> > > >> -----
> >> > > >> > > >>
> >> > > >> > > >> Dongjin Lee
> >> > > >> > > >>
> >> > > >> > > >> Software developer in Line+.
> >> > > >> > > >> So interested in massive-scale machine learning.
> >> > > >> > > >>
> >> > > >> > > >> facebook: www.facebook.com/dongjin.lee.kr
> >> > > >> > > >> linkedin: kr.linkedin.com/in/dongjinleekr
> >> > > >> > > >> github: github.com/dongjinleekr
> >> > > >> > > >> twitter: www.twitter.com/dongjinleekr
> >> > > >> > > >>
> >> > > >> > > >>
> >> > > >> > > >>
> >> > > >> > > >>
> >> > > >> > > >> On Tue, Jan 10, 2017 at 7:24 PM +0900, "Ismael Juma" <
> >> > > >> > ism...@juma.me.uk
> >> > > >> > > >
> >> > > >> > > >> wrote:
> >> > > >> > > >>
> >> > > >> > > >> Dongjin,
> >> > > >> > > >>>
> >> > > >> > > >>> The KIP states:
> >> > > >> > > >>>
> >> > > >> > > >>> "I compared the compressed size and compression time of 3
> >> > > >> 1kb-sized
> >> > > >> > > >>> messages (3102 bytes in total), with the
> >> Draft-implementation
> >> > of
> >> > > >> > > ZStandard
> >> > > >> > > >>> Compression Codec and all currently available
> >> > CompressionCodecs.
> >> > > >> All
> >> > > >> > > >>> elapsed times are the average of 20 trials."
> >> > > >> > > >>>
> >> > > >> > > >>> But doesn't give any details of how this was implemented.
> >> Is
> >> > the
> >> > > >> > source
> >> > > >> > > >>> code available somewhere? Micro-benchmarking in the JVM
> is
> >> > > pretty
> >> > > >> > > tricky so
> >> > > >> > > >>> it needs verification before numbers can be trusted. A
> >> > > performance
> >> > > >> > test
> >> > > >> > > >>> with kafka-producer-perf-test.sh would be nice to have as
> >> > well,
> >> > > if
> >> > > >> > > possible.
> >> > > >> > > >>>
> >> > > >> > > >>> Thanks,
> >> > > >> > > >>> Ismael
> >> > > >> > > >>>
> >> > > >> > > >>> On Tue, Jan 10, 2017 at 7:44 AM, Dongjin Lee  wrote:
> >> > > >> > > >>>
> >> > > >> > > >>> > Ismael,
> >> > > >> > > >>> >
> >> > > >> > > >>> > 1. Is the benchmark in the KIP page not enough? You
> mean
> >> we
> >> > > >> need a
> >> > > >> > > whole
> >> > > >> > > >>> > performance test using kafka-producer-perf-test.sh?
> >> > > >> > > >>> >
> >> > > >> > > >>> > 2. It seems like no major project is relying on it
> >> > currently.
> >> > > >> > > However,
> >> > > >> > > >>> > after reviewing the code, I concluded that at least
> this
> >> > > project
> >> > > >> > has
> >> > > >> > > a good
> >> > > >> > > >>> > test coverage. And for the problem of upstream
> tracking -
> >> > > >> although
> >> > > >> > > there is
> >> > > >> > > >>> > no significant update on ZStandard to judge this
> >> problem, it
> >> > > >> seems
> >> > > >> > > not bad.
> >> > > >> > > >>> > If required, I can take responsibility of the tracking
> >> for
> >> > > this
> >> > > >> > > library.
> >> > > >> > > >>> >
> >> > > >> > > >>> > Thanks,
> >> > > >> > > >>> > Dongjin
> >> > > >> > > >>> >
> >> > > >> > > >>> > On Tue, Jan 10, 2017 at 7:09 AM, Ismael Juma  wrote:
> >> > > >> > > >>> >
> >> > > >> > > >>> > > Thanks for posting the KIP, ZStandard looks like a
> nice
> >> > > >> > > improvement over
> >> > > >> > > >>> > > the existing compression algorithms. A couple of
> >> > questions:
> >> > > >> > > >>> > >
> >> > > >> > > >>> > > 1. Can you please elaborate on the details of the
> >> > benchmark?
> >> > > >> > > >>> > > 2. About https://github.com/luben/zstd-jni, can we
> >> rely
> >> > on
> >> > > >> it? A
> >> > > >> > > few
> >> > > >> > > >>> > > things
> >> > > >> > > >>> > > to consider: are there other projects using it, does
> it
> >> > have
> >> > > >> good
> >> > > >> > > test
> >> > > >> > > >>> > > coverage, are there performance tests, does it track
> >> > > upstream
> >> > > >> > > closely?
> >> > > >> > > >>> > >
> >> > > >> > > >>> > > Thanks,
> >> > > >> > > >>> > > Ismael
> >> > > >> > > >>> > >
> >> > > >> > > >>> > > On Fri, Jan 6, 2017 at 2:40 AM, Dongjin Lee  wrote:
> >> > > >> > > >>> > >
> >> > > >> > > >>> > > > Hi all,
> >> > > >> > > >>> > > >
> >> > > >> > > >>> > > > I've just posted a new KIP "KIP-110: Add Codec for
> >> > > ZStandard
> >> > > >> > > >>> > Compression"
> >> > > >> > > >>> > > > for
> >> > > >> > > >>> > > > discussion:
> >> > > >> > > >>> > > >
> >> > > >> > > >>> > > > https://cwiki.apache.org/confl
> >> uence/display/KAFKA/KIP-
> >> > > >> > > >>> > > > 110%3A+Add+Codec+for+ZStandard+Compression
> >> > > >> > > >>> > > >
> >> > > >> > > >>> > > > Please have a look when you are free.
> >> > > >> > > >>> > > >
> >> > > >> > > >>> > > > Best,
> >> > > >> > > >>> > > > Dongjin
> >> > > >> > > >>> > > >
> >> > > >> > > >>> > > > --
> >> > > >> > > >>> > > > *Dongjin Lee*
> >> > > >> > > >>> > > >
> >> > > >> > > >>> > > >
> >> > > >> > > >>> > > > *Software developer in Line+.So interested in
> >> > > massive-scale
> >> > > >> > > machine
> >> > > >> > > >>> > > > learning.facebook: www.facebook.com/dongjin.lee.kr
> >> > > >> > > >>> > > > linkedin:
> >> > > >> > > >>> > > > kr.linkedin.com/in/dongjinleekr
> >> > > >> > > >>> > > > github:
> >> > > >> > > >>> > > > github.com/dongjinleekr
> >> > > >> > > >>> > > > twitter: www.twitter.com/dongjinleekr
> >> > > >> > > >>> > > > *
> >> > > >> > > >>> > > >
> >> > > >> > > >>> > >
> >> > > >> > > >>> >
> >> > > >> > > >>> >
> >> > > >> > > >>> >
> >> > > >> > > >>> > --
> >> > > >> > > >>> > *Dongjin Lee*
> >> > > >> > > >>> >
> >> > > >> > > >>> >
> >> > > >> > > >>> > *Software developer in Line+.So interested in
> >> massive-scale
> >> > > >> machine
> >> > > >> > > >>> > learning.facebook: www.facebook.com/dongjin.lee.kr
> >> > > >> > > >>> > linkedin:
> >> > > >> > > >>> > kr.linkedin.com/in/dongjinleekr
> >> > > >> > > >>> > github:
> >> > > >> > > >>> > github.com/dongjinleekr
> >> > > >> > > >>> > twitter: www.twitter.com/dongjinleekr
> >> > > >> > > >>> > *
> >> > > >> > > >>> >
> >> > > >> > > >>>
> >> > > >> > > >>>
> >> > > >> > > >
> >> > > >> > > >
> >> > > >> > > > --
> >> > > >> > > > *Dongjin Lee*
> >> > > >> > > >
> >> > > >> > > >
> >> > > >> > > > *Software developer in Line+.So interested in massive-scale
> >> > > machine
> >> > > >> > > > learning.facebook: www.facebook.com/dongjin.lee.kr
> >> > > >> > > > <http://www.facebook.com/dongjin.lee.kr>linkedin:
> >> > > >> kr.linkedin.com/in/
> >> > > >> > > dongjinleekr
> >> > > >> > > > <http://kr.linkedin.com/in/dongjinleekr>github:
> >> > > >> > > > <http://goog_969573159/>github.com/dongjinleekr
> >> > > >> > > > <http://github.com/dongjinleekr>twitter:
> >> > > >> www.twitter.com/dongjinleekr
> >> > > >> > > > <http://www.twitter.com/dongjinleekr>*
> >> > > >> > > >
> >> > > >> > >
> >> > > >> > >
> >> > > >> > >
> >> > > >> > >
> >> > > >> > >
> >> > > >> >
> >> > > >>
> >> > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > *Dongjin Lee*
> >> > > >
> >> > > >
> >> > > > *Software developer in Line+.So interested in massive-scale
> machine
> >> > > > learning.facebook: www.facebook.com/dongjin.lee.kr
> >> > > > <http://www.facebook.com/dongjin.lee.kr>linkedin:
> >> > > kr.linkedin.com/in/dongjinleekr
> >> > > > <http://kr.linkedin.com/in/dongjinleekr>github:
> >> > > > <http://goog_969573159/>github.com/dongjinleekr
> >> > > > <http://github.com/dongjinleekr>twitter:
> >> www.twitter.com/dongjinleekr
> >> > > > <http://www.twitter.com/dongjinleekr>*
> >> > > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > *Dongjin Lee*
> >> > >
> >> > >
> >> > > *Software developer in Line+.So interested in massive-scale machine
> >> > > learning.facebook: www.facebook.com/dongjin.lee.kr
> >> > > <http://www.facebook.com/dongjin.lee.kr>linkedin:
> >> > > kr.linkedin.com/in/dongjinleekr
> >> > > <http://kr.linkedin.com/in/dongjinleekr>github:
> >> > > <http://goog_969573159/>github.com/dongjinleekr
> >> > > <http://github.com/dongjinleekr>twitter:
> www.twitter.com/dongjinleekr
> >> > > <http://www.twitter.com/dongjinleekr>*
> >> > >
> >> >
> >>
> >
> >
> >
> > --
> > *Dongjin Lee*
> >
> >
> > *Software developer in Line+.So interested in massive-scale machine
> > learning.facebook: www.facebook.com/dongjin.lee.kr
> > <http://www.facebook.com/dongjin.lee.kr>linkedin: kr.linkedin.com/in/
> dongjinleekr
> > <http://kr.linkedin.com/in/dongjinleekr>github:
> > <http://goog_969573159/>github.com/dongjinleekr
> > <http://github.com/dongjinleekr>twitter: www.twitter.com/dongjinleekr
> > <http://www.twitter.com/dongjinleekr>*
> >
>

Reply via email to