Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression

Dongjin Lee Sun, 15 Jan 2017 22:38:06 -0800

I updated KIP-110 with JMH-measured benchmark results. Please review it
when you have time. (The overall result is unchanged so far.)


Regards,
Dongjin

+1. Could anyone assign KAFKA-4514 to me?
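
For readers following the benchmarking discussion quoted below, here is a minimal sketch of the kind of measurement the KIP describes: the compressed size of three 1 KB messages and an average compression time over repeated trials. It is an illustration only, not the benchmark code from the KIP or from the JMH pull request. It uses only the JDK's GZIPOutputStream rather than Kafka's codecs, the class name GzipBenchSketch and the payload contents are hypothetical, and the hand-rolled warm-up loop is a rough stand-in for what JMH handles far more rigorously.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipBenchSketch {

    // Hypothetical payload: three ASCII messages of exactly 1024 bytes each.
    // This is NOT the data used in the KIP.
    public static byte[][] sampleMessages() {
        byte[][] messages = new byte[3][];
        for (int i = 0; i < messages.length; i++) {
            StringBuilder sb = new StringBuilder();
            while (sb.length() < 1024) {
                sb.append("message-").append(i).append(" payload ");
            }
            messages[i] = sb.substring(0, 1024).getBytes(StandardCharsets.US_ASCII);
        }
        return messages;
    }

    // Compress all messages into a single gzip stream and return the bytes.
    public static byte[] gzip(byte[][] messages) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            for (byte[] message : messages) {
                out.write(message);
            }
        }
        return buffer.toByteArray();
    }

    // Average nanoseconds per compression. The warm-up iterations are run but
    // not timed, so the JIT has a chance to compile the hot path first.
    public static double averageNanos(byte[][] messages, int warmup, int measured)
            throws IOException {
        long sink = 0; // consume results so the work cannot be optimized away
        for (int i = 0; i < warmup; i++) {
            sink += gzip(messages).length;
        }
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            sink += gzip(messages).length;
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 0) {
            throw new IllegalStateException("compression produced no output");
        }
        return (double) elapsed / measured;
    }

    public static void main(String[] args) throws IOException {
        byte[][] messages = sampleMessages();
        System.out.println("gzip size (bytes): " + gzip(messages).length);
        System.out.println("avg ns per run:    " + averageNanos(messages, 1000, 1000));
    }
}
```

Even with the warm-up loop, numbers from a harness like this can be skewed by dead-code elimination, GC pauses, and run-to-run variance, which is the reason a JMH-based benchmark was recommended in this thread.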

On Thu, Jan 12, 2017 at 11:39 AM, Dongjin Lee <dong...@apache.org> wrote:

> Okay, I will give it a try.
> Thanks Ewen for the guidance!!
>
> Best,
> Dongjin
>
> On Thu, Jan 12, 2017 at 6:44 AM, Ismael Juma <ism...@juma.me.uk> wrote:
>
>> That's a good point Ewen. Dongjin, you could use the branch that Ewen
>> linked for the performance testing. It would also help validate the PR.
>>
>> Ismael
>>
>> On Wed, Jan 11, 2017 at 9:38 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
>>
>> > FYI, there's an outstanding patch for getting some JMH benchmarking setup:
>> > https://github.com/apache/kafka/pull/1712
>> > I haven't found time to review it (and don't really know JMH well anyway)
>> > but it might be worth getting that landed so we can use it for this as well.
>> >
>> > -Ewen
>> >
>> > On Wed, Jan 11, 2017 at 6:35 AM, Dongjin Lee <dong...@apache.org> wrote:
>> >
>> > > Hi Ismael,
>> > >
>> > > 1. Regarding the compressed output: yes, lz4 is producing smaller output
>> > > than gzip. In fact, my benchmark was inspired by the
>> > > MessageCompressionTest#testCompressSize unit test, and the result is the
>> > > same: 396 bytes for gzip and 387 bytes for lz4.
>> > > 2. I agree that my (former) approach can produce unreliable results.
>> > > However, I am having difficulty working out how to collect benchmark
>> > > metrics from Kafka. As for JMH, which you recommended, I have only just
>> > > started reading about it. If possible, could you give an example of how
>> > > to use JMH against Kafka? It would be a great help.
>> > >
>> > > Regards,
>> > > Dongjin
>> > >
>> > >                 _____________________________
>> > > From: Ismael Juma <ism...@juma.me.uk>
>> > > Sent: Wednesday, January 11, 2017 7:33 PM
>> > > Subject: Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression
>> > > To:  <dev@kafka.apache.org>
>> > >
>> > >
>> > > Thanks Dongjin. I highly recommend using JMH for the benchmark, the
>> > > existing one has a few problems that could result in unreliable results.
>> > > Also, it's a bit surprising that LZ4 is producing smaller output than gzip.
>> > > Is that right?
>> > >
>> > > Ismael
>> > >
>> > > On Wed, Jan 11, 2017 at 10:20 AM, Dongjin Lee <dong...@apache.org> wrote:
>> > >
>> > > > Ismael,
>> > > >
>> > > > I pushed the benchmark code I used, with some updates (iterations: 20 ->
>> > > > 1000). I also updated the KIP page with the new benchmark results.
>> > > > Please take a look when you are free. The attached screenshot shows how
>> > > > to run the benchmark.
>> > > >
>> > > > Thanks,
>> > > > Dongjin
>> > > >
>> > > > On Tue, Jan 10, 2017 at 8:03 PM, Dongjin Lee <dong...@apache.org> wrote:
>> > > >
>> > > >> Ismael,
>> > > >>
>> > > >> I see. Then, I will share the benchmark code I used by tomorrow. Thanks
>> > > >> for your guidance.
>> > > >>
>> > > >> Best,
>> > > >> Dongjin
>> > > >>
>> > > >> -----
>> > > >>
>> > > >> Dongjin Lee
>> > > >>
>> > > >> Software developer in Line+.
>> > > >> So interested in massive-scale machine learning.
>> > > >>
>> > > >> facebook: www.facebook.com/dongjin.lee.kr
>> > > >> linkedin: kr.linkedin.com/in/dongjinleekr
>> > > >> github: github.com/dongjinleekr
>> > > >> twitter: www.twitter.com/dongjinleekr
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Tue, Jan 10, 2017 at 7:24 PM +0900, "Ismael Juma" <ism...@juma.me.uk> wrote:
>> > > >>
>> > > >>> Dongjin,
>> > > >>>
>> > > >>> The KIP states:
>> > > >>>
>> > > >>> "I compared the compressed size and compression time of 3 1kb-sized
>> > > >>> messages (3102 bytes in total), with the Draft-implementation of ZStandard
>> > > >>> Compression Codec and all currently available CompressionCodecs. All
>> > > >>> elapsed times are the average of 20 trials."
>> > > >>>
>> > > >>> But doesn't give any details of how this was implemented. Is the source
>> > > >>> code available somewhere? Micro-benchmarking in the JVM is pretty tricky so
>> > > >>> it needs verification before numbers can be trusted. A performance test
>> > > >>> with kafka-producer-perf-test.sh would be nice to have as well, if possible.
>> > > >>>
>> > > >>> Thanks,
>> > > >>> Ismael
>> > > >>>
>> > > >>> On Tue, Jan 10, 2017 at 7:44 AM, Dongjin Lee  wrote:
>> > > >>>
>> > > >>> > Ismael,
>> > > >>> >
>> > > >>> > 1. Is the benchmark in the KIP page not enough? Do you mean we need a
>> > > >>> > full performance test using kafka-producer-perf-test.sh?
>> > > >>> >
>> > > >>> > 2. It seems that no major project is relying on it currently. However,
>> > > >>> > after reviewing the code, I concluded that this project at least has good
>> > > >>> > test coverage. As for upstream tracking: although there has been no
>> > > >>> > significant update to ZStandard by which to judge this, it does not look
>> > > >>> > bad. If required, I can take responsibility for tracking this library.
>> > > >>> >
>> > > >>> > Thanks,
>> > > >>> > Dongjin
>> > > >>> >
>> > > >>> > On Tue, Jan 10, 2017 at 7:09 AM, Ismael Juma  wrote:
>> > > >>> >
>> > > >>> > > Thanks for posting the KIP, ZStandard looks like a nice improvement over
>> > > >>> > > the existing compression algorithms. A couple of questions:
>> > > >>> > >
>> > > >>> > > 1. Can you please elaborate on the details of the benchmark?
>> > > >>> > > 2. About https://github.com/luben/zstd-jni, can we rely on it? A few
>> > > >>> > > things to consider: are there other projects using it, does it have good
>> > > >>> > > test coverage, are there performance tests, does it track upstream closely?
>> > > >>> > >
>> > > >>> > > Thanks,
>> > > >>> > > Ismael
>> > > >>> > >
>> > > >>> > > On Fri, Jan 6, 2017 at 2:40 AM, Dongjin Lee  wrote:
>> > > >>> > >
>> > > >>> > > > Hi all,
>> > > >>> > > >
>> > > >>> > > > I've just posted a new KIP "KIP-110: Add Codec for ZStandard Compression"
>> > > >>> > > > for discussion:
>> > > >>> > > >
>> > > >>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression
>> > > >>> > > >
>> > > >>> > > > Please have a look when you are free.
>> > > >>> > > >
>> > > >>> > > > Best,
>> > > >>> > > > Dongjin
>> > > >>> > > >
>> > > >>> > > >
>> > > >>> > >
>> > > >>> >
>> > > >>> >
>> > > >>> >
>> > > >>> >
>> > > >>>
>> > > >>>
>> > > >
>> > > >
>> > > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> >
>>
>
>
>
>



