I think attachments get filtered by the Apache mailing lists. You might need to post it elsewhere. I think you could upload it to the corresponding JIRA, for example.
-Ewen On Tue, Jan 31, 2017 at 3:06 AM, Dongjin Lee <dong...@apache.org> wrote: > No problem. I will elaborate my KIP with the updated benchmark & error > cases. > > And here, I resend you the screenshot: > > > > Best, > Dongjin > > On Tue, Jan 31, 2017 at 7:42 PM, Ismael Juma <ism...@juma.me.uk> wrote: > >> Thanks Dongjin. That seems to make things simpler indeed. It would be good >> to update the KIP with the relevant details. I suggest describing each >> flow >> and the error that the user would see in each case. >> >> Also, you mentioned an attached screenshot, but it seems like there was >> nothing attached to the email. >> >> Ismael >> >> On Tue, Jan 31, 2017 at 9:28 AM, Dongjin Lee <dong...@apache.org> wrote: >> >> > Ismael & All, >> > >> > After Inspecting the related code & commits, I concluded following: >> > >> > 1. We have to update the masking value which is used to retrieve the >> used >> > codec id from the messages, to enable the retrieval of the 3rd bit of >> > compression type field of the message. >> > 2. The above task is already done; so, we need nothing. >> > >> > Here is why. >> > >> > Let's start with the first one, with the scenario Juma proposed. In the >> > case of receiving the message of unsupported compression type, the >> receiver >> > (= broker or consumer) raises IllegalArgumentException[^1][^2]. The key >> > element in this operation is Record#COMPRESSION_CODEC_MASK, which is >> used >> > to extract the codec id. We have to update this value from 2-bits >> extractor >> > (=0x03) to 3-bits extractor (=0x07). >> > >> > But in fact, this task is already done, so its current value is 0x07. We >> > don't have to update it. >> > >> > The reason why this task is already done has some story; From the first >> > time Record.java file was added to the project[^3], the >> > COMPRESSION_CODEC_MASK was already 2-bits extractor, that is, 0x03. At >> that >> > time, Kafka supported only two compression types - GZipCompression and >> > SnappyCompression.[^4] After that, KAFKA-1456 introduced two additional >> > codecs of LZ4 and LZ4C[^5]. This update modified COMPRESSION_CODEC_MASK >> > into 3 bits extractor, 0x07, in the aim of supporting four compression >> > codecs. >> > >> > Although its following work, KAFKA-1493, removed the support of LZ4C >> > codec[^6], this mask was not reverted into 2-bits extractor - by this >> > reason, we don't need to care about the message format. >> > >> > Attached screenshot is my test on Juma's scenario. I created topic & >> sent >> > some messages using the snapshot version with ZStandard compression and >> > received the message with the latest version. As you can see, it works >> > perfectly as expected. >> > >> > If you have more opinion to this issue, don't hesitate to send me as a >> > message. >> > >> > Best, >> > Dongjin >> > >> > [^1]: see: Record#compressionType. >> > [^2]: There is similar routine in Message.scala. But after KAFKA-4390, >> > that routine is not being used anymore - more precisely, Message class >> is >> > now used in ConsoleConsumer only. I think this class should be replaced >> but >> > since it is a separated topic, I will send another message for this >> issue. >> > [^3]: commit 642da2f (2011.8.2). >> > [^4]: commit c51b940. >> > [^5]: commit 547cced. >> > [^6]: commit 37356bf. >> > >> > On Thu, Jan 26, 2017 at 12:35 AM, Ismael Juma <ism...@juma.me.uk> >> wrote: >> > >> >> So far the discussion was around the performance characteristics of the >> >> new >> >> compression algorithm. Another area that is important and is not >> covered >> >> in >> >> the KIP is the compatibility implications. For example, what happens >> if a >> >> consumer that doesn't support zstd tries to consume a topic compressed >> >> with >> >> it? Or if a broker that doesn't support receives data compressed with >> it? >> >> If we go through that exercise, then more changes may be required (like >> >> bumping the version of produce/fetch protocols). >> >> >> >> Ismael >> >> >> >> On Wed, Jan 25, 2017 at 3:22 PM, Ben Stopford <b...@confluent.io> >> wrote: >> >> >> >> > Is there more discussion to be had on this KIP, or should it be taken >> >> to a >> >> > vote? >> >> > >> >> > On Mon, Jan 16, 2017 at 6:37 AM Dongjin Lee <dong...@apache.org> >> wrote: >> >> > >> >> > > I updated KIP-110 with JMH-measured benchmark results. Please have >> a >> >> > review >> >> > > when you are free. (The overall result is not different yet.) >> >> > > >> >> > > Regards, >> >> > > Dongjin >> >> > > >> >> > > +1. Could anyone assign KAFKA-4514 to me? >> >> > > >> >> > > On Thu, Jan 12, 2017 at 11:39 AM, Dongjin Lee <dong...@apache.org> >> >> > wrote: >> >> > > >> >> > > > Okay, I will have a try. >> >> > > > Thanks Ewen for the guidance!! >> >> > > > >> >> > > > Best, >> >> > > > Dongjin >> >> > > > >> >> > > > On Thu, Jan 12, 2017 at 6:44 AM, Ismael Juma <ism...@juma.me.uk> >> >> > wrote: >> >> > > > >> >> > > >> That's a good point Ewen. Dongjin, you could use the branch that >> >> Ewen >> >> > > >> linked for the performance testing. It would also help validate >> the >> >> > PR. >> >> > > >> >> >> > > >> Ismael >> >> > > >> >> >> > > >> On Wed, Jan 11, 2017 at 9:38 PM, Ewen Cheslack-Postava < >> >> > > e...@confluent.io >> >> > > >> > >> >> > > >> wrote: >> >> > > >> >> >> > > >> > FYI, there's an outstanding patch for getting some JMH >> >> benchmarking >> >> > > >> setup: >> >> > > >> > https://github.com/apache/kafka/pull/1712 I haven't found >> time >> >> to >> >> > > >> review >> >> > > >> > it >> >> > > >> > (and don't really know JMH well anyway) but it might be worth >> >> > getting >> >> > > >> that >> >> > > >> > landed so we can use it for this as well. >> >> > > >> > >> >> > > >> > -Ewen >> >> > > >> > >> >> > > >> > On Wed, Jan 11, 2017 at 6:35 AM, Dongjin Lee < >> dong...@apache.org >> >> > >> >> > > >> wrote: >> >> > > >> > >> >> > > >> > > Hi Ismael, >> >> > > >> > > >> >> > > >> > > 1. In the case of compression output, yes, lz4 is producing >> the >> >> > > >> smaller >> >> > > >> > > output than gzip. In fact, my benchmark was inspired >> >> > > >> > > by MessageCompressionTest#testCompressSize unit test and >> the >> >> > result >> >> > > >> is >> >> > > >> > > same - 396 bytes for gzip and 387 bytes for lz4. >> >> > > >> > > 2. I agree that my (former) approach can result in >> unreliable >> >> > > output. >> >> > > >> > > However, I am experiencing difficulties on how to acquire >> the >> >> > > >> benchmark >> >> > > >> > > metrics from Kafka. For you recommended JMH, I just started >> to >> >> > > google >> >> > > >> for >> >> > > >> > > it. If possible, could you give any example on how to use >> JMH >> >> > > against >> >> > > >> > > Kafka? If it is the case, it will be a great help. >> >> > > >> > > Regards,Dongjin >> >> > > >> > > >> >> > > >> > > _____________________________ >> >> > > >> > > From: Ismael Juma <ism...@juma.me.uk> >> >> > > >> > > Sent: Wednesday, January 11, 2017 7:33 PM >> >> > > >> > > Subject: Re: [DISCUSS] KIP-110: Add Codec for ZStandard >> >> > Compression >> >> > > >> > > To: <dev@kafka.apache.org> >> >> > > >> > > >> >> > > >> > > >> >> > > >> > > Thanks Dongjin. I highly recommend using JMH for the >> benchmark, >> >> > the >> >> > > >> > > existing one has a few problems that could result in >> unreliable >> >> > > >> results. >> >> > > >> > > Also, it's a bit surprising that LZ4 is producing smaller >> >> output >> >> > > than >> >> > > >> > gzip. >> >> > > >> > > Is that right? >> >> > > >> > > >> >> > > >> > > Ismael >> >> > > >> > > >> >> > > >> > > On Wed, Jan 11, 2017 at 10:20 AM, Dongjin Lee < >> >> dong...@apache.org >> >> > > >> >> > > >> > wrote: >> >> > > >> > > >> >> > > >> > > > Ismael, >> >> > > >> > > > >> >> > > >> > > > I pushed the benchmark code I used, with some updates >> >> > (iteration: >> >> > > >> 20 -> >> >> > > >> > > > 1000). I also updated the KIP page with the updated >> benchmark >> >> > > >> results. >> >> > > >> > > > Please take a review when you are free. The attached >> >> screenshot >> >> > > >> shows >> >> > > >> > how >> >> > > >> > > > to run the benchmarker. >> >> > > >> > > > >> >> > > >> > > > Thanks, >> >> > > >> > > > Dongjin >> >> > > >> > > > >> >> > > >> > > > On Tue, Jan 10, 2017 at 8:03 PM, Dongjin Lee < >> >> > dong...@apache.org> >> >> > > >> > wrote: >> >> > > >> > > > >> >> > > >> > > >> Ismael, >> >> > > >> > > >> >> >> > > >> > > >> I see. Then, I will share the benchmark code I used by >> >> > tomorrow. >> >> > > >> > Thanks >> >> > > >> > > >> for your guidance. >> >> > > >> > > >> >> >> > > >> > > >> Best, >> >> > > >> > > >> Dongjin >> >> > > >> > > >> >> >> > > >> > > >> ----- >> >> > > >> > > >> >> >> > > >> > > >> Dongjin Lee >> >> > > >> > > >> >> >> > > >> > > >> Software developer in Line+. >> >> > > >> > > >> So interested in massive-scale machine learning. >> >> > > >> > > >> >> >> > > >> > > >> facebook: www.facebook.com/dongjin.lee.kr >> >> > > >> > > >> linkedin: kr.linkedin.com/in/dongjinleekr >> >> > > >> > > >> github: github.com/dongjinleekr >> >> > > >> > > >> twitter: www.twitter.com/dongjinleekr >> >> > > >> > > >> >> >> > > >> > > >> >> >> > > >> > > >> >> >> > > >> > > >> >> >> > > >> > > >> On Tue, Jan 10, 2017 at 7:24 PM +0900, "Ismael Juma" < >> >> > > >> > ism...@juma.me.uk >> >> > > >> > > > >> >> > > >> > > >> wrote: >> >> > > >> > > >> >> >> > > >> > > >> Dongjin, >> >> > > >> > > >>> >> >> > > >> > > >>> The KIP states: >> >> > > >> > > >>> >> >> > > >> > > >>> "I compared the compressed size and compression time of >> 3 >> >> > > >> 1kb-sized >> >> > > >> > > >>> messages (3102 bytes in total), with the >> >> Draft-implementation >> >> > of >> >> > > >> > > ZStandard >> >> > > >> > > >>> Compression Codec and all currently available >> >> > CompressionCodecs. >> >> > > >> All >> >> > > >> > > >>> elapsed times are the average of 20 trials." >> >> > > >> > > >>> >> >> > > >> > > >>> But doesn't give any details of how this was >> implemented. >> >> Is >> >> > the >> >> > > >> > source >> >> > > >> > > >>> code available somewhere? Micro-benchmarking in the JVM >> is >> >> > > pretty >> >> > > >> > > tricky so >> >> > > >> > > >>> it needs verification before numbers can be trusted. A >> >> > > performance >> >> > > >> > test >> >> > > >> > > >>> with kafka-producer-perf-test.sh would be nice to have >> as >> >> > well, >> >> > > if >> >> > > >> > > possible. >> >> > > >> > > >>> >> >> > > >> > > >>> Thanks, >> >> > > >> > > >>> Ismael >> >> > > >> > > >>> >> >> > > >> > > >>> On Tue, Jan 10, 2017 at 7:44 AM, Dongjin Lee wrote: >> >> > > >> > > >>> >> >> > > >> > > >>> > Ismael, >> >> > > >> > > >>> > >> >> > > >> > > >>> > 1. Is the benchmark in the KIP page not enough? You >> mean >> >> we >> >> > > >> need a >> >> > > >> > > whole >> >> > > >> > > >>> > performance test using kafka-producer-perf-test.sh? >> >> > > >> > > >>> > >> >> > > >> > > >>> > 2. It seems like no major project is relying on it >> >> > currently. >> >> > > >> > > However, >> >> > > >> > > >>> > after reviewing the code, I concluded that at least >> this >> >> > > project >> >> > > >> > has >> >> > > >> > > a good >> >> > > >> > > >>> > test coverage. And for the problem of upstream >> tracking - >> >> > > >> although >> >> > > >> > > there is >> >> > > >> > > >>> > no significant update on ZStandard to judge this >> >> problem, it >> >> > > >> seems >> >> > > >> > > not bad. >> >> > > >> > > >>> > If required, I can take responsibility of the tracking >> >> for >> >> > > this >> >> > > >> > > library. >> >> > > >> > > >>> > >> >> > > >> > > >>> > Thanks, >> >> > > >> > > >>> > Dongjin >> >> > > >> > > >>> > >> >> > > >> > > >>> > On Tue, Jan 10, 2017 at 7:09 AM, Ismael Juma wrote: >> >> > > >> > > >>> > >> >> > > >> > > >>> > > Thanks for posting the KIP, ZStandard looks like a >> nice >> >> > > >> > > improvement over >> >> > > >> > > >>> > > the existing compression algorithms. A couple of >> >> > questions: >> >> > > >> > > >>> > > >> >> > > >> > > >>> > > 1. Can you please elaborate on the details of the >> >> > benchmark? >> >> > > >> > > >>> > > 2. About https://github.com/luben/zstd-jni, can we >> >> rely >> >> > on >> >> > > >> it? A >> >> > > >> > > few >> >> > > >> > > >>> > > things >> >> > > >> > > >>> > > to consider: are there other projects using it, >> does it >> >> > have >> >> > > >> good >> >> > > >> > > test >> >> > > >> > > >>> > > coverage, are there performance tests, does it track >> >> > > upstream >> >> > > >> > > closely? >> >> > > >> > > >>> > > >> >> > > >> > > >>> > > Thanks, >> >> > > >> > > >>> > > Ismael >> >> > > >> > > >>> > > >> >> > > >> > > >>> > > On Fri, Jan 6, 2017 at 2:40 AM, Dongjin Lee wrote: >> >> > > >> > > >>> > > >> >> > > >> > > >>> > > > Hi all, >> >> > > >> > > >>> > > > >> >> > > >> > > >>> > > > I've just posted a new KIP "KIP-110: Add Codec for >> >> > > ZStandard >> >> > > >> > > >>> > Compression" >> >> > > >> > > >>> > > > for >> >> > > >> > > >>> > > > discussion: >> >> > > >> > > >>> > > > >> >> > > >> > > >>> > > > https://cwiki.apache.org/confl >> >> uence/display/KAFKA/KIP- >> >> > > >> > > >>> > > > 110%3A+Add+Codec+for+ZStandard+Compression >> >> > > >> > > >>> > > > >> >> > > >> > > >>> > > > Please have a look when you are free. >> >> > > >> > > >>> > > > >> >> > > >> > > >>> > > > Best, >> >> > > >> > > >>> > > > Dongjin >> >> > > >> > > >>> > > > >> >> > > >> > > >>> > > > -- >> >> > > >> > > >>> > > > *Dongjin Lee* >> >> > > >> > > >>> > > > >> >> > > >> > > >>> > > > >> >> > > >> > > >>> > > > *Software developer in Line+.So interested in >> >> > > massive-scale >> >> > > >> > > machine >> >> > > >> > > >>> > > > learning.facebook: www.facebook.com/dongjin.lee.k >> r >> >> > > >> > > >>> > > > linkedin: >> >> > > >> > > >>> > > > kr.linkedin.com/in/dongjinleekr >> >> > > >> > > >>> > > > github: >> >> > > >> > > >>> > > > github.com/dongjinleekr >> >> > > >> > > >>> > > > twitter: www.twitter.com/dongjinleekr >> >> > > >> > > >>> > > > * >> >> > > >> > > >>> > > > >> >> > > >> > > >>> > > >> >> > > >> > > >>> > >> >> > > >> > > >>> > >> >> > > >> > > >>> > >> >> > > >> > > >>> > -- >> >> > > >> > > >>> > *Dongjin Lee* >> >> > > >> > > >>> > >> >> > > >> > > >>> > >> >> > > >> > > >>> > *Software developer in Line+.So interested in >> >> massive-scale >> >> > > >> machine >> >> > > >> > > >>> > learning.facebook: www.facebook.com/dongjin.lee.kr >> >> > > >> > > >>> > linkedin: >> >> > > >> > > >>> > kr.linkedin.com/in/dongjinleekr >> >> > > >> > > >>> > github: >> >> > > >> > > >>> > github.com/dongjinleekr >> >> > > >> > > >>> > twitter: www.twitter.com/dongjinleekr >> >> > > >> > > >>> > * >> >> > > >> > > >>> > >> >> > > >> > > >>> >> >> > > >> > > >>> >> >> > > >> > > > >> >> > > >> > > > >> >> > > >> > > > -- >> >> > > >> > > > *Dongjin Lee* >> >> > > >> > > > >> >> > > >> > > > >> >> > > >> > > > *Software developer in Line+.So interested in >> massive-scale >> >> > > machine >> >> > > >> > > > learning.facebook: www.facebook.com/dongjin.lee.kr >> >> > > >> > > > <http://www.facebook.com/dongjin.lee.kr>linkedin: >> >> > > >> kr.linkedin.com/in/ >> >> > > >> > > dongjinleekr >> >> > > >> > > > <http://kr.linkedin.com/in/dongjinleekr>github: >> >> > > >> > > > <http://goog_969573159/>github.com/dongjinleekr >> >> > > >> > > > <http://github.com/dongjinleekr>twitter: >> >> > > >> www.twitter.com/dongjinleekr >> >> > > >> > > > <http://www.twitter.com/dongjinleekr>* >> >> > > >> > > > >> >> > > >> > > >> >> > > >> > > >> >> > > >> > > >> >> > > >> > > >> >> > > >> > > >> >> > > >> > >> >> > > >> >> >> > > > >> >> > > > >> >> > > > >> >> > > > -- >> >> > > > *Dongjin Lee* >> >> > > > >> >> > > > >> >> > > > *Software developer in Line+.So interested in massive-scale >> machine >> >> > > > learning.facebook: www.facebook.com/dongjin.lee.kr >> >> > > > <http://www.facebook.com/dongjin.lee.kr>linkedin: >> >> > > kr.linkedin.com/in/dongjinleekr >> >> > > > <http://kr.linkedin.com/in/dongjinleekr>github: >> >> > > > <http://goog_969573159/>github.com/dongjinleekr >> >> > > > <http://github.com/dongjinleekr>twitter: >> >> www.twitter.com/dongjinleekr >> >> > > > <http://www.twitter.com/dongjinleekr>* >> >> > > > >> >> > > >> >> > > >> >> > > >> >> > > -- >> >> > > *Dongjin Lee* >> >> > > >> >> > > >> >> > > *Software developer in Line+.So interested in massive-scale machine >> >> > > learning.facebook: www.facebook.com/dongjin.lee.kr >> >> > > <http://www.facebook.com/dongjin.lee.kr>linkedin: >> >> > > kr.linkedin.com/in/dongjinleekr >> >> > > <http://kr.linkedin.com/in/dongjinleekr>github: >> >> > > <http://goog_969573159/>github.com/dongjinleekr >> >> > > <http://github.com/dongjinleekr>twitter: >> www.twitter.com/dongjinleekr >> >> > > <http://www.twitter.com/dongjinleekr>* >> >> > > >> >> > >> >> >> > >> > >> > >> > -- >> > *Dongjin Lee* >> > >> > >> > *Software developer in Line+.So interested in massive-scale machine >> > learning.facebook: www.facebook.com/dongjin.lee.kr >> > <http://www.facebook.com/dongjin.lee.kr>linkedin: >> kr.linkedin.com/in/dongjinleekr >> > <http://kr.linkedin.com/in/dongjinleekr>github: >> > <http://goog_969573159/>github.com/dongjinleekr >> > <http://github.com/dongjinleekr>twitter: www.twitter.com/dongjinleekr >> > <http://www.twitter.com/dongjinleekr>* >> > >> > > > > -- > *Dongjin Lee* > > > *Software developer in Line+.So interested in massive-scale machine > learning.facebook: www.facebook.com/dongjin.lee.kr > <http://www.facebook.com/dongjin.lee.kr>linkedin: > kr.linkedin.com/in/dongjinleekr > <http://kr.linkedin.com/in/dongjinleekr>github: > <http://goog_969573159/>github.com/dongjinleekr > <http://github.com/dongjinleekr>twitter: www.twitter.com/dongjinleekr > <http://www.twitter.com/dongjinleekr>* >