> On Oct 25, 2016, at 10:23 PM, Michael Pearce <michael.pea...@ig.com> wrote:
>
> Hi All,
>
> In case you hadn't noticed, regarding the compaction issue for non-null
> values I have created a separate KIP-87. If you could all contribute to its
> discussion, it would be much appreciated.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-87+-+Add+Compaction+Tombstone+Flag
>
> Secondly, focussing back on KIP-82, one of the actions agreed from the KIP
> call was for some additional alternative solution proposals on top of those
> already detailed in the KIP wiki and subsequent linked wiki pages by others
> in the group in the meeting.
>
> I haven't seen any activity on this. Does this mean there isn't anything
> further and everyone in hindsight actually thinks the current proposed
> solution in the KIP is the front runner? (I assume this isn't the case, I
> just want to nudge everyone.)
>
I have been meaning to respond, but I haven't had the time. In the next couple
of days, I will try to write up the container format that TiVo is using, and we
can discuss it.

-James

> Also just copying across the KIP call thread to keep everything in one thread
> to avoid a divergence of the discussion into multiple threads.
>
> Cheers
> Mike
>
> ________________________________________
> From: Mayuresh Gharat <gharatmayures...@gmail.com>
> Sent: Monday, October 24, 2016 6:17 PM
> To: dev@kafka.apache.org
> Subject: Re: Kafka KIP meeting Oct 19 at 11:00am PST
>
> I agree with Nacho.
> +1 for the KIP.
>
> Thanks,
>
> Mayuresh
>
> On Fri, Oct 21, 2016 at 11:46 AM, Nacho Solis <nso...@linkedin.com.invalid>
> wrote:
>
>> I think a separate KIP is a good idea as well. Note however that potential
>> decisions in this KIP could affect the other KIP.
>>
>> Nacho
>>
>> On Fri, Oct 21, 2016 at 10:23 AM, Jun Rao <j...@confluent.io> wrote:
>>
>>> Michael,
>>>
>>> Yes, doing a separate KIP to address the null payload issue for compacted
>>> topics is a good idea.
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>> On Fri, Oct 21, 2016 at 12:57 AM, Michael Pearce <michael.pea...@ig.com>
>>> wrote:
>>>
>>>> I had noted that, whatever the solution, it was agreed that having
>>>> compaction based on a null payload isn't elegant.
>>>>
>>>> Shall we raise another KIP to, as discussed, propose using an attribute
>>>> bit for a delete/compaction flag as well as, or instead of, a null value,
>>>> and updating the compaction logic to look at that delete/compaction
>>>> attribute?
>>>>
>>>> I believe this is less contentious, so at least we get that done,
>>>> alleviating some concerns whilst the below gets discussed further.
>>>>
>>>> ________________________________________
>>>> From: Jun Rao <j...@confluent.io>
>>>> Sent: Wednesday, October 19, 2016 8:56:52 PM
>>>> To: dev@kafka.apache.org
>>>> Subject: Re: Kafka KIP meeting Oct 19 at 11:00am PST
>>>>
>>>> The following are the notes from today's KIP discussion.
>>>>
>>>> - KIP-82 - add record header: We agreed that there are use cases for
>>>>   third-party vendors building tools around Kafka. We haven't reached a
>>>>   conclusion on whether the use cases justify the added complexity. We
>>>>   will follow up on the mailing list with use cases, container formats
>>>>   people have been using, and details on the proposal.
>>>>
>>>> The video will be uploaded soon in
>>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals .
>>>>
>>>> Thanks,
>>>>
>>>> Jun
>>>>
>>>> On Mon, Oct 17, 2016 at 10:49 AM, Jun Rao <j...@confluent.io> wrote:
>>>>
>>>>> Hi, Everyone,
>>>>>
>>>>> We plan to have a Kafka KIP meeting this coming Wednesday at 11:00am PST.
>>>>> If you plan to attend but haven't received an invite, please let me know.
>>>>> The following is the tentative agenda.
>>>>>
>>>>> Agenda:
>>>>> KIP-82: add record header
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jun
>>>>>
>>>> The information contained in this email is strictly confidential and for
>>>> the use of the addressee only, unless otherwise indicated. If you are not
>>>> the intended recipient, please do not read, copy, use or disclose to others
>>>> this message or any attachment. Please also notify the sender by replying
>>>> to this email or by telephone (+44(020 7896 0011) and then delete the email
>>>> and any copies of it. Opinions, conclusion (etc) that do not relate to the
>>>> official business of this company shall be understood as neither given nor
>>>> endorsed by it. IG is a trading name of IG Markets Limited (a company
>>>> registered in England and Wales, company number 04008957) and IG Index
>>>> Limited (a company registered in England and Wales, company number
>>>> 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill,
>>>> London EC4R 2YA.
>>>> Both IG Markets Limited (register number 195355) and IG Index Limited
>>>> (register number 114059) are authorised and regulated by the Financial
>>>> Conduct Authority.
>>>>
>>
>> --
>> Nacho (Ignacio) Solis
>> Kafka
>> nso...@linkedin.com
>>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>
> ________________________________________
> From: Michael Pearce <michael.pea...@ig.com>
> Sent: Monday, October 17, 2016 7:48 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>
> Hi Jun,
>
> Sounds good.
>
> Look forward to the invite.
>
> Cheers,
> Mike
> ________________________________________
> From: Jun Rao <j...@confluent.io>
> Sent: Monday, October 17, 2016 5:55:57 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>
> Hi, Michael,
>
> We do have online KIP discussion meetings from time to time. How about we
> discuss this KIP Wed (Oct 19) at 11:00am PST? I will send out an invite (we
> typically do the meeting through Zoom and will post the video recording to
> Kafka wiki).
>
> Thanks,
>
> Jun
>
> On Wed, Oct 12, 2016 at 1:22 AM, Michael Pearce <michael.pea...@ig.com>
> wrote:
>
>> @Jay and Dana
>>
>> We have internally had a few discussions on how we might address this if we
>> had a common Apache Kafka message wrapper for headers that could be used
>> client side only, and on how to address the compaction issue.
>> I have detailed this solution separately and linked it from the main KIP-82
>> wiki.
>>
>> Here's a direct link:
>> https://cwiki.apache.org/confluence/display/KAFKA/Headers+Value+Message+Wrapper
>>
>> We feel, though, that this solution still doesn't manage to address all the
>> use cases being mentioned, and it also has some compatibility drawbacks, e.g.
>> backwards and forwards compatibility, especially across different language
>> clients. Also, with this solution we still need to address the compaction
>> issue / tombstones, so we would still need to make server-side changes and
>> as many message/record version changes.
>>
>> We believe the proposed solution in KIP-82 does address all these needs,
>> is cleaner, and has more benefits.
>> Please have a read, and comment. Also, if you have any improvements on the
>> proposed KIP-82 or an alternative solution/option, your input is appreciated.
>>
>> @All
>> As Joel has mentioned, to get this moving along and be able to discuss more
>> fluidly, it would be great if we could organize to meet up virtually online,
>> e.g. WebEx or something.
>> I am aware that the majority are based in America; I am in the UK.
>> @Kostya I assume you're in Eastern Europe or Russia based on your email
>> address (please correct this assumption). I hope the time difference isn't
>> too much and that the below would suit you if you wish to join.
>>
>> Can I propose that we try to meet up online next Wednesday, 19th October, at
>> 18:30 BST, 10:30 PST, 20:30 MSK?
>>
>> Would this date/time suit the majority?
>> Also, what is the preferred method? I can host via an Adobe Connect style
>> WebEx (which my company uses) but it isn't the best IMHO, so I'm more than
>> happy to have someone suggest a better alternative.
>>
>> Best,
>> Mike
>>
>>
>> On 10/8/16, 7:26 AM, "Michael Pearce" <michael.pea...@ig.com> wrote:
>>
>>>> I agree with the critique of compaction not having a value. I think
>>>> we should consider fixing that directly.
>>
>>> Agree that the compaction issue is troubling: compacted "null" deletes
>>> are incompatible w/ headers that must be packed into the message
>>> value. Are there any alternatives on compaction delete semantics that
>>> could address this? The KIP wiki discussion I think mostly assumes
>>> that compaction-delete is what it is and can't be changed/fixed.
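
The conflict described above can be sketched in a few lines of Python. This is
an illustration under stated assumptions, not Kafka code: `Record`, `wrap`,
`compact_by_null` and `compact_by_flag` are hypothetical names, the JSON
wrapper merely stands in for whatever container format a client might use, and
the `tombstone` field stands in for the attribute-bit idea raised earlier in
the thread.

```python
import json
from typing import Dict, List, NamedTuple, Optional


class Record(NamedTuple):
    key: str
    value: Optional[bytes]    # None marks a delete under null-value compaction
    tombstone: bool = False   # hypothetical attribute flag (assumption)


def wrap(headers: Dict[str, str], payload: Optional[bytes]) -> bytes:
    """Hypothetical client-side wrapper: headers travel inside the value."""
    body = None if payload is None else payload.decode("utf-8")
    return json.dumps({"headers": headers, "payload": body}).encode("utf-8")


def compact_by_null(log: List[Record]) -> Dict[str, Record]:
    """Keep the latest record per key; drop keys whose latest value is None."""
    latest = {r.key: r for r in log}  # later records overwrite earlier ones
    return {k: r for k, r in latest.items() if r.value is not None}


def compact_by_flag(log: List[Record]) -> Dict[str, Record]:
    """Keep the latest record per key; drop keys whose latest record is flagged."""
    latest = {r.key: r for r in log}
    return {k: r for k, r in latest.items() if not r.tombstone}


# A delete that still carries headers has a non-null wrapped value,
# so null-based compaction never sees it as a delete.
log = [
    Record("user-1", wrap({"trace-id": "abc"}, b"hello")),
    Record("user-1", wrap({"trace-id": "def"}, None)),
]
survivors = compact_by_null(log)

# With an explicit flag, the same delete can carry headers and still compact.
flagged = [log[0], Record("user-1", wrap({"trace-id": "def"}, None), tombstone=True)]
survivors_with_flag = compact_by_flag(flagged)
```

In this sketch the key survives null-based compaction even though the producer
meant to delete it, while the flag-based variant removes it.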
>>
>> This KIP is about dealing with quite a few use cases and issues.
>> Please see both the KIP use cases detailed by myself and also the
>> additional use cases wiki added by LinkedIn, linked from the main KIP.
>>
>> Compaction is something that happily is addressed with headers, but it
>> most definitely isn't the sole reason or use case for them; headers
>> solve many issues and use cases. Hence their elegance and simplicity, and
>> why they're so common and so successful in transport mechanisms such as
>> HTTP, TCP and JMS.
>>
>> ________________________________________
>> From: Dana Powers <dana.pow...@gmail.com>
>> Sent: Friday, October 7, 2016 11:09 PM
>> To: dev@kafka.apache.org
>> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>>
>>> I agree with the critique of compaction not having a value. I think
>>> we should consider fixing that directly.
>>
>> Agree that the compaction issue is troubling: compacted "null" deletes
>> are incompatible w/ headers that must be packed into the message
>> value. Are there any alternatives on compaction delete semantics that
>> could address this? The KIP wiki discussion I think mostly assumes
>> that compaction-delete is what it is and can't be changed/fixed.
>>
>> -Dana
>>
>> On Fri, Oct 7, 2016 at 1:38 PM, Michael Pearce <michael.pea...@ig.com>
>> wrote:
>>>
>>> Hi Jay,
>>>
>>> Thanks for the comments and feedback.
>>>
>>> I think it's quite clear that if a problem keeps arising then it needs
>>> resolving and addressing properly.
>>>
>>> Fair enough that at LinkedIn, and historically for the very first use
>>> cases, addressing this may not have been a big priority. But as Kafka is
>>> now Apache open source and being picked up by many, including my company,
>>> it is clear and evident that this is a requirement and an issue that now
>>> needs to be addressed.
>>>
>>> The fact that almost every transport mechanism I've worked with in the
>>> enterprise, including networking layers, has had headers clearly shows, I
>>> think, their need and success for a transport mechanism.
>>>
>>> I understand some concerns with regards to impact for others not
>>> needing it.
>>>
>>> What we are proposing is a flexible solution that adds no overhead on the
>>> storage or network traffic layers if you choose not to use headers, but
>>> does enable those who need or want them to use them.
>>>
>>>
>>> On your response to 1), there is nothing saying that anything should be
>>> put in any faster or without diligence; the same KIP process can still
>>> apply for adding Kafka-scope headers. Having headers just makes it easier
>>> to add them without constant message and record changes. Timestamp is a
>>> clear, real example of what should actually be in a header (along with
>>> other fields), but as it was, the whole message/record object needed to be
>>> changed to add it, as it will be for any further headers deemed needed by
>>> Kafka.
>>>
>>> On response to 2), why, within my company, as a platforms designer should
>>> I enforce that all teams use the same serialization for their payloads?
>>> What I do need is for some core cross-cutting concerns and information to
>>> be addressed at my platform level, without imposing on my development
>>> teams. This is the same argument for why byte[] is the exposed value and
>>> key type: as a messaging platform you don't want to impose that on my
>>> company.
>>>
>>> On response to 3), actually this isn't true. There are many third-party
>>> tools we need to hook into our messaging flows, and they only build onto
>>> standardised interfaces, as obviously the cost of having a custom
>>> implementation for every company would be very high.
>>> APM tooling is a clear case in point: every enterprise-level APM tool on
>>> the market is able to stitch in transaction flow end to end over a
>>> platform over HTTP and JMS, because they can stitch in some "magic" data
>>> in a uniform/standardised way; for the two mentioned, they stitch this
>>> into the headers. In its current form they cannot do this with Kafka.
>>> Providing a standardised interface will, I believe, actually benefit the
>>> project, as commercial companies like these will now be able to plug in
>>> their tooling uniformly, making it attractive and possible.
>>>
>>> As Joel mentions, some of your other concerns are more implementation
>>> details that I think should be agreed upon, but I think they can be
>>> addressed.
>>>
>>> E.g. re your concern on the hashmap:
>>> it is more than possible for a record not to have a hashmap unless it
>>> actually has a header (just as we have managed to do in the serialized
>>> message), if there's a concern about the in-memory record size for those
>>> using Kafka without headers.
>>>
>>> On your second-to-last comment about every team choosing their own
>>> format: actually we do want this a little. As first mentioned, no, we
>>> don't want a free-for-all, but some freedom matters, as different
>>> serialization formats have different benefits and drawbacks across our
>>> business. I can enumerate these if needed. One of the use cases for
>>> headers provided by LinkedIn on top of my KIP even shows where headers
>>> could be beneficial here, as a header could be used to detail which data
>>> format the message is serialized to, allowing me to consume different
>>> formats.
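
The format-tagging use case above can be illustrated with a small Python
sketch. Everything named here is an assumption for illustration: the
`content-type` header name, the two format labels and the `deserialize` helper
are hypothetical and are not defined by the KIP or by this thread. Header
values are treated as raw bytes, matching the byte-array values discussed in
the proposal.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical header name; the thread does not fix one.
FORMAT_HEADER = "content-type"


def decode_json(value: bytes) -> Any:
    return json.loads(value.decode("utf-8"))


def decode_csv(value: bytes) -> Any:
    # Deliberately naive: a single comma-separated line, no quoting rules.
    return value.decode("utf-8").split(",")


# Registry of decoders keyed by the format named in the header.
DECODERS: Dict[str, Callable[[bytes], Any]] = {
    "application/json": decode_json,
    "text/csv": decode_csv,
}


def deserialize(headers: Dict[str, bytes], value: bytes) -> Any:
    """Pick a decoder from the format header; default to JSON when absent."""
    fmt = headers.get(FORMAT_HEADER, b"application/json").decode("ascii")
    try:
        decoder = DECODERS[fmt]
    except KeyError:
        raise ValueError(f"no decoder registered for format {fmt!r}")
    return decoder(value)
```

A consumer written this way never has to open the payload to find out how to
read it, which is the property the wrapper-in-value approaches cannot offer for
payloads that must not be touched.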
>>>
>>> Also, we have some systems that we need to integrate where it is pretty
>>> near impossible to wrap or touch their binary payloads, or we're not
>>> allowed to touch them (historic systems, or inter/intra corporate).
>>>
>>> Headers really give us a solution to provide a pluggable platform, and
>>> standardisation that allows users to build platforms that adapt to their
>>> needs.
>>>
>>>
>>> Cheers
>>> Mike
>>>
>>>
>>> ________________________________________
>>> From: Jay Kreps <j...@confluent.io>
>>> Sent: Friday, October 7, 2016 4:45 PM
>>> To: dev@kafka.apache.org
>>> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>>>
>>> Hey guys,
>>>
>>> This discussion has come up a number of times and we've always passed.
>>>
>>> One of the things that has helped keep Kafka simple is not adding in new
>>> abstractions and concepts except when the proposal is really elegant and
>>> makes things simpler.
>>>
>>> Consider three use cases for headers:
>>>
>>>    1. Kafka-scope: We want to add a feature to Kafka that needs a
>>>       particular field.
>>>    2. Company-scope: You want to add a header to be shared by everyone in
>>>       your company.
>>>    3. World-wide scope: You are building a third party tool and want to
>>>       add some kind of header.
>>>
>>> For the case of (1) you should not use headers, you should just add a field
>>> to the record format. Having a second way of encoding things doesn't make
>>> sense. Occasionally people have complained that adding to the record format
>>> is hard and it would be nice to just shove lots of things in quickly. I
>>> think a better solution would be to make it easy to add to the record
>>> format, and I think we've made progress on that. I also think we should be
>>> insanely focused on the simplicity of the abstraction and not adding in new
>>> thingies often---we thought about time for years before adding a timestamp
>>> and I guarantee you we would have goofed it up if we'd gone with the
>>> earlier proposals.
>>> These things end up being long term commitments so it's
>>> really worth being thoughtful.
>>>
>>> For case (2) just use the body of the message. You don't need a globally
>>> agreed on definition of headers, just standardize on a header you want to
>>> include in the value in your company. Since this is just used by code in
>>> your company having a more standard header format doesn't really help you.
>>> In fact by using something like Avro you can define exactly the types you
>>> want, the required header fields, etc.
>>>
>>> The only case that headers help is (3). This is a bit of a niche case and I
>>> think is easily solved just making the reading and writing of given
>>> required fields pluggable to work with the header you have.
>>>
>>> A couple of specific problems with this proposal:
>>>
>>>    1. A global registry of numeric keys is super super ugly. This seems
>>>       silly compared to the Avro (or whatever) header solution which gives
>>>       more compact encoding, rich types, etc.
>>>    2. Using byte arrays for header values means they aren't really
>>>       interoperable for case (3). E.g. I can't make a UI that displays
>>>       headers, or allow you to set them in config. To work with third
>>>       party headers, the only case I think this really helps, you need the
>>>       union of all serialization schemes people have used for any tool.
>>>    3. For case (2) and (3) your key numbers are going to collide like
>>>       crazy. I don't think a global registry of magic numbers maintained
>>>       either by word of mouth or checking in changes to kafka source is
>>>       the right thing to do.
>>>    4. We are introducing a new serialization primitive which makes fields
>>>       disappear conditional on the contents of other fields. This breaks
>>>       the whole serialization/schema system we have today.
>>>    5. We're adding a hashmap to each record
>>>    6. This proposes making the ProducerRecord and ConsumerRecord mutable
>>>       and adding setters and getters (which we try to avoid).
>>>
>>> For context on LinkedIn: I set up the system there, but it may have changed
>>> since I left. The header is maintained with the record schemas in the avro
>>> schema registry and is required for all records. Essentially all messages
>>> must have a field named "header" of type EventHeader which is itself a
>>> record schema with a handful of fields (time, host, etc). The header
>>> follows the same compatibility rules as other avro fields, so it can be
>>> evolved in a compatible way gradually across apps. Avro is typed and
>>> doesn't require deserializing the full record to read the header. The
>>> header information (timestamp, host, etc) is important and needs to
>>> propagate into other systems like Hadoop which don't have a concept of
>>> headers for records, so I doubt it could move out of the value in any case.
>>> Not allowing teams to choose a data format other than avro was considered a
>>> feature, not a bug, since the whole point was to be able to share data,
>>> which doesn't work if every team chooses their own format.
>>>
>>> I agree with the critique of compaction not having a value. I think we
>>> should consider fixing that directly.
>>>
>>> -Jay
>>>
>>> On Thu, Sep 22, 2016 at 12:31 PM, Michael Pearce <michael.pea...@ig.com>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>>
>>>> I would like to discuss the following KIP proposal:
>>>>
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers
>>>>
>>>>
>>>> I have some initial drafts of roughly the changes that would be needed.
>>>> This is nowhere near finalized and I look forward to the discussion,
>>>> especially as some bits I'm personally in two minds about.
>>>>
>>>> https://github.com/michaelandrepearce/kafka/tree/kafka-headers-properties
>>>>
>>>>
>>>> Here is a link to an alternative option mentioned in the KIP, but one I
>>>> would personally discard (disadvantages mentioned in the KIP):
>>>>
>>>> https://github.com/michaelandrepearce/kafka/tree/kafka-headers-full
>>>>
>>>>
>>>> Thanks
>>>>
>>>> Mike