Re: [DISCUSS] KIP-82 - Add Record Headers

Michael Pearce Wed, 12 Oct 2016 01:23:17 -0700

@Jay and Dana

We have had a few internal discussions about how we might address this with a
common Apache Kafka message wrapper for headers that is used client side only,
and how that would deal with the compaction issue.
I have detailed this solution separately and linked it from the main KIP-82 wiki.


Here’s a direct link – 
https://cwiki.apache.org/confluence/display/KAFKA/Headers+Value+Message+Wrapper

We feel, though, that this solution still doesn’t address all the use cases
being mentioned, and it has some compatibility drawbacks, e.g. backwards and
forwards compatibility, especially across different language clients.
Also, because this solution still has to address the compaction issue /
tombstones, it would still require server side changes and just as many
message/record version changes.
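
To make the tombstone problem concrete, here is a minimal Java sketch of a
client-side wrapper. The class name HeaderValueWrapper, the byte layout and the
"trace-id" header are assumptions made up for illustration; they are not taken
from the linked wiki page. It only shows that once headers live inside the
value, a compaction delete (a null value) has nowhere to put them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side wrapper: headers are packed in front of the payload
// inside the record value. The layout is an assumption for illustration:
// [header count][key length][key][value length][value]...[payload]
final class HeaderValueWrapper {

    static byte[] wrap(Map<String, byte[]> headers, byte[] payload) {
        if (payload == null) {
            // A compaction delete is a record with a null value. Any wrapper
            // bytes would make the value non-null, so headers cannot travel
            // with a delete in this scheme.
            if (!headers.isEmpty()) {
                throw new IllegalArgumentException(
                        "a null (tombstone) value cannot carry headers");
            }
            return null;
        }
        int size = 4; // header count
        for (Map.Entry<String, byte[]> e : headers.entrySet()) {
            size += 4 + e.getKey().getBytes(StandardCharsets.UTF_8).length
                    + 4 + e.getValue().length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size + payload.length);
        buf.putInt(headers.size());
        for (Map.Entry<String, byte[]> e : headers.entrySet()) {
            byte[] key = e.getKey().getBytes(StandardCharsets.UTF_8);
            buf.putInt(key.length).put(key);
            buf.putInt(e.getValue().length).put(e.getValue());
        }
        buf.put(payload);
        return buf.array();
    }

    static Map<String, byte[]> readHeaders(byte[] wrapped) {
        Map<String, byte[]> headers = new LinkedHashMap<>();
        if (wrapped == null) {
            return headers; // a tombstone has no value, hence no headers
        }
        ByteBuffer buf = ByteBuffer.wrap(wrapped);
        int count = buf.getInt();
        for (int i = 0; i < count; i++) {
            byte[] key = new byte[buf.getInt()];
            buf.get(key);
            byte[] value = new byte[buf.getInt()];
            buf.get(value);
            headers.put(new String(key, StandardCharsets.UTF_8), value);
        }
        return headers;
    }

    public static void main(String[] args) {
        Map<String, byte[]> headers = new LinkedHashMap<>();
        headers.put("trace-id", new byte[] {1, 2});
        byte[] wrapped = wrap(headers, "payload".getBytes(StandardCharsets.UTF_8));
        System.out.println("headers read back: " + readHeaders(wrapped).size());
        try {
            wrap(headers, null);
        } catch (IllegalArgumentException e) {
            System.out.println("delete with headers rejected: " + e.getMessage());
        }
    }
}
```

Rejecting the call is only one possible choice in this sketch; silently
dropping the headers would be the other, and neither lets a delete carry them.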

We believe the solution proposed in KIP-82 does address all these needs, is
cleaner, and has more benefits.
Please have a read and comment. If you have any improvements to the proposed
KIP-82, or an alternative solution/option, your input is appreciated.

@All
As Joel has mentioned, to get this moving along and to be able to discuss it
more fluidly, it would be great if we could organize a virtual meet-up online,
e.g. webex or something.
I am aware that the majority are based in America, and I am in the UK.
@Kostya I assume you’re in Eastern Europe or Russia based on your email address
(please correct me if this assumption is wrong). I hope the time difference
isn’t too great and that the time below suits you if you wish to join.

Can I propose that we try to meet up online next Wednesday, 19th October, at
18:30 BST, 10:30 PST, 20:30 MSK?

Would this date/time suit the majority?
Also what is the preferred method? I can host via Adobe Connect style webex 
(which my company uses) but it isn’t the best IMHO, so more than happy to have 
someone suggest a better alternative. 

Best,
Mike




On 10/8/16, 7:26 AM, "Michael Pearce" <michael.pea...@ig.com> wrote:

    >> I agree with the critique of compaction not having a value. I think we 
should consider fixing that directly.
    
    > Agree that the compaction issue is troubling: compacted "null" deletes
    are incompatible w/ headers that must be packed into the message
    value. Are there any alternatives on compaction delete semantics that
    could address this? The KIP wiki discussion I think mostly assumes
    that compaction-delete is what it is and can't be changed/fixed.
    
    This KIP is about dealing with quite a few use cases and issues. Please see
    both the KIP use cases detailed by myself and the additional use cases wiki
    added by LinkedIn, linked from the main KIP.
    
    Compaction is something that happily is addressed by headers, but it most
    definitely isn't the sole reason or use case for them; headers solve many
    issues and use cases. Hence their elegance and simplicity, and why they're
    so common and so successful in transport mechanisms such as http, tcp, jms.
    
    ________________________________________
    From: Dana Powers <dana.pow...@gmail.com>
    Sent: Friday, October 7, 2016 11:09 PM
    To: dev@kafka.apache.org
    Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
    
    > I agree with the critique of compaction not having a value. I think we 
should consider fixing that directly.
    
    Agree that the compaction issue is troubling: compacted "null" deletes
    are incompatible w/ headers that must be packed into the message
    value. Are there any alternatives on compaction delete semantics that
    could address this? The KIP wiki discussion I think mostly assumes
    that compaction-delete is what it is and can't be changed/fixed.
    
    -Dana
    
    On Fri, Oct 7, 2016 at 1:38 PM, Michael Pearce <michael.pea...@ig.com> 
wrote:
    >
    > Hi Jay,
    >
    > Thanks for the comments and feedback.
    >
    > I think it's quite clear that if a problem keeps arising then it needs
    > resolving and addressing properly.
    >
    > Fair enough, at LinkedIn, and historically for the very first use cases,
    > addressing this may not have been a big priority. But as Kafka is now
    > Apache open source and being picked up by many, including my company, it
    > is clear and evident that this is a requirement and an issue that now
    > needs to be addressed to meet these needs.
    >
    > The fact that almost every transport mechanism, including networking
    > layers, in the enterprises I've worked in has always had headers clearly
    > shows, I think, their need and success for a transport mechanism.
    >
    > I understand some concerns with regard to the impact on others not
    > needing it.
    >
    > What we are proposing is a flexible solution that adds no overhead on the
    > storage or network traffic layers if you choose not to use headers, but
    > does enable those who need or want them to use them.
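
One way to picture the "no overhead if unused" claim is a flag bit in an
attributes-style byte that says whether a header section follows. The Java
sketch below is only an illustration under assumptions: the class name
OptionalHeaderEncoding, the HAS_HEADERS bit and the byte layout are all
hypothetical and are not the wire format proposed in the KIP.

```java
import java.nio.ByteBuffer;

// Hypothetical encoding: the first byte stands in for a per-record attributes
// byte. The header section is written only when there is something to write.
final class OptionalHeaderEncoding {

    // Hypothetical flag bit; the real attribute layout is not taken from the KIP.
    static final byte HAS_HEADERS = 0x01;

    static byte[] encode(byte[] headerBytes, byte[] value) {
        boolean hasHeaders = headerBytes != null && headerBytes.length > 0;
        int size = 1 + (hasHeaders ? 4 + headerBytes.length : 0) + value.length;
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.put(hasHeaders ? HAS_HEADERS : (byte) 0);
        if (hasHeaders) {
            // Length-prefixed header section, present only when the flag is set.
            buf.putInt(headerBytes.length).put(headerBytes);
        }
        buf.put(value);
        return buf.array();
    }

    static boolean hasHeaders(byte[] encoded) {
        return (encoded[0] & HAS_HEADERS) != 0;
    }

    public static void main(String[] args) {
        byte[] value = {7, 8};
        byte[] plain = encode(null, value);
        byte[] withHeaders = encode(new byte[] {1, 2, 3}, value);
        System.out.println("without headers: " + plain.length + " bytes");
        System.out.println("with headers:    " + withHeaders.length + " bytes");
    }
}
```

In this sketch a record without headers is exactly the attributes byte plus the
value, so it pays nothing extra on the wire or on disk.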
    >
    >
    > On your response to 1), there is nothing saying that it should be put in
    > any faster or without diligence, and the same KIP process can still apply
    > for adding kafka-scope headers. Having headers just makes it easier to
    > add them without constant message and record changes. Timestamp is a
    > clear, real example of what should actually be in a header (along with
    > other fields), but instead the whole message/record object needed to be
    > changed to add it, as it will for any further headers deemed needed by
    > kafka.
    >
    > On your response to 2), why, within my company, as a platform designer,
    > should I enforce that all teams use the same serialization for their
    > payloads? What I do need is for some core cross-cutting concerns and
    > information to be addressed at my platform level, without imposing on my
    > development teams. This is the same argument for why byte[] is the
    > exposed value and key: as a messaging platform you don't want to impose
    > that on my company.
    >
    > On your response to 3), actually this isn't true. There are many 3rd party
    > tools that we need to hook into our messaging flows, and they only build
    > onto standardised interfaces, as obviously the cost of a custom
    > implementation for every company would be very high.
    > APM tooling is a clear case in point. Every enterprise level APM tool on
    > the market is able to stitch together a transaction flow end to end across
    > a platform over http and jms because it can stitch in some "magic" data in
    > a uniform/standardised way; for the two mentioned, they stitch this into
    > the headers. In its current form they cannot do this with Kafka.
    > Providing a standardised interface will, I believe, actually benefit the
    > project, as commercial companies like these will now be able to plug in
    > their tooling uniformly, making it attractive and possible.
    >
    > As for some of your other concerns, as Joel mentions these are more
    > implementation details that I think should be agreed upon, but I think
    > they can be addressed.
    >
    > e.g. re your concern on the hashmap.
    > It is more than possible not to require every record to have a hashmap
    > unless it actually has a header (just like we have managed to do on the
    > serialized message), if there's a concern about the in-memory record size
    > for those using kafka without headers.
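
As a rough illustration of the point above, the map can be allocated lazily so
that a record which never gets a header never holds a hashmap. This is a
minimal Java sketch under assumptions: LazyHeaderRecord and its methods are
hypothetical names, not the ProducerRecord/ConsumerRecord API, and the integer
header keys simply follow the numeric keys mentioned in this thread.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical record type: the header map is created only on first use.
final class LazyHeaderRecord {
    private final byte[] key;
    private final byte[] value;
    private Map<Integer, byte[]> headers; // stays null until a header is set

    LazyHeaderRecord(byte[] key, byte[] value) {
        this.key = key;
        this.value = value;
    }

    byte[] key() {
        return key;
    }

    byte[] value() {
        return value;
    }

    void setHeader(int headerKey, byte[] headerValue) {
        if (headers == null) {
            headers = new HashMap<>(); // allocated on the first header only
        }
        headers.put(headerKey, headerValue);
    }

    // Reading headers never allocates: a shared empty map is returned instead.
    Map<Integer, byte[]> headers() {
        return headers == null
                ? Collections.<Integer, byte[]>emptyMap()
                : Collections.unmodifiableMap(headers);
    }

    // Exposed only so the sketch can show when allocation happens.
    boolean hasAllocatedHeaderMap() {
        return headers != null;
    }

    public static void main(String[] args) {
        LazyHeaderRecord record = new LazyHeaderRecord(new byte[] {1}, new byte[] {2});
        System.out.println("map allocated before set: " + record.hasAllocatedHeaderMap());
        record.setHeader(7, new byte[] {42});
        System.out.println("map allocated after set:  " + record.hasAllocatedHeaderMap());
    }
}
```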
    >
    > On your second to last comment about every team choosing their own
    > format, actually we do want this a little. As mentioned at the very
    > start, no, we don't want a free-for-all, but some freedom is useful, as
    > different serializations have different benefits and drawbacks across our
    > business. I can enumerate these if needed. One of the use cases for
    > headers provided by LinkedIn on top of my KIP even shows where headers
    > could be beneficial here, as a header could be used to detail which data
    > format the message is serialized in, allowing me to consume different
    > formats.
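
The format-header idea above could look something like the following on the
consumer side. This is a minimal Java sketch under assumptions: the class
FormatDispatchingDecoder, the idea of a UTF-8 format name in the header, and
the "text" and "int32" format names are all hypothetical and do not come from
the KIP or the LinkedIn use case page.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical consumer-side helper: a header names the payload format and
// the matching decoder is chosen without inspecting the payload itself.
final class FormatDispatchingDecoder {
    private final Map<String, Function<byte[], Object>> decoders = new HashMap<>();

    void register(String format, Function<byte[], Object> decoder) {
        decoders.put(format, decoder);
    }

    Object decode(byte[] formatHeader, byte[] payload) {
        if (formatHeader == null) {
            throw new IllegalArgumentException("missing format header");
        }
        String format = new String(formatHeader, StandardCharsets.UTF_8);
        Function<byte[], Object> decoder = decoders.get(format);
        if (decoder == null) {
            throw new IllegalArgumentException("no decoder registered for format: " + format);
        }
        return decoder.apply(payload);
    }

    public static void main(String[] args) {
        FormatDispatchingDecoder decoder = new FormatDispatchingDecoder();
        decoder.register("text", bytes -> new String(bytes, StandardCharsets.UTF_8));
        decoder.register("int32", bytes -> ByteBuffer.wrap(bytes).getInt());

        byte[] textHeader = "text".getBytes(StandardCharsets.UTF_8);
        byte[] intHeader = "int32".getBytes(StandardCharsets.UTF_8);
        System.out.println(decoder.decode(textHeader, "hello".getBytes(StandardCharsets.UTF_8)));
        System.out.println(decoder.decode(intHeader, new byte[] {0, 0, 0, 42}));
    }
}
```

The payload bytes are passed through untouched, which is the property that
matters for systems whose binary payloads cannot be wrapped or modified.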
    >
    > Also, we have some systems that we need to integrate where it is pretty
    > near impossible to wrap or touch their binary payloads, or where we’re
    > not allowed to touch them (historic systems, or inter/intra corporate).
    >
    > Headers really give us a solution to provide a pluggable platform, and
    > standardisation that allows users to build platforms that adapt to their
    > needs.
    >
    >
    > Cheers
    > Mike
    >
    >
    > ________________________________________
    > From: Jay Kreps <j...@confluent.io>
    > Sent: Friday, October 7, 2016 4:45 PM
    > To: dev@kafka.apache.org
    > Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
    >
    > Hey guys,
    >
    > This discussion has come up a number of times and we've always passed.
    >
    > One of the things that has helped keep Kafka simple is not adding in new
    > abstractions and concepts except when the proposal is really elegant and
    > makes things simpler.
    >
    > Consider three use cases for headers:
    >
    >    1. Kafka-scope: We want to add a feature to Kafka that needs a
    >    particular field.
    >    2. Company-scope: You want to add a header to be shared by everyone in
    >    your company.
    >    3. World-wide scope: You are building a third party tool and want to
    >    add some kind of header.
    >
    > For the case of (1) you should not use headers, you should just add a
    > field to the record format. Having a second way of encoding things doesn't
    > make sense. Occasionally people have complained that adding to the record
    > format is hard and it would be nice to just shove lots of things in
    > quickly. I think a better solution would be to make it easy to add to the
    > record format, and I think we've made progress on that. I also think we
    > should be insanely focused on the simplicity of the abstraction and not
    > adding in new thingies often---we thought about time for years before
    > adding a timestamp and I guarantee you we would have goofed it up if we'd
    > gone with the earlier proposals. These things end up being long term
    > commitments so it's really worth being thoughtful.
    >
    > For case (2) just use the body of the message. You don't need a globally
    > agreed on definition of headers, just standardize on a header you want to
    > include in the value in your company. Since this is just used by code in
    > your company having a more standard header format doesn't really help you.
    > In fact by using something like Avro you can define exactly the types you
    > want, the required header fields, etc.
    >
    > The only case that headers help is (3). This is a bit of a niche case and
    > I think is easily solved just making the reading and writing of given
    > required fields pluggable to work with the header you have.
    >
    > A couple of specific problems with this proposal:
    >
    >    1. A global registry of numeric keys is super super ugly. This seems
    >    silly compared to the Avro (or whatever) header solution which gives
    >    more compact encoding, rich types, etc.
    >    2. Using byte arrays for header values means they aren't really
    >    interoperable for case (3). E.g. I can't make a UI that displays
    >    headers, or allow you to set them in config. To work with third party
    >    headers, the only case I think this really helps, you need the union of
    >    all serialization schemes people have used for any tool.
    >    3. For case (2) and (3) your key numbers are going to collide like
    >    crazy. I don't think a global registry of magic numbers maintained
    >    either by word of mouth or checking in changes to kafka source is the
    >    right thing to do.
    >    4. We are introducing a new serialization primitive which makes fields
    >    disappear conditional on the contents of other fields. This breaks the
    >    whole serialization/schema system we have today.
    >    5. We're adding a hashmap to each record
    >    6. This proposes making the ProducerRecord and ConsumerRecord mutable
    >    and adding setters and getters (which we try to avoid).
    >
    > For context on LinkedIn: I set up the system there, but it may have
    > changed since I left. The header is maintained with the record schemas in
    > the avro schema registry and is required for all records. Essentially all
    > messages must have a field named "header" of type EventHeader which is
    > itself a record schema with a handful of fields (time, host, etc). The
    > header follows the same compatibility rules as other avro fields, so it
    > can be evolved in a compatible way gradually across apps. Avro is typed
    > and doesn't require deserializing the full record to read the header. The
    > header information (timestamp, host, etc) is important and needs to
    > propagate into other systems like Hadoop which don't have a concept of
    > headers for records, so I doubt it could move out of the value in any
    > case. Not allowing teams to choose a data format other than avro was
    > considered a feature, not a bug, since the whole point was to be able to
    > share data, which doesn't work if every team chooses their own format.
    >
    > I agree with the critique of compaction not having a value. I think we
    > should consider fixing that directly.
    >
    > -Jay
    >
    > On Thu, Sep 22, 2016 at 12:31 PM, Michael Pearce <michael.pea...@ig.com>
    > wrote:
    >
    >> Hi All,
    >>
    >>
    >> I would like to discuss the following KIP proposal:
    >>
    >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers
    >>
    >>
    >>
    >> I have some initial drafts of roughly the changes that would be needed.
    >> This is nowhere near finalized, and I look forward to the discussion,
    >> especially as there are some bits I'm personally in two minds about.
    >>
    >> https://github.com/michaelandrepearce/kafka/tree/kafka-headers-properties
    >>
    >>
    >>
    >> Here is a link to an alternative option mentioned in the KIP, but one I
    >> would personally discard (disadvantages mentioned in the KIP)
    >>
    >> https://github.com/michaelandrepearce/kafka/tree/kafka-headers-full
    >>
    >>
    >> Thanks
    >>
    >> Mike
    >>
    >>
    >>
    >>
    >>
    >> The information contained in this email is strictly confidential and for
    >> the use of the addressee only, unless otherwise indicated. If you are not
    >> the intended recipient, please do not read, copy, use or disclose to 
others
    >> this message or any attachment. Please also notify the sender by replying
    >> to this email or by telephone (+44(020 7896 0011) and then delete the 
email
    >> and any copies of it. Opinions, conclusion (etc) that do not relate to 
the
    >> official business of this company shall be understood as neither given 
nor
    >> endorsed by it. IG is a trading name of IG Markets Limited (a company
    >> registered in England and Wales, company number 04008957) and IG Index
    >> Limited (a company registered in England and Wales, company number
    >> 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill,
    >> London EC4R 2YA. Both IG Markets Limited (register number 195355) and IG
    >> Index Limited (register number 114059) are authorised and regulated by 
the
    >> Financial Conduct Authority.
    >>
