Let me talk about the container format that we are using here at TiVo to add 
headers to our Kafka messages.

Just some quick terminology, so that I don't confuse everyone.
I'm going to use "message body" to refer to the thing returned by 
ConsumerRecord.value(), and I'm going to use "payload" to refer to your data 
after it has been serialized into bytes.

To recap, during the KIP call, we talked about 3 ways to have headers in Kafka 
messages:
1) The message body is your payload, which has headers within it.
2) The message body is a container, which has headers in it as well as your 
payload.
3) Extend Kafka to hold headers outside of the message body. The message body 
holds your payload.

1) The message body is your payload, which has headers in it
-----------------------
Here's an example of what this may look like, if it were rendered in JSON:

{
    "headers" : {
        "Host" : "host.domain.com",
        "Service" : "PaymentProcessor",
        "Timestamp" : "2016-10-28 12:45:56"
    },
    "Field1" : "value",
    "Field2" : "value"
}

In this scenario, headers are not anything special. They are simply part of 
your payload. They may have been auto-included by some mechanism in all of 
your schemas, but they are still just payload fields. I believe LinkedIn uses 
this mechanism: the field name "headers" is reserved in all schemas and is 
somehow auto-inserted into every schema. The headers schema contains a couple 
of fields like "host", "service", and "timestamp". If LinkedIn decides that a 
new field needs to be added for company-wide infrastructure purposes, they 
add it to the schema of "headers", and because "headers" is included 
everywhere, every schema gets the new field as well.

Because they are simply part of your payload, you need to deserialize your 
payload in order to read the headers.
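To make that concrete, here's a tiny Python sketch of option (1), assuming 
JSON payloads and the made-up field names from the example above:

```python
import json

# The entire message body is the payload; the headers live inside it.
message_body = (
    b'{"headers": {"Host": "host.domain.com", '
    b'"Service": "PaymentProcessor"}, "Field1": "value"}'
)

# To read even a single header, the whole payload must be deserialized first.
record = json.loads(message_body)
print(record["headers"]["Host"])  # host.domain.com
```

There is no way to get at "Host" without paying for the full deserialize.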

3) Extend Kafka to hold headers outside of the message body. The message body 
holds your payload.
-------------
This is what this KIP is discussing. I will let others talk about this.

2) The message body is a container, which has headers in it, as well as your 
payload.
--------------
At TiVo, we have standardized on a container format that looks very similar to 
HTTP. Let me jump straight to an example:

----- example below ----
JS/1 123 1024
Host: host.domain.com
Service: SomethingProcessor
Timestamp: 2016-10-28 12:45:56
ObjectTypeInPayload: MyObjectV1

{
    "Field1" : "value",
    "Field2" : "value"
}
----- example above ----

Ignore the first line for now. Lines 2-5 are headers. Then there is a blank 
line, and then after that is your payload.  The field "ObjectTypeInPayload" 
describes what schema applies to the payload. In order to decode your payload, 
you read the field "ObjectTypeInPayload" and use that to decide how to decode 
the payload.
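A consumer might dispatch on that header roughly like this (a sketch; the 
DECODERS registry and "MyObjectV1" are made-up names, not anything 
TiVo-specific, and I assume the headers have already been parsed into a dict):

```python
import json

# Hypothetical decoder registry, keyed by ObjectTypeInPayload values.
DECODERS = {
    "MyObjectV1": lambda raw: json.loads(raw),
}

def decode_payload(headers, payload):
    """Pick the decoder named by the ObjectTypeInPayload header."""
    return DECODERS[headers["ObjectTypeInPayload"]](payload)

obj = decode_payload({"ObjectTypeInPayload": "MyObjectV1"},
                     b'{"Field1": "value"}')
```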

Okay, let's talk about details.
The first line is "JS/1 123 1024". 
* The JS/1 thing is the "schema" of the container. JS is the container type, 1 
is the version number of the JS container. This particular version of the JS 
container means those 4 specific headers are present, and that the payload is 
encoded in JSON.
* The 123 is the length in bytes of the header section. (This particular 
example probably isn't exactly 123 bytes)
* The 1024 is the length in bytes of the payload. (This particular example 
probably isn't exactly 1024 bytes)
The 123 and 1024 allow the deserializer to quickly jump to the different 
sections. I don't know how necessary they are. It's possible they are an 
over-optimization. They are kind of a holdover from a previous wire format 
here at TiVo where we were pipelining messages over TCP as one continuous 
byte stream (NOT using Kafka), and we needed to know where one object ended 
and another started, and also be able to skip messages that we didn't care 
about.
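Here is a rough Python sketch of a reader for this container. This is my own 
toy code, not what we actually run at TiVo, and I'm assuming the headers 
length covers the header lines including their \r\n terminators:

```python
def parse_container(body: bytes):
    """Split a container message body into (tag, headers dict, payload)."""
    first_line_end = body.index(b"\r\n")
    tag, headers_len, payload_len = (
        body[:first_line_end].decode("ascii").split(" "))
    headers_len, payload_len = int(headers_len), int(payload_len)

    # The two lengths let us slice directly instead of scanning.
    headers_start = first_line_end + 2
    headers = {}
    for line in body[headers_start:headers_start + headers_len].split(b"\r\n"):
        if line:
            name, _, value = line.decode("ascii").partition(": ")
            headers[name] = value

    # Payload starts after the header section plus the blank separator line.
    payload_start = headers_start + headers_len + 2
    return tag, headers, body[payload_start:payload_start + payload_len]
```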

Let me show another made up example of this container format being used:

---- example below ----
AV/1 123 1024
Host: host.domain.com
Service: SomethingProcessor
Timestamp: 2016-10-28 12:45:56

0xFF
BYTESOFDATA
---- example above ----

This container is of type AV/1. This means that the payload is a magic byte 
followed by a stream of bytes. The magic byte is a schema registry ID, which 
is used to look up the schema, which is then used to decode the rest of the 
bytes in the payload.

Notice that this is a different use of the same container syntax. In this case, 
the schema ID was a byte in the payload. In the JS/1 case, the schema ID was 
stored in a header.
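The AV/1 payload decode step is just a byte split (a sketch; the registry 
lookup itself is out of scope here):

```python
def split_av1_payload(payload: bytes):
    """Split an AV/1 payload into (schema_id, data).

    Assumes, per the description above, that the first byte is the schema
    registry ID; a real consumer would fetch that schema and use it to
    decode the remaining bytes.
    """
    return payload[0], payload[1:]

schema_id, data = split_av1_payload(b"\xffBYTESOFDATA")
```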

Here is a more precise description of the container format:
---- container format below ----
<tag><headers length><payload length>\r\n
header: value\r\n
header: value\r\n
\r\n
payload
---- container format above ----

As I mentioned above, the headers length and payload length might not be 
necessary. You can also simply scan the message body until the first 
occurrence of \r\n\r\n.
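That scan-based alternative is a couple of lines in most languages; in 
Python, for instance:

```python
def split_on_blank_line(body: bytes):
    # Split at the first CRLF CRLF separator; no length fields needed.
    sep = body.index(b"\r\n\r\n")
    return body[:sep], body[sep + 4:]

head, payload = split_on_blank_line(
    b"JS/1\r\nHost: host.domain.com\r\n\r\npayload-bytes")
```

The trade-off is that you touch every header byte instead of jumping straight 
to the payload.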

Let's talk about pros/cons.

Pros:
* Headers do not affect the payload. The addition of a header does not 
affect the schema of the payload.
* Payload serialization can be different for different use cases. This 
container format can carry a payload that is Avro or JSON or Thrift or 
whatever. The payload is just a stream of bytes.
* Headers can be read without deserializing the payload.
* Headers can have a schema. In the JS/1 case, I use "JS/1" to mean "There 
are 4 required fields: Host is a string, Service is a string, Timestamp is a 
time in ISO(something) format, ObjectTypeInPayload is a string, and the 
payload is in JSON".
* Plaintext headers with a relatively simple syntax are pretty easy to parse 
in any programming language.

Cons:
* Double serialization upon writes. In order to create the message body, you 
first have to create your payload (which means you serialize your object into 
an array of bytes) and then tack headers onto the front of it. And if you do 
the optimization where you store the length of the payload, you actually have 
to do it in this order, which means you have to encode the payload first and 
hold the whole thing in memory before creating your message body.
* Double deserialization upon reads. You *might* need to read the headers so 
that you can figure out how to read the payload. It depends on how you use the 
container. In the JS/1 case, I had to read the ObjectTypeInPayload field in 
order to deserialize the payload. However, in the AV/1 case, you did NOT have 
to read any of the headers in order to deserialize the payload.
* What if I want my header values to be complex types? What if I wanted to 
store a header where the value was an array? Do I start relying on stuff like 
comma-separated strings to indicate arrays? What if I wanted to store a header 
where the value was binary bytes? Do I insist that headers all must be ASCII 
encoded? I realize this conflicts with what I said above about headers being 
easy to parse. Maybe they are actually more complex than I realized.
* Size overhead of the container format and headers: If I have a 10 byte 
payload, but my container is 512 bytes of ASCII-encoded strings, is it worth 
it?
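The write-path ordering from the first con above can be sketched like this (a 
toy JS/1-style writer, assuming JSON payloads; again my own names, not 
production code):

```python
import json

def build_container(tag: str, headers: dict, obj) -> bytes:
    # Step 1: fully serialize the payload, so its length is known up front.
    payload = json.dumps(obj).encode("utf-8")

    # Step 2: only now can the first line, with both lengths, be written.
    header_section = b"".join(
        b"%s: %s\r\n" % (k.encode("ascii"), v.encode("ascii"))
        for k, v in headers.items()
    )
    first_line = b"%s %d %d\r\n" % (
        tag.encode("ascii"), len(header_section), len(payload))
    return first_line + header_section + b"\r\n" + payload

body = build_container("JS/1", {"Host": "host.domain.com"},
                       {"Field1": "value"})
```

Note that the whole serialized payload sits in memory before the first byte 
of the container can be emitted.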

Alternatives:
* I can imagine doing something similar to the above, but using Avro as the 
serialization format for the container. The Avro schemas would look like the 
following (apologies if I got these wrong; I actually haven't used Avro):

{
    "type": "record", 
    "name": "JS",
    "fields" : [
        {"name": "Host", "type" : "string"},
        {"name": "Service", "type" : "string"},
        {"name": "Timestamp", "type" : "double"},
        {"name": "ObjectTypeInPayload", "type" : "string"},
        {"name": "payload", "type": "bytes"}
    ]
}

{
    "type": "record", 
    "name": "AV",
    "fields" : [
        {"name": "Host", "type" : "string"},
        {"name": "Service", "type" : "string"},
        {"name": "Timestamp", "type" : "double"},
        {"name": "payload", "type": "bytes"}
    ]
}

You would use Avro to deserialize the container, and then potentially use a 
different deserializer for the payload. Using Avro would potentially reduce 
the overhead of the container format, and let you use complex types in your 
headers. However, this would mean people would still have to use Avro for 
deserializing a Kafka message body.

Our experience using this at TiVo:
* We haven't run into any problems so far.
* We are not yet running Kafka in production, so we don't yet have a lot of 
traffic running through our brokers.
* Even when we go to production, we expect that the amount of data that we have 
will be relatively small compared to most companies. So we're hoping that the 
overhead of the container format will be okay for our use cases.

Phew, okay, that's enough for now. Let's discuss.

-James

> On Oct 27, 2016, at 12:19 AM, James Cheng <wushuja...@gmail.com> wrote:
> 
> 
>> On Oct 25, 2016, at 10:23 PM, Michael Pearce <michael.pea...@ig.com> wrote:
>> 
>> Hi All,
>> 
>> In case you hadn't noticed re the compaction issue for non-null values i 
>> have created a separate KIP-87, if you could all contribute to its 
>> discussion would be much appreciated.
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-87+-+Add+Compaction+Tombstone+Flag
>> 
>> Secondly, focussing back on KIP-82, one of the actions agreed from the KIP 
>> call was for some additional alternative solution proposals on top of those 
>> already detailed in the KIP wiki and subsequent linked wiki pages by others 
>> in the group in the meeting.
>> 
>> I haven't seen any activity on this, does this mean there isn't any further 
>> and everyone in hindsight actually thinks the current proposed solution in 
>> the KIP is the front runner? (i assume this isn't the case, just want to 
>> nudge everyone)
>> 
> 
> I have been meaning to respond, but I haven't had the time. In the next 
> couple days, I will try to write up the container format that TiVo is using, 
> and we can discuss it.
> 
> -James
> 
>> Also just copying across the kip call thread to keep everything in one 
>> thread to avoid a divergence of the discussion into multiple threads.
>> 
>> Cheers
>> Mike
>> 
>> ________________________________________
>> From: Mayuresh Gharat <gharatmayures...@gmail.com>
>> Sent: Monday, October 24, 2016 6:17 PM
>> To: dev@kafka.apache.org
>> Subject: Re: Kafka KIP meeting Oct 19 at 11:00am PST
>> 
>> I agree with Nacho.
>> +1 for the KIP.
>> 
>> Thanks,
>> 
>> Mayuresh
>> 
>> On Fri, Oct 21, 2016 at 11:46 AM, Nacho Solis <nso...@linkedin.com.invalid>
>> wrote:
>> 
>>> I think a separate KIP is a good idea as well.  Note however that potential
>>> decisions in this KIP could affect the other KIP.
>>> 
>>> Nacho
>>> 
>>> On Fri, Oct 21, 2016 at 10:23 AM, Jun Rao <j...@confluent.io> wrote:
>>> 
>>>> Michael,
>>>> 
>>>> Yes, doing a separate KIP to address the null payload issue for compacted
>>>> topics is a good idea.
>>>> 
>>>> Thanks,
>>>> 
>>>> Jun
>>>> 
>>>> On Fri, Oct 21, 2016 at 12:57 AM, Michael Pearce <michael.pea...@ig.com>
>>>> wrote:
>>>> 
>>>>> I had noted that what ever the solution having compaction based on null
>>>>> payload was agreed isn't elegant.
>>>>> 
>>>>> Shall we raise another kip to : as discussed propose using a attribute
>>>> bit
>>>>> for delete/compaction flag as well/or instead of null value and
>>> updating
>>>>> compaction logic to look at that delelete/compaction attribute
>>>>> 
>>>>> I believe this is less contentious, so that at least we get that done
>>>>> alleviating some concerns whilst the below gets discussed further?
>>>>> 
>>>>> ________________________________________
>>>>> From: Jun Rao <j...@confluent.io>
>>>>> Sent: Wednesday, October 19, 2016 8:56:52 PM
>>>>> To: dev@kafka.apache.org
>>>>> Subject: Re: Kafka KIP meeting Oct 19 at 11:00am PST
>>>>> 
>>>>> The following are the notes from today's KIP discussion.
>>>>> 
>>>>> 
>>>>>  - KIP-82 - add record header: We agreed that there are use cases for
>>>>>  third-party vendors building tools around Kafka. We haven't reached
>>>> the
>>>>>  conclusion whether the added complexity justifies the use cases. We
>>>> will
>>>>>  follow up on the mailing list with use cases, container format
>>> people
>>>>> have
>>>>>  been using, and details on the proposal.
>>>>> 
>>>>> 
>>>>> The video will be uploaded soon in https://cwiki.apache.org/
>>>>> confluence/display/KAFKA/Kafka+Improvement+Proposals .
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Jun
>>>>> 
>>>>> On Mon, Oct 17, 2016 at 10:49 AM, Jun Rao <j...@confluent.io> wrote:
>>>>> 
>>>>>> Hi, Everyone.,
>>>>>> 
>>>>>> We plan to have a Kafka KIP meeting this coming Wednesday at 11:00am
>>>> PST.
>>>>>> If you plan to attend but haven't received an invite, please let me
>>>> know.
>>>>>> The following is the tentative agenda.
>>>>>> 
>>>>>> Agenda:
>>>>>> KIP-82: add record header
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Jun
>>>>>> 
>>>>> The information contained in this email is strictly confidential and
>>> for
>>>>> the use of the addressee only, unless otherwise indicated. If you are
>>> not
>>>>> the intended recipient, please do not read, copy, use or disclose to
>>>> others
>>>>> this message or any attachment. Please also notify the sender by
>>> replying
>>>>> to this email or by telephone (+44(020 7896 0011) and then delete the
>>>> email
>>>>> and any copies of it. Opinions, conclusion (etc) that do not relate to
>>>> the
>>>>> official business of this company shall be understood as neither given
>>>> nor
>>>>> endorsed by it. IG is a trading name of IG Markets Limited (a company
>>>>> registered in England and Wales, company number 04008957) and IG Index
>>>>> Limited (a company registered in England and Wales, company number
>>>>> 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill,
>>>>> London EC4R 2YA. Both IG Markets Limited (register number 195355) and
>>> IG
>>>>> Index Limited (register number 114059) are authorised and regulated by
>>>> the
>>>>> Financial Conduct Authority.
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Nacho (Ignacio) Solis
>>> Kafka
>>> nso...@linkedin.com
>>> 
>> 
>> 
>> 
>> --
>> -Regards,
>> Mayuresh R. Gharat
>> (862) 250-7125
>> 
>> 
>> ________________________________________
>> From: Michael Pearce <michael.pea...@ig.com>
>> Sent: Monday, October 17, 2016 7:48 AM
>> To: dev@kafka.apache.org
>> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>> 
>> Hi Jun,
>> 
>> Sounds good.
>> 
>> Look forward to the invite.
>> 
>> Cheers,
>> Mike
>> ________________________________________
>> From: Jun Rao <j...@confluent.io>
>> Sent: Monday, October 17, 2016 5:55:57 AM
>> To: dev@kafka.apache.org
>> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>> 
>> Hi, Michael,
>> 
>> We do have online KIP discussion meetings from time to time. How about we
>> discuss this KIP Wed (Oct 19) at 11:00am PST? I will send out an invite (we
>> typically do the meeting through Zoom and will post the video recording to
>> Kafka wiki).
>> 
>> Thanks,
>> 
>> Jun
>> 
>> On Wed, Oct 12, 2016 at 1:22 AM, Michael Pearce <michael.pea...@ig.com>
>> wrote:
>> 
>>> @Jay and Dana
>>> 
>>> We have internally had a few discussions of how we may address this if we
>>> had a common apache kafka message wrapper for headers that can be used
>>> client side only to, and address the compaction issue.
>>> I have detailed this solution separately and linked from the main KIP-82
>>> wiki.
>>> 
>>> Here’s a direct link –
>>> https://cwiki.apache.org/confluence/display/KAFKA/
>>> Headers+Value+Message+Wrapper
>>> 
>>> We feel this solution though doesn’t manage to address all the use cases
>>> being mentioned still and also has some compatibility drawbacks e.g.
>>> backwards forwards compatibility especially on different language clients
>>> Also we still require with this solution, as still need to address
>>> compaction issue / tombstones, we need to make server side changes and as
>>> many message/record version changes.
>>> 
>>> We believe the proposed solution in KIP-82 does address all these needs
>>> and is cleaner still, and more benefits.
>>> Please have a read, and comment. Also if you have any improvements on the
>>> proposed KIP-82 or an alternative solution/option your input is appreciated.
>>> 
>>> @All
>>> As Joel has mentioned to get this moving along, and able to discuss more
>>> fluidly, it would be great if we can organize to meet up virtually online
>>> e.g. webex or something.
>>> I am aware, that the majority are based in America, myself is in the UK.
>>> @Kostya I assume you’re in Eastern Europe or Russia based on your email
>>> address (please correct this assumption), I hope the time difference isn’t
>>> too much that the below would suit you if you wish to join
>>> 
>>> Can I propose next Wednesday 19th October at 18:30 BST , 10:30 PST, 20:30
>>> MSK we try meetup online?
>>> 
>>> Would this date/time suit the majority?
>>> Also what is the preferred method? I can host via Adobe Connect style
>>> webex (which my company uses) but it isn’t the best IMHO, so more than
>>> happy to have someone suggest a better alternative.
>>> 
>>> Best,
>>> Mike
>>> 
>>> 
>>> 
>>> 
>>> On 10/8/16, 7:26 AM, "Michael Pearce" <michael.pea...@ig.com> wrote:
>>> 
>>>>> I agree with the critique of compaction not having a value. I think
>>> we should consider fixing that directly.
>>> 
>>>> Agree that the compaction issue is troubling: compacted "null"
>>> deletes
>>>   are incompatible w/ headers that must be packed into the message
>>>   value. Are there any alternatives on compaction delete semantics that
>>>   could address this? The KIP wiki discussion I think mostly assumes
>>>   that compaction-delete is what it is and can't be changed/fixed.
>>> 
>>>   This KIP is about dealing with quite a few use cases and issues,
>>> please see both the KIP use cases detailed by myself and also the
>>> additional use cases wiki added by LinkedIn linked from the main KIP.
>>> 
>>>   The compaction is something that happily is addressed with headers,
>>> but most defiantly isn't the sole reason or use case for them, headers
>>> solves many issues and use cases. Thus their elegance and simplicity, and
>>> why they're so common in transport mechanisms and so succesfull, as stated
>>> like http, tcp, jms.
>>> 
>>>   ________________________________________
>>>   From: Dana Powers <dana.pow...@gmail.com>
>>>   Sent: Friday, October 7, 2016 11:09 PM
>>>   To: dev@kafka.apache.org
>>>   Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>>> 
>>>> I agree with the critique of compaction not having a value. I think
>>> we should consider fixing that directly.
>>> 
>>>   Agree that the compaction issue is troubling: compacted "null" deletes
>>>   are incompatible w/ headers that must be packed into the message
>>>   value. Are there any alternatives on compaction delete semantics that
>>>   could address this? The KIP wiki discussion I think mostly assumes
>>>   that compaction-delete is what it is and can't be changed/fixed.
>>> 
>>>   -Dana
>>> 
>>>   On Fri, Oct 7, 2016 at 1:38 PM, Michael Pearce <michael.pea...@ig.com>
>>> wrote:
>>>> 
>>>> Hi Jay,
>>>> 
>>>> Thanks for the comments and feedback.
>>>> 
>>>> I think its quite clear that if a problem keeps arising then it is
>>> clear that it needs resolving, and addressing properly.
>>>> 
>>>> Fair enough at linkedIn, and historically for the very first use
>>> cases addressing this maybe not have been a big priority. But as Kafka is
>>> now Apache open source and being picked up by many including my company, it
>>> is clear and evident that this is a requirement and issue that needs to be
>>> now addressed to address these needs.
>>>> 
>>>> The fact in almost every transport mechanism including networking
>>> layers in the enterprise ive worked in, there has always been headers i
>>> think clearly shows their need and success for a transport mechanism.
>>>> 
>>>> I understand some concerns with regards to impact for others not
>>> needing it.
>>>> 
>>>> What we are proposing is flexible solution that provides no overhead
>>> on storage or network traffic layers if you chose not to use headers, but
>>> does enable those who need or want it to use it.
>>>> 
>>>> 
>>>> On your response to 1), there is nothing saying that it should be
>>> put in any faster or without diligence and the same KIP process can still
>>> apply for adding kafka-scope headers, having headers, just makes it easier
>>> to add, without constant message and record changes. Timestamp is a clear
>>> real example of actually what should be in a header (along with other
>>> fields) but as such the whole message/record object needed to be changed to
>>> add this, as will any further headers deemed needed by kafka.
>>>> 
>>>> On response to 2) why within my company as a platforms designer
>>> should i enforce that all teams use the same serialization for their
>>> payloads? But what i do need is some core cross cutting concerns and
>>> information addressed at my platform level and i don't want to impose onto
>>> my development teams. This is the same argument why byte[] is the exposed
>>> value and key because as a messaging platform you dont want to impose that
>>> on my company.
>>>> 
>>>> On response to 3) Actually this isnt true, there are many 3rd party
>>> tools, we need to hook into our messaging flows that they only build onto
>>> standardised interfaces as obviously the cost to have a custom
>>> implementation for every company would be very high.
>>>> APM tooling is a clear case in point, every enterprise level APM
>>> tool on the market is able to stitch in transaction flow end 2 end over a
>>> platform over http, jms because they can stitch in some "magic" data in a
>>> uniform/standardised for the two mentioned they stitch this into the
>>> headers. It is current form they cannot do this with Kafka. Providing a
>>> standardised interface will i believe actually benefit the project as
>>> commercial companies like these will now be able to plugin their tooling
>>> uniformly, making it attractive and possible.
>>>> 
>>>> Some of you other concerns as Joel mentions these are more
>>> implementation details, that i think should be agreed upon, but i think can
>>> be addressed.
>>>> 
>>>> e.g. re your concern on the hashmap.
>>>> it is more than possible not to have every record have to have a
>>> hashmap unless it actually has a header (just like we have managed to do on
>>> the serialized meesage) so if theres a concern on the in memory record size
>>> for those using kafka without headers.
>>>> 
>>>> On your second to last comment about every team choosing their own
>>> format, actually we do want this a little, as very first mentioned, no we
>>> don't want a free for all, but some freedom to have different serialization
>>> has different benefits and draw backs across our business. I can iterate
>>> these if needed. One of the use case for headers provided by linkedIn on
>>> top of my KIP even shows where headers could be beneficial here as a header
>>> could be used to detail which data format the message is serialized to
>>> allowing me to consume different formats.
>>>> 
>>>> Also we have some systems that we need to integrate that pretty near
>>> impossible to wrap or touch their binary payloads, or we’re not allowed to
>>> touch them (historic system, or inter/intra corporate)
>>>> 
>>>> Headers really gives as a solution to provide a pluggable platform,
>>> and standardisation that allows users to build platforms that adapt to
>>> their needs.
>>>> 
>>>> 
>>>> Cheers
>>>> Mike
>>>> 
>>>> 
>>>> ________________________________________
>>>> From: Jay Kreps <j...@confluent.io>
>>>> Sent: Friday, October 7, 2016 4:45 PM
>>>> To: dev@kafka.apache.org
>>>> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>>>> 
>>>> Hey guys,
>>>> 
>>>> This discussion has come up a number of times and we've always
>>> passed.
>>>> 
>>>> One of things that has helped keep Kafka simple is not adding in new
>>>> abstractions and concepts except when the proposal is really elegant
>>> and
>>>> makes things simpler.
>>>> 
>>>> Consider three use cases for headers:
>>>> 
>>>>  1. Kafka-scope: We want to add a feature to Kafka that needs a
>>>>  particular field.
>>>>  2. Company-scope: You want to add a header to be shared by
>>> everyone in
>>>>  your company.
>>>>  3. World-wide scope: You are building a third party tool and want
>>> to add
>>>>  some kind of header.
>>>> 
>>>> For the case of (1) you should not use headers, you should just add
>>> a field
>>>> to the record format. Having a second way of encoding things doesn't
>>> make
>>>> sense. Occasionally people have complained that adding to the record
>>> format
>>>> is hard and it would be nice to just shove lots of things in
>>> quickly. I
>>>> think a better solution would be to make it easy to add to the record
>>>> format, and I think we've made progress on that. I also think we
>>> should be
>>>> insanely focused on the simplicity of the abstraction and not adding
>>> in new
>>>> thingies often---we thought about time for years before adding a
>>> timestamp
>>>> and I guarantee you we would have goofed it up if we'd gone with the
>>>> earlier proposals. These things end up being long term commitments
>>> so it's
>>>> really worth being thoughtful.
>>>> 
>>>> For case (2) just use the body of the message. You don't need a
>>> globally
>>>> agreed on definition of headers, just standardize on a header you
>>> want to
>>>> include in the value in your company. Since this is just used by
>>> code in
>>>> your company having a more standard header format doesn't really
>>> help you.
>>>> In fact by using something like Avro you can define exactly the
>>> types you
>>>> want, the required header fields, etc.
>>>> 
>>>> The only case that headers help is (3). This is a bit of a niche
>>> case and i
>>>> think is easily solved just making the reading and writing of given
>>>> required fields pluggable to work with the header you have.
>>>> 
>>>> A couple of specific problems with this proposal:
>>>> 
>>>>  1. A global registry of numeric keys is super super ugly. This
>>> seems
>>>>  silly compared to the Avro (or whatever) header solution which
>>> gives more
>>>>  compact encoding, rich types, etc.
>>>>  2. Using byte arrays for header values means they aren't really
>>>>  interoperable for case (3). E.g. I can't make a UI that displays
>>> headers,
>>>>  or allow you to set them in config. To work with third party
>>> headers, the
>>>>  only case I think this really helps, you need the union of all
>>>>  serialization schemes people have used for any tool.
>>>>  3. For case (2) and (3) your key numbers are going to collide like
>>>>  crazy. I don't think a global registry of magic numbers
>>> maintained either
>>>>  by word of mouth or checking in changes to kafka source is the
>>> right thing
>>>>  to do.
>>>>  4. We are introducing a new serialization primitive which makes
>>> fields
>>>>  disappear conditional on the contents of other fields. This
>>> breaks the
>>>>  whole serialization/schema system we have today.
>>>>  5. We're adding a hashmap to each record
>>>>  6. This proposes making the ProducerRecord and ConsumerRecord
>>> mutable
>>>>  and adding setters and getters (which we try to avoid).
>>>> 
>>>> For context on LinkedIn: I set up the system there, but it may have
>>> changed
>>>> since i left. The header is maintained with the record schemas in
>>> the avro
>>>> schema registry and is required for all records. Essentially all
>>> messages
>>>> must have a field named "header" of type EventHeader which is itself
>>> a
>>>> record schema with a handful of fields (time, host, etc). The header
>>>> follows the same compatibility rules as other avro fields, so it can
>>> be
>>>> evolved in a compatible way gradually across apps. Avro is typed and
>>>> doesn't require deserializing the full record to read the header. The
>>>> header information is (timestamp, host, etc) is important and needs
>>> to
>>>> propagate into other systems like Hadoop which don't have a concept
>>> of
>>>> headers for records, so I doubt it could move out of the value in
>>> any case.
>>>> Not allowing teams to chose a data format other than avro was
>>> considered a
>>>> feature, not a bug, since the whole point was to be able to share
>>> data,
>>>> which doesn't work if every team chooses their own format.
>>>> 
>>>> I agree with the critique of compaction not having a value. I think
>>> we
>>>> should consider fixing that directly.
>>>> 
>>>> -Jay
>>>> 
>>>> On Thu, Sep 22, 2016 at 12:31 PM, Michael Pearce <
>>> michael.pea...@ig.com>
>>>> wrote:
>>>> 
>>>>> Hi All,
>>>>> 
>>>>> 
>>>>> I would like to discuss the following KIP proposal:
>>>>> 
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>> 82+-+Add+Record+Headers
>>>>> 
>>>>> 
>>>>> 
>>>>> I have some initial ?drafts of roughly the changes that would be
>>> needed.
>>>>> This is no where finalized and look forward to the discussion
>>> especially as
>>>>> some bits I'm personally in two minds about.
>>>>> 
>>>>> https://github.com/michaelandrepearce/kafka/tree/
>>> kafka-headers-properties
>>>>> 
>>>>> 
>>>>> 
>>>>> Here is a link to a alternative option mentioned in the kip but one
>>> i
>>>>> would personally would discard (disadvantages mentioned in kip)
>>>>> 
>>>>> https://github.com/michaelandrepearce/kafka/tree/kafka-headers-full
>>> ?
>>>>> 
>>>>> 
>>>>> Thanks
>>>>> 
>>>>> Mike
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
> 
