Hey Roger.

The original design involved:
1- a header set per message (an array of key+value pairs)
2- a message-level API to set/get headers
3- byte[] header values
4- int header keys
5- headers encoded at the protocol/core level
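For concreteness, points 1-4 above could be sketched roughly like this. This is purely illustrative; the class and method names are mine, not from the proposal or any Kafka API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed per-message header set:
// an array of (int key, byte[] value) pairs.
public class HeaderSet {
    public static final class Header {
        final int key;        // int header key (point 4)
        final byte[] value;   // opaque byte[] header value (point 3)
        Header(int key, byte[] value) { this.key = key; this.value = value; }
    }

    private final List<Header> headers = new ArrayList<>();

    // message-level "set" (point 2)
    public void set(int key, byte[] value) { headers.add(new Header(key, value)); }

    // message-level "get": first value for a key, or null if absent
    public byte[] get(int key) {
        for (Header h : headers) if (h.key == key) return h.value;
        return null;
    }

    public int size() { return headers.size(); }
}
```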


1- I think most (not all) people would agree that having metadata per
message is a good thing. Headers are one way to provide this.

2- There are many use cases for headers, and quite a number of them are
at the message level. Given this, we expect the best way to support them
is an API at the message level.  Agreement is not at 100% here on
providing a get/set headers API to everyone; some believe this should be
done purely by interceptors instead of direct API calls.  How this "map"
is presented to the user via the API can still be fine-tuned.
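To sketch the interceptor alternative mentioned above (hypothetical interfaces, all names mine): an infrastructure-owned interceptor can stamp headers on the way out, so application code never touches a header API directly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interfaces for the interceptor-only approach; these
// are illustrative, not actual Kafka APIs.
interface HeaderedRecord {
    Map<Integer, byte[]> headers();
}

interface RecordInterceptor {
    void onSend(HeaderedRecord record);
}

// An infrastructure-owned interceptor stamps a header; the
// application never calls set/get headers itself.
class TimestampInterceptor implements RecordInterceptor {
    static final int TIMESTAMP_KEY = 5001; // made-up key id
    public void onSend(HeaderedRecord record) {
        long now = 1478563200000L; // fixed value for reproducibility
        record.headers().put(TIMESTAMP_KEY, Long.toString(now).getBytes());
    }
}
```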

3- byte[] header values allow the encoding of anything.  This is a black
box that does not need to be understood by anybody other than the
plugin/code that wrote the header in the first place.  A plugin, if it
so wishes, could have a custom serializer.  So here, if somebody wanted
to use protobuf or Avro or what have you, they could do that.
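As an illustration of that opacity (a made-up plugin and encoding, not anything specified): a large-message plugin might encode "chunk i of n" as UTF-8 text in its header value, and only that plugin ever needs to decode it. Protobuf, Avro, or anything else would work equally well.

```java
import java.nio.charset.StandardCharsets;

// A header value is an opaque byte[]; only the plugin that wrote it
// needs to understand the encoding.  This hypothetical codec encodes
// "chunk i of n" for a large-message splitter/combiner.
public class ChunkHeaderCodec {
    public static byte[] encode(int chunk, int total) {
        return (chunk + "/" + total).getBytes(StandardCharsets.UTF_8);
    }

    // Returns {chunk, total}.
    public static int[] decode(byte[] value) {
        String[] parts = new String(value, StandardCharsets.UTF_8).split("/");
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }
}
```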

4- int header keys are in the proposal. This offers a very compact
representation with an easy ability to segment the key space.
Coordination is needed one way or another, whether ints or strings are
used.  In our testing ints are faster than strings... is this
performance boost worth it?  We have differing opinions.  A lot of
people would argue that the flexibility of strings, plus their ability
to have long lengths, makes coordination easier, and that compression
will take care of the overhead.  I will make a quick note that HTTP/2,
which in theory uses strings as headers, uses static header compression
(HPACK), effectively using ints for the core headers and a precomputed
Huffman table for other strings (https://tools.ietf.org/html/rfc7541).
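The back-of-the-envelope size math behind the int-vs-string trade-off looks like this (my own illustration; the exact wire layout for string keys is an assumption):

```java
import java.nio.charset.StandardCharsets;

// Rough wire-size comparison for header keys: a fixed int key costs
// 4 bytes, while a string key costs its UTF-8 length plus (at least)
// a 1-byte length prefix.  The prefix size is an assumption here.
public class KeySizes {
    public static int intKeyBytes() { return 4; }

    public static int stringKeyBytes(String key) {
        return 1 + key.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For a key like "largeMessage" that is 13 bytes versus 4, before any compression; HPACK-style static tables are one way the string world claws that back.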

5- This is the big sticking point.  Should headers be done at the
protocol level (native), or as a container/wrapper inside the V part of
the message?

Benefits of doing container:
- no modification to the broker
- no modification to the open source client

Benefits of doing native:
- core can use headers (compaction, exactly-once, etc.)
- broker can have plugins
- open source client can have plugins
- no need to worry about aliasing (interoperability between
header-supporting and non-header-supporting clients)


There are a few other benefits that seem to come bundled into the native
implementation but could be made available in the container format.

For example, we could develop a shared open source client that offers a
container format. This would allow us to:
- have other open source projects depend on headers
- create a community to share plugins

This container-format client could be completely separate from Apache
Kafka or it could be part of Apache Kafka. People who would like to use
headers can use that client, and people who think it's overhead can use
the one without.
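For concreteness, the container/wrapper option could look like this hypothetical envelope inside the V bytes (an assumed layout of my own, not any agreed format):

```java
import java.nio.ByteBuffer;

// Hypothetical container envelope: the value bytes become
//   [4-byte header-block length][header block][original payload]
// so the broker and header-unaware tooling still see one opaque
// byte[], while header-aware clients peel off the header block.
public class Envelope {
    public static byte[] wrap(byte[] headerBlock, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + headerBlock.length + payload.length);
        buf.putInt(headerBlock.length);
        buf.put(headerBlock);
        buf.put(payload);
        return buf.array();
    }

    // Returns {headerBlock, payload}.
    public static byte[][] unwrap(byte[] value) {
        ByteBuffer buf = ByteBuffer.wrap(value);
        byte[] headerBlock = new byte[buf.getInt()];
        buf.get(headerBlock);
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        return new byte[][] { headerBlock, payload };
    }
}
```

Note this is exactly where the aliasing concern above comes from: a header-unaware consumer that reads the raw value would see the envelope, not its payload.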


Nacho


On Mon, Nov 7, 2016 at 2:54 PM, Roger Hoover <roger.hoo...@gmail.com> wrote:

> Radai,
>
> If the broker must parse headers, then I agree that the serialization
> probably should not be configurable.  However, the if the broker sees
> metadata only as bytes and clients are the only components that serialize
> and deserialize the headers, then pluggability seems reasonable.
>
> Cheers,
>
> Roger
>
> On Sun, Nov 6, 2016 at 9:25 AM, radai <radai.rosenbl...@gmail.com> wrote:
>
> > making header _key_ serialization configurable potentially undermines
> > the broad usefulness of the feature (any point along the path must be
> > able to read the header keys. the values may be whatever and require
> > more intimate knowledge of the code that produced specific headers,
> > but keys should be universally readable).
> >
> > it would also make it hard to write really portable plugins - say i
> > wrote a large message splitter/combiner - if i rely on key
> > "largeMessage" and values of the form "1/20", someone who uses
> > (contrived example) Map<Byte[], Double> wouldn't be able to re-use my
> > code.
> >
> > not the end of the world within an organization, but problematic if
> > you want to enable an ecosystem.
> >
> > On Thu, Nov 3, 2016 at 2:04 PM, Roger Hoover <roger.hoo...@gmail.com>
> > wrote:
> >
> > >  As others have laid out, I see strong reasons for a common message
> > > metadata structure for the Kafka ecosystem.  In particular, I've
> > > seen that even within a single organization, infrastructure teams
> > > often own the message metadata while application teams own the
> > > application-level data format.  Allowing metadata and content to
> > > have different structure and evolve separately is very helpful for
> > > this.  Also, I think there's a lot of value to having a common
> > > metadata structure shared across the Kafka ecosystem so that tools
> > > which leverage metadata can more easily be shared across
> > > organizations and integrated together.
> > >
> > > The question is, where does the metadata structure belong?  Here's
> > > my take:
> > >
> > > We change the Kafka wire and on-disk format from a (key, value)
> > > model to a (key, metadata, value) model where all three are byte
> > > arrays from the broker's point of view.  The primary reason for this
> > > is that it provides a backward-compatible migration path.  Producers
> > > can start populating metadata fields before all consumers understand
> > > the metadata structure.  For people who already have custom envelope
> > > structures, they can populate their existing structure and the new
> > > structure for a while as they make the transition.
> > >
> > > We could stop there and let the clients plug in a KeySerializer,
> > > MetadataSerializer, and ValueSerializer, but I think it would also
> > > be useful to have a default MetadataSerializer that implements a
> > > key-value model similar to AMQP or HTTP headers.  Or we could go
> > > even further and prescribe a Map<String, byte[]> or
> > > Map<String, String> data model for headers in the clients (while
> > > still allowing custom serialization of the header data model).
> > >
> > > I think this would address Radai's concerns:
> > > 1. All client code would not need to be updated to know about the
> > > container.
> > > 2. Middleware-friendly clients would have a standard header data
> > > model to work with.
> > > 3. A KIP is required both b/c of broker changes and because of
> > > client API changes.
> > >
> > > Cheers,
> > >
> > > Roger
> > >
> > >
> > > On Wed, Nov 2, 2016 at 4:38 PM, radai <radai.rosenbl...@gmail.com>
> > wrote:
> > >
> > > > my biggest issues with a "standard" wrapper format:
> > > >
> > > > 1. _ALL_ client _CODE_ (as opposed to kafka lib version) must be
> > > > updated to know about the container, because any old naive code
> > > > trying to directly deserialize its own payload would keel over and
> > > > die (it needs to know to deserialize a container, and then dig in
> > > > there for its payload).
> > > > 2. in order to write middleware-friendly clients that utilize such
> > > > a container one would basically have to write their own
> > > > producer/consumer API on top of the open source kafka one.
> > > > 3. if you were going to go with a wrapper format you really don't
> > > > need to bother with a kip (just open source your own client stack
> > > > from #2 above so others could stop re-inventing it)
> > > >
> > > > On Wed, Nov 2, 2016 at 4:25 PM, James Cheng <wushuja...@gmail.com>
> > > wrote:
> > > >
> > > > > How exactly would this work? Or maybe that's out of scope for
> > > > > this email.
> > > >
> > >
> >
>



-- 
Nacho (Ignacio) Solis
Kafka
nso...@linkedin.com
