Re: [DISCUSS] KIP-82 - Add Record Headers

Todd Palino Thu, 01 Dec 2016 17:26:30 -0800

Got it. As an ops guy, I'm not very happy with the workaround. Avro means
that I have to be concerned with the format of the messages in order to run
the infrastructure (audit, mirroring, etc.). That means that I have to
handle the schemas, and I have to enforce rules about good formats. This is
not something I want to be in the business of, because I should be able to
run a service infrastructure without needing to be in the weeds of dealing
with customer data formats.


Trust me, a sizable portion of my support time is spent dealing with schema
issues. I really would like to get away from that. Maybe I'd have more time
for other hobbies. Like writing. ;)

-Todd

On Thu, Dec 1, 2016 at 4:04 PM Gwen Shapira <g...@confluent.io> wrote:

> I'm pretty satisfied with the current workarounds (Avro container
> format), so I'm not too excited about the extra work required to do
> headers in Kafka. I absolutely don't mind it if you do it...
> I think the Apache convention for "good idea, but not willing to put
> any work toward it" is +0.5? anyway, that's what I was trying to
> convey :)
>
> On Thu, Dec 1, 2016 at 3:05 PM, Todd Palino <tpal...@gmail.com> wrote:
> > Well I guess my question for you, then, is what is holding you back from
> > full support for headers? What’s the bit that you’re missing that has you
> > under a full +1?
> >
> > -Todd
> >
> >
> > On Thu, Dec 1, 2016 at 1:59 PM, Gwen Shapira <g...@confluent.io> wrote:
> >
> >> I know why people who support headers support them, and I've seen what
> >> the discussion is like.
> >>
> >> This is why I'm asking people who are against headers (especially
> >> committers) what will make them change their mind - so we can get this
> >> part over one way or another.
> >>
> >> If I sound frustrated it is not at Radai, Jun or you (Todd)... I am
> >> just looking for something concrete we can do to move the discussion
> >> along to the yummy design details (which is the argument I really am
> >> looking forward to).
> >>
> >> On Thu, Dec 1, 2016 at 1:53 PM, Todd Palino <tpal...@gmail.com> wrote:
> >> > So, Gwen, to your question (even though I’m not a committer)...
> >> >
> >> > I have always been a strong supporter of introducing the concept of an
> >> > envelope to messages, which headers accomplishes. The message key is
> >> > already an example of a piece of envelope information. By providing a
> >> means
> >> > to do this within Kafka itself, and not relying on use-case specific
> >> > implementations, you make it much easier for components to
> interoperate.
> >> It
> >> > simplifies development of all these things (message routing, auditing,
> >> > encryption, etc.) because each one does not have to reinvent the
> wheel.
> >> >
> >> > It also makes it much easier from a client point of view if the
> headers
> >> are
> >> > defined as part of the protocol and/or message format in general
> because
> >> > you can easily produce and consume messages without having to take
> into
> >> > account specific cases. For example, I want to route messages, but
> >> client A
> >> > doesn’t support the way audit implemented headers, and client B
> doesn’t
> >> > support the way encryption or routing implemented headers, so now my
> >> > application has to create some really fragile (my autocorrect just
> tried
> >> to
> >> > make that “tragic”, which is probably appropriate too) code to strip
> >> > everything off, rather than just consuming the messages, picking out
> the
> >> 1
> >> > or 2 headers it’s interested in, and performing its function.
> >> >
> >> > Honestly, this discussion has been going on for a long time, and it’s
> >> > always “Oh, you came up with 2 use cases, and yeah, those use cases
> are
> >> > real things that someone would want to do. Here’s an alternate way to
> >> > implement them so let’s not do headers.” If we have a few use cases
> that
> >> we
> >> > actually came up with, you can be sure that over the next year
> there’s a
> >> > dozen others that we didn’t think of that someone would like to do. I
> >> > really think it’s time to stop rehashing this discussion and instead
> >> focus
> >> > on a workable standard that we can adopt.
> >> >
> >> > -Todd
> >> >
> >> >
> >> > On Thu, Dec 1, 2016 at 1:39 PM, Todd Palino <tpal...@gmail.com>
> wrote:
> >> >
> >> >> C. per message encryption
> >> >>> One drawback of this approach is that this significantly reduces the
> >> >>> effectiveness of compression, which happens on a set of serialized
> >> >>> messages. An alternative is to enable SSL for wire encryption and
> rely
> >> on
> >> >>> the storage system (e.g. LUKS) for at rest encryption.
> >> >>
> >> >>
> >> >> Jun, this is not sufficient. While this does cover the case of
> removing
> >> a
> >> >> drive from the system, it will not satisfy most compliance
> requirements
> >> for
> >> >> encryption of data as whoever has access to the broker itself still
> has
> >> >> access to the unencrypted data. For end-to-end encryption you need to
> >> >> encrypt at the producer, before it enters the system, and decrypt at
> the
> >> >> consumer, after it exits the system.
> >> >>
> >> >> -Todd
> >> >>
> >> >>
> >> >> On Thu, Dec 1, 2016 at 1:03 PM, radai <radai.rosenbl...@gmail.com>
> >> wrote:
> >> >>
> >> >>> another big plus of headers in the protocol is that it would enable
> >> rapid
> >> >>> iteration on ideas outside of core kafka and would reduce the
> number of
> >> >>> future wire format changes required.
> >> >>>
> >> >>> a lot of what is currently a KIP represents use cases that are not
> 100%
> >> >>> relevant to all users, and some of them require rather invasive wire
> >> >>> protocol changes. i think a good recent example of this is kip-98.
> >> >>> tx-utilizing traffic is expected to be a very small fraction of
> total
> >> >>> traffic and yet the changes are invasive.
> >> >>>
> >> >>> every such wire format change translates into painful and slow
> >> adoption of
> >> >>> new versions.
> >> >>>
> >> >>> i think a lot of functionality currently in KIPs could be "spun out"
> >> and
> >> >>> implemented as opt-in plugins transmitting data over headers. this
> >> would
> >> >>> keep the core wire format stable(r), core codebase smaller, and
> avoid
> >> the
> >> >>> "burden of proof" thats sometimes required to prove a certain
> feature
> >> is
> >> >>> useful enough for a wide-enough audience to warrant a wire format
> >> change
> >> >>> and code complexity additions.
> >> >>>
> >> >>> (to be clear - kip-98 goes beyond "mere" wire format changes and im
> not
> >> >>> saying it could have been completely done with headers, but
> >> exactly-once
> >> >>> delivery certainly could)
> >> >>>
> >> >>> On Thu, Dec 1, 2016 at 11:20 AM, Gwen Shapira <g...@confluent.io>
> >> wrote:
> >> >>>
> >> >>> > On Thu, Dec 1, 2016 at 10:24 AM, radai <
> radai.rosenbl...@gmail.com>
> >> >>> wrote:
> >> >>> > > "For use cases within an organization, one could always use
> other
> >> >>> > > approaches such as company-wise containers"
> >> >>> > > this is what linkedin has traditionally done but there are now
> >> cases
> >> >>> > (read
> >> >>> > > - topics) where this is not acceptable. this makes headers
> useful
> >> even
> >> >>> > > within single orgs for cases where one-container-fits-all cannot
> >> >>> apply.
> >> >>> > >
> >> >>> > > as for the particular use cases listed, i dont want this to
> devolve
> >> >>> to a
> >> >>> > > discussion of particular use cases - i think its enough that
> some
> >> of
> >> >>> them
> >> >>> >
> >> >>> > I think a main point of contention is that: We identified a few
> >> >>> > use-cases where headers are useful, do we want Kafka to be a
> system
> >> >>> > that supports those use-cases?
> >> >>> >
> >> >>> > For example, Jun said:
> >> >>> > "Not sure how widely useful record-level lineage is though since
> the
> >> >>> > overhead could
> >> >>> > be significant."
> >> >>> >
> >> >>> > We know NiFi supports record level lineage. I don't think it was
> >> >>> > developed for lols, I think it is safe to assume that the NSA
> needed
> >> >>> > that functionality. We also know that certain financial institutes
> >> >>> > need to track tampering with records at a record level and there
> are
> >> >>> > federal regulations that absolutely require this.  They also need
> to
> >> >>> > prove that routing apps that "touch" the messages and either
> read
> >> >>> > or update headers couldn't have possibly modified the payload
> >> itself.
> >> >>> > They use record level encryption to do that - apps can read and
> >> >>> > (sometimes) modify headers but can't touch the payload.
> >> >>> >
> >> >>> > We can totally say "those are corner cases and not worth adding
> >> >>> > headers to Kafka for", they should use a different pubsub message
> for
> >> >>> > that (Nifi or one of the other 1000 that cater specifically to the
> >> >>> > financial industry).
> >> >>> >
> >> >>> > But this gets us into a catch 22:
> >> >>> > If we discuss a specific use-case, someone can always say it isn't
> >> >>> > interesting enough for Kafka. If we discuss more general trends,
> >> >>> > others can say "well, we are not sure any of them really needs
> >> headers
> >> >>> > specifically. This is just hand waving and not interesting.".
> >> >>> >
> >> >>> > I think discussing use-cases in specifics is super important to
> >> decide
> >> >>> > implementation details for headers (my use-cases lean toward
> >> numerical
> >> >>> > keys with namespaces and object values, others differ), but I
> think
> >> we
> >> >>> > need to answer the general "Are we going to have headers" question
> >> >>> > first.
> >> >>> >
> >> >>> > I'd love to hear from the other committers in the discussion:
> >> >>> > What would it take to convince you that headers in Kafka are a
> good
> >> >>> > idea in general, so we can move ahead and try to agree on the
> >> details?
> >> >>> >
> >> >>> > I feel like we keep moving the goal posts and this is truly
> >> exhausting.
> >> >>> >
> >> >>> > For the record, I mildly support adding headers to Kafka (+0.5?).
> >> >>> > The community can continue to find workarounds to the issue and
> there
> >> >>> > are some benefits to keeping the message format and clients
> simpler.
> >> >>> > But I see the usefulness of headers to many use-cases and if we
> can
> >> >>> > find a good and generally useful way to add it to Kafka, it will
> make
> >> >>> > Kafka easier to use for many - worthy goal in my eyes.
> >> >>> >
> >> >>> > > are interesting/feasible, but:
> >> >>> > > A+B. i think there are use cases for polyglot topics.
> especially if
> >> >>> kafka
> >> >>> > > is being used to "trunk" something else.
> >> >>> > > D. multiple topics would make it harder to write portable
> consumer
> >> >>> code.
> >> >>> > > partition remapping would mess with locality of consumption
> >> >>> guarantees.
> >> >>> > > E+F. a use case I see for lineage/metadata is
> billing/chargeback.
> >> for
> >> >>> > that
> >> >>> > > use case it is not enough to simply record the point of origin,
> but
> >> >>> every
> >> >>> > > replication stop (think mirror maker) must also add a record to
> >> form a
> >> >>> > > "transit log".
> >> >>> > >
> >> >>> > > as for stream processing on top of kafka - i know samza has a
> >> metadata
> >> >>> > map
> >> >>> > > which they carry around in addition to user values. headers are
> the
> >> >>> > perfect
> >> >>> > > fit for these things.
> >> >>> > >
> >> >>> > >
> >> >>> > >
> >> >>> > > On Wed, Nov 30, 2016 at 6:50 PM, Jun Rao <j...@confluent.io>
> wrote:
> >> >>> > >
> >> >>> > >> Hi, Michael,
> >> >>> > >>
> >> >>> > >> In order to answer the first two questions, it would be helpful
> >> if we
> >> >>> > could
> >> >>> > >> identify 1 or 2 strong use cases for headers in the space for
> >> >>> > third-party
> >> >>> > >> vendors. For use cases within an organization, one could always
> >> use
> >> >>> > other
> >> >>> > >> approaches such as company-wise containers to get around w/o
> >> >>> headers. I
> >> >>> > >> went through the use cases in the KIP and in Radai's wiki (
> >> >>> > >> https://cwiki.apache.org/confluence/display/KAFKA/A+
> >> >>> > Case+for+Kafka+Headers
> >> >>> > >> ).
> >> >>> > >> The following are the ones that I understand and could be
> in
> >> the
> >> >>> > >> third-party use case category.
> >> >>> > >>
> >> >>> > >> A. content-type
> >> >>> > >> It seems that in general, content-type should be set at the
> topic
> >> >>> level.
> >> >>> > >> Not sure if mixing messages with different content types
> should be
> >> >>> > >> encouraged.
> >> >>> > >>
> >> >>> > >> B. schema id
> >> >>> > >> Since the value is mostly useless without schema id, it seems
> that
> >> >>> > storing
> >> >>> > >> the schema id together with serialized bytes in the value is
> >> better?
> >> >>> > >>
> >> >>> > >> C. per message encryption
> >> >>> > >> One drawback of this approach is that this significantly reduces
> >> the
> >> >>> > >> effectiveness of compression, which happens on a set of
> serialized
> >> >>> > >> messages. An alternative is to enable SSL for wire encryption
> and
> >> >>> rely
> >> >>> > on
> >> >>> > >> the storage system (e.g. LUKS) for at rest encryption.
> >> >>> > >>
> >> >>> > >> D. cluster ID for mirroring across Kafka clusters
> >> >>> > >> This is actually interesting. Today, to avoid introducing
> cycles
> >> when
> >> >>> > doing
> >> >>> > >> mirroring across data centers, one would either have to set up
> two
> >> >>> Kafka
> >> >>> > >> clusters (a local and an aggregate) per data center or rename
> >> topics.
> >> >>> > >> Neither is ideal. With headers, the producer could tag each
> >> message
> >> >>> with
> >> >>> > >> the producing cluster ID in the header. MirrorMaker could then
> >> avoid
> >> >>> > >> mirroring messages to a cluster if they are tagged with the
> same
> >> >>> cluster
> >> >>> > >> id.
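
The cluster-ID tagging described above can be sketched roughly as follows. This is only an illustration, not MirrorMaker's actual behaviour: the header key `CLUSTER_ID_HEADER`, the helper names, and the model of headers as a dict from int key to bytes are all assumptions made for the example.

```python
# Hypothetical header key reserved for the originating cluster ID.
CLUSTER_ID_HEADER = 1


def tag_with_origin(headers, cluster_id):
    """Return a copy of the headers tagged with the producing cluster ID.

    An existing tag is kept, so a mirrored message keeps its first origin.
    """
    tagged = dict(headers)
    tagged.setdefault(CLUSTER_ID_HEADER, cluster_id.encode("utf-8"))
    return tagged


def should_mirror(headers, target_cluster_id):
    """Skip messages that originated in the cluster we are mirroring to.

    Untagged messages have no known origin and are mirrored.
    """
    origin = headers.get(CLUSTER_ID_HEADER)
    return origin != target_cluster_id.encode("utf-8")


# A message produced in "dc-east" is mirrored to "dc-west" but not back again.
msg_headers = tag_with_origin({}, "dc-east")
print(should_mirror(msg_headers, "dc-west"))
print(should_mirror(msg_headers, "dc-east"))
```

The mirroring process only needs to read one header here and never has to parse the message value.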
> >> >>> > >>
> >> >>> > >> However, an alternative approach is to introduce sth like
> >> >>> hierarchical
> >> >>> > >> topic and store messages from different clusters in different
> >> >>> partitions
> >> >>> > >> under the same topic. This approach avoids filtering out
> unneeded
> >> >>> data
> >> >>> > and
> >> >>> > >> makes offset preserving easier to support. It may make
> compaction
> >> >>> > trickier
> >> >>> > >> though since the same key may show up in different partitions.
> >> >>> > >>
> >> >>> > >> E. record-level lineage
> >> >>> > >> For example, a source connector could store in the message the
> >> >>> metadata
> >> >>> > >> (e.g. UUID) of the source record. Similarly, if a stream job
> >> >>> transforms
> >> >>> > >> messages from topic A to topic B, the library could include the
> >> >>> source
> >> >>> > >> message offset in each of the transformed message in the
> header.
> >> Not
> >> >>> > sure
> >> >>> > >> how widely useful record-level lineage is though since the
> >> overhead
> >> >>> > could
> >> >>> > >> be significant.
> >> >>> > >>
> >> >>> > >> F. auditing metadata
> >> >>> > >> We could put things like clientId/host/user in the header in
> each
> >> >>> > message
> >> >>> > >> for auditing. These metadata are really at the producer level
> >> though.
> >> >>> > So, a
> >> >>> > >> more efficient way is to only include a "producerId" per
> message
> >> and
> >> >>> > send
> >> >>> > >> the producerId -> metadata mapping independently. KIP-98 is
> >> actually
> >> >>> > >> proposing including such a producerId natively in the message.
> >> >>> > >>
> >> >>> > >> So, overall, I am not sure that I am fully convinced of the strong
> >> >>> > third-party
> >> >>> > >> use cases of headers yet. Perhaps we could discuss a bit more
> to
> >> make
> >> >>> > one
> >> >>> > >> or two really convincing use cases.
> >> >>> > >>
> >> >>> > >> Another orthogonal question is whether headers should be
> exposed
> >> in
> >> >>> > stream
> >> >>> > >> processing systems such as Kafka Streams, Samza, and Spark
> streaming.
> >> >>> > >> Currently, those systems just deal with key/value pairs.
> Should we
> >> >>> > expose a
> >> >>> > >> third thing header there too or somehow map header to key or
> >> value?
> >> >>> > >>
> >> >>> > >> Thanks,
> >> >>> > >>
> >> >>> > >> Jun
> >> >>> > >>
> >> >>> > >>
> >> >>> > >> On Tue, Nov 29, 2016 at 3:35 AM, Michael Pearce <
> >> >>> michael.pea...@ig.com>
> >> >>> > >> wrote:
> >> >>> > >>
> >> >>> > >> > I assume that, after a period of a week, there are no
> >> concerns
> >> >>> now
> >> >>> > >> > with points 1, and 2 and now we have agreement that headers
> are
> >> >>> useful
> >> >>> > >> and
> >> >>> > >> > needed in Kafka. As such if put to a KIP vote, this wouldn’t
> be
> >> a
> >> >>> > reason
> >> >>> > >> to
> >> >>> > >> > reject.
> >> >>> > >> >
> >> >>> > >> > @Ignacio on point 4).
> >> >>> > >> > I think for purpose of getting this KIP moving past this, we
> can
> >> >>> state
> >> >>> > >> the
> >> >>> > >> > key will be a 4-byte space that will be naturally
> >> interpreted
> >> >>> as
> >> >>> > an
> >> >>> > >> > Int32 (if namespacing is later wanted you can easily split
> this
> >> >>> into
> >> >>> > two
> >> >>> > >> > int16 spaces), from the wire protocol implementation this
> makes
> >> no
> >> >>> > >> > difference I don’t believe. Is this reasonable to all?
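
To illustrate why the wire format does not care whether the 4-byte key is read as one 32-bit value or as two 16-bit halves, here is a small sketch. The function names and the namespace/id split are hypothetical, and unsigned values are used for simplicity even though the proposal speaks of Int32; big-endian byte order is assumed.

```python
import struct


def pack_key(namespace, local_id):
    """Combine a 16-bit namespace and a 16-bit id into one 32-bit key."""
    if not (0 <= namespace <= 0xFFFF and 0 <= local_id <= 0xFFFF):
        raise ValueError("namespace and local_id must each fit in 16 bits")
    return (namespace << 16) | local_id


def split_key(key):
    """Recover (namespace, local_id) from a 32-bit key."""
    return (key >> 16) & 0xFFFF, key & 0xFFFF


def key_to_wire(key):
    """Serialize the key as 4 big-endian bytes."""
    return struct.pack(">I", key)


key = pack_key(21, 1)
# The same 4 bytes result whether we write one 32-bit value or two 16-bit values.
print(key_to_wire(key) == struct.pack(">HH", 21, 1))
print(split_key(key))
```

Equality matching on the key works the same way under either interpretation, so the split can be left to a layer above.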
> >> >>> > >> >
> >> >>> > >> > On 5) as per point 4, therefore happy we keep with 32 bits.
> >> >>> > >> >
> >> >>> > >> >
> >> >>> > >> >
> >> >>> > >> >
> >> >>> > >> >
> >> >>> > >> >
> >> >>> > >> > On 18/11/2016, 20:34, "ignacio.so...@gmail.com on behalf of
> >> >>> Ignacio
> >> >>> > >> > Solis" <ignacio.so...@gmail.com on behalf of iso...@igso.net
> >
> >> >>> wrote:
> >> >>> > >> >
> >> >>> > >> >     Summary:
> >> >>> > >> >
> >> >>> > >> >     3) Yes - Header value as byte[]
> >> >>> > >> >
> >> >>> > >> >     4a) Int,Int - No
> >> >>> > >> >     4b) Int - Yes
> >> >>> > >> >     4c) String - Reluctant maybe
> >> >>> > >> >
> >> >>> > >> >     5) I believe the header system should take a single
> int.  I
> >> >>> think
> >> >>> > >> > 32bits is
> >> >>> > >> >     a good size, if you want to interpret this as two 16bit
> >> numbers
> >> >>> in
> >> >>> > the
> >> >>> > >> > layer
> >> >>> > >> >     above go right ahead.  If somebody wants to argue for 16
> >> bits
> >> >>> or
> >> >>> > 64
> >> >>> > >> > bits of
> >> >>> > >> >     header key space I would listen.
> >> >>> > >> >
> >> >>> > >> >
> >> >>> > >> >     Discussion:
> >> >>> > >> >     Dividing the key space into sub_key_1 and sub_key_2
> makes no
> >> >>> > sense to
> >> >>> > >> > me at
> >> >>> > >> >     this layer.  Are we going to start providing APIs to get
> all
> >> >>> the
> >> >>> > >> >     sub_key_1s? or all the sub_key_2s?  If there is no
> >> >>> distinguishing
> >> >>> > >> > functions
> >> >>> > >> >     that are applied to each one then they should be a single
> >> >>> value.
> >> >>> > At
> >> >>> > >> > this
> >> >>> > >> >     layer all we're doing is equality.
> >> >>> > >> >     If the above layer wants to interpret this as 2, 3 or
> more
> >> >>> values
> >> >>> > >> > that's a
> >> >>> > >> >     different question.  I personally think it's all one
> >> keyspace
> >> >>> > that is
> >> >>> > >> >     getting assigned using some structure, but if you want to
> >> >>> > sub-assign
> >> >>> > >> > parts
> >> >>> > >> >     of it then that's fine.
> >> >>> > >> >
> >> >>> > >> >     The same discussion applies to strings.  If somebody
> argued
> >> for
> >> >>> > >> > strings,
> >> >>> > >> >     would we be arguing to divide the strings with dots ('.')
> >> as a
> >> >>> > >> > requirement?
> >> >>> > >> >     Would we want them to give us the different name segments
> >> >>> > separately?
> >> >>> > >> >     Would we be performing any actions on this key other than
> >> >>> > matching?
> >> >>> > >> >
> >> >>> > >> >     Nacho
> >> >>> > >> >
> >> >>> > >> >
> >> >>> > >> >
> >> >>> > >> >     On Fri, Nov 18, 2016 at 9:30 AM, Michael Pearce <
> >> >>> > >> michael.pea...@ig.com
> >> >>> > >> > >
> >> >>> > >> >     wrote:
> >> >>> > >> >
> >> >>> > >> >     > #jay #jun any concerns on 1 and 2 still?
> >> >>> > >> >     >
> >> >>> > >> >     > @all
> >> >>> > >> >     > To get this moving along a bit more I'd also like to
> ask
> >> to
> >> >>> get
> >> >>> > >> > clarity on
> >> >>> > >> >     > the below last points:
> >> >>> > >> >     >
> >> >>> > >> >     > 3) I believe we're all roughly happy with the header
> value
> >> >>> > being a
> >> >>> > >> > byte[]?
> >> >>> > >> >     >
> >> >>> > >> >     > 4) I believe consensus has been for a namespace based
> int
> >> >>> > approach
> >> >>> > >> >     > {int,int} for the key. Any objections if this is what
> we
> >> go
> >> >>> > with?
> >> >>> > >> >     >
> >> >>> > >> >     > 5) as we have, if the assumption in (4) is correct,
> {int,int}
> >> >>> keys.
> >> >>> > >> >     > Should both int's be int16 or int32?
> >> >>> > >> >     > I'm for them being int16(2 bytes) as combined is space
> of
> >> >>> > 4bytes as
> >> >>> > >> > per
> >> >>> > >> >     > original and gives plenty of combinations for the
> >> >>> foreseeable,
> >> >>> > and
> >> >>> > >> > keeps
> >> >>> > >> >     > the overhead small.
> >> >>> > >> >     >
> >> >>> > >> >     > Do we see any benefit in another kip call to discuss
> >> these at
> >> >>> > all?
> >> >>> > >> >     >
> >> >>> > >> >     > Cheers
> >> >>> > >> >     > Mike
> >> >>> > >> >     > ________________________________________
> >> >>> > >> >     > From: K Burstev <k.burs...@yandex.com>
> >> >>> > >> >     > Sent: Friday, November 18, 2016 7:07:07 AM
> >> >>> > >> >     > To: dev@kafka.apache.org
> >> >>> > >> >     > Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
> >> >>> > >> >     >
> >> >>> > >> >     > For what it is worth also i agree. As a user:
> >> >>> > >> >     >
> >> >>> > >> >     >  1) Yes - Headers are worthwhile
> >> >>> > >> >     >  2) Yes - Headers should be a top level option
> >> >>> > >> >     >
> >> >>> > >> >     > 14.11.2016, 21:15, "Ignacio Solis" <iso...@igso.net>:
> >> >>> > >> >     > > 1) Yes - Headers are worthwhile
> >> >>> > >> >     > > 2) Yes - Headers should be a top level option
> >> >>> > >> >     > >
> >> >>> > >> >     > > On Mon, Nov 14, 2016 at 9:16 AM, Michael Pearce <
> >> >>> > >> > michael.pea...@ig.com>
> >> >>> > >> >     > > wrote:
> >> >>> > >> >     > >
> >> >>> > >> >     > >>  Hi Roger,
> >> >>> > >> >     > >>
> >> >>> > >> >     > >>  The kip details/examples the original proposal for
> key
> >> >>> > spacing
> >> >>> > >> ,
> >> >>> > >> > not
> >> >>> > >> >     > the
> >> >>> > >> >     > >>  new mentioned as per discussion namespace idea.
> >> >>> > >> >     > >>
> >> >>> > >> >     > >>  We will need to update the kip, when we get
> agreement
> >> >>> this
> >> >>> > is a
> >> >>> > >> > better
> >> >>> > >> >     > >>  approach (which seems to be the case if I have
> >> understood
> >> >>> > the
> >> >>> > >> > general
> >> >>> > >> >     > >>  feeling in the conversation)
> >> >>> > >> >     > >>
> >> >>> > >> >     > >>  Re the variable ints, at very early stage we did
> think
> >> >>> about
> >> >>> > >> > this. I
> >> >>> > >> >     > think
> >> >>> > >> >     > >>  the added complexity for the saving isn't worth it.
> >> I'd
> >> >>> > rather
> >> >>> > >> go
> >> >>> > >> >     > with, if
> >> >>> > >> >     > >>  we want to reduce overheads and size int16 (2bytes)
> >> keys
> >> >>> as
> >> >>> > it
> >> >>> > >> > keeps it
> >> >>> > >> >     > >>  simple.
> >> >>> > >> >     > >>
> >> >>> > >> >     > >>  On the note of no headers, there is as per the kip
> as
> >> we
> >> >>> > use an
> >> >>> > >> >     > attribute
> >> >>> > >> >     > >>  bit to denote if headers are present or not as such
> >> >>> > provides a
> >> >>> > >> > zero
> >> >>> > >> >     > >>  overhead currently if headers are not used.
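
The attribute-bit idea can be sketched as below. The layout is a toy, not the KIP's wire format: the bit position `HEADERS_PRESENT`, the int32 header count, and the per-header (int32 key, int32 length, bytes) encoding are all assumptions chosen only to show that a record without headers pays no extra bytes.

```python
import struct

# Hypothetical bit in the existing attributes byte that marks "headers follow".
HEADERS_PRESENT = 0x10


def encode_record(value, headers=None):
    """Encode attributes byte, optional header block, then the value bytes."""
    attributes = 0
    header_block = b""
    if headers:
        attributes |= HEADERS_PRESENT
        parts = [struct.pack(">i", len(headers))]
        for key, val in sorted(headers.items()):
            parts.append(struct.pack(">ii", key, len(val)) + val)
        header_block = b"".join(parts)
    return struct.pack(">B", attributes) + header_block + value


def has_headers(record):
    """Check the flag without parsing anything else in the record."""
    return bool(record[0] & HEADERS_PRESENT)


plain = encode_record(b"payload")
tagged = encode_record(b"payload", {1: b"ab"})
# Without headers, the record is just the attributes byte plus the value.
print(len(plain), has_headers(plain))
print(len(tagged), has_headers(tagged))
```

A reader that sees the flag cleared can skip header parsing entirely.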
> >> >>> > >> >     > >>
> >> >>> > >> >     > >>  I think as radai mentions would be good first if we
> >> can
> >> >>> get
> >> >>> > >> > clarity if
> >> >>> > >> >     > do
> >> >>> > >> >     > >>  we now have general consensus that (1) headers are
> >> >>> > worthwhile
> >> >>> > >> and
> >> >>> > >> >     > useful,
> >> >>> > >> >     > >>  and (2) we want it as a top level entity.
> >> >>> > >> >     > >>
> >> >>> > >> >     > >>  Just to state the obvious i believe (1) headers are
> >> >>> > worthwhile
> >> >>> > >> > and (2)
> >> >>> > >> >     > >>  agree as a top level entity.
> >> >>> > >> >     > >>
> >> >>> > >> >     > >>  Cheers
> >> >>> > >> >     > >>  Mike
> >> >>> > >> >     > >>  ________________________________________
> >> >>> > >> >     > >>  From: Roger Hoover <roger.hoo...@gmail.com>
> >> >>> > >> >     > >>  Sent: Wednesday, November 9, 2016 9:10:47 PM
> >> >>> > >> >     > >>  To: dev@kafka.apache.org
> >> >>> > >> >     > >>  Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
> >> >>> > >> >     > >>
> >> >>> > >> >     > >>  Sorry for going a little in the weeds but thanks
> for
> >> the
> >> >>> > >> replies
> >> >>> > >> >     > regarding
> >> >>> > >> >     > >>  varint.
> >> >>> > >> >     > >>
> >> >>> > >> >     > >>  Agreed that a prefix and {int, int} can be the
> same.
> >> It
> >> >>> > doesn't
> >> >>> > >> > look
> >> >>> > >> >     > like
> >> >>> > >> >     > >>  that's what the KIP is saying in the "Open" section.
> The
> >> >>> > example
> >> >>> > >> > shows
> >> >>> > >> >     > >>  2100001
> >> >>> > >> >     > >>  for New Relic and 210002 for App Dynamics implying
> >> that
> >> >>> the
> >> >>> > New
> >> >>> > >> > Relic
> >> >>> > >> >     > >>  organization will have only a single header id to
> work
> >> >>> > with. Or
> >> >>> > >> > is
> >> >>> > >> >     > 2100001
> >> >>> > >> >     > >>  a prefix? The main point of a namespace or prefix
> is
> >> to
> >> >>> > reduce
> >> >>> > >> > the
> >> >>> > >> >     > >>  overhead of config mapping or registration
> depending
> >> on
> >> >>> how
> >> >>> > >> >     > >>  namespaces/prefixes are managed.
> >> >>> > >> >     > >>
> >> >>> > >> >     > >>  Would love to hear more feedback on the
> higher-level
> >> >>> > questions
> >> >>> > >> >     > though...
> >> >>> > >> >     > >>
> >> >>> > >> >     > >>  Cheers,
> >> >>> > >> >     > >>
> >> >>> > >> >     > >>  Roger
> >> >>> > >> >     > >>
> >> >>> > >> >     > >>  On Wed, Nov 9, 2016 at 11:38 AM, radai <
> >> >>> > >> > radai.rosenbl...@gmail.com>
> >> >>> > >> >     > wrote:
> >> >>> > >> >     > >>
> >> >>> > >> >     > >>  > I think this discussion is getting a bit into the
> >> >>> weeds on
> >> >>> > >> > technical
> >> >>> > >> >     > >>  > implementation details.
> >> >>> > >> >     > >>  > I'd like to step back a minute and try and
> establish
> >> >>> > where we
> >> >>> > >> > are in
> >> >>> > >> >     > the
> >> >>> > >> >     > >>  > larger picture:
> >> >>> > >> >     > >>  >
> >> >>> > >> >     > >>  > (re-wording nacho's last paragraph)
> >> >>> > >> >     > >>  > 1. are we all in agreement that headers are a
> >> >>> worthwhile
> >> >>> > and
> >> >>> > >> > useful
> >> >>> > >> >     > >>  > addition to have? this was contested early on
> >> >>> > >> >     > >>  > 2. are we all in agreement on headers as top
> level
> >> >>> entity
> >> >>> > vs
> >> >>> > >> > headers
> >> >>> > >> >     > >>  > squirreled-away in V?
> >> >>> > >> >     > >>  >
> >> >>> > >> >     > >>  > if there are still concerns around these 2
> points
> >> >>> (#jay?
> >> >>> > >> > #jun?)?
> >> >>> > >> >     > >>  >
> >> >>> > >> >     > >>  > (and now back to our normal programming ...)
> >> >>> > >> >     > >>  >
> >> >>> > >> >     > >>  > varints are nice. having said that, its adding
> >> >>> complexity
> >> >>> > >> (see
> >> >>> > >> >     > >>  > https://github.com/addthis/
> >> stream-lib/blob/master/src/
> >> >>> > >> >     > >>  > main/java/com/clearspring/
> >> analytics/util/Varint.java
> >> >>> > >> >     > >>  > as 1st google result) and would require anyone
> >> writing
> >> >>> > other
> >> >>> > >> > clients
> >> >>> > >> >     > (C?
> >> >>> > >> >     > >>  > Python? Go? Bash? ;-) ) to get/implement the
> same,
> >> and
> >> >>> for
> >> >>> > >> > relatively
> >> >>> > >> >     > >>  > little gain (int vs string is order of magnitude,
> >> this
> >> >>> > isnt).
> >> >>> > >> >     > >>  >
> >> >>> > >> >     > >>  > int namespacing vs {int, int} namespacing are
> >> basically
> >> >>> > the
> >> >>> > >> > same
> >> >>> > >> >     > thing -
> >> >>> > >> >     > >>  > youre just namespacing an int64 and giving people
> >> whole
> >> >>> > 2^32
> >> >>> > >> > ranges
> >> >>> > >> >     > at a
> >> >>> > >> >     > >>  > time. the part i like about this is letting
> people
> >> >>> have a
> >> >>> > >> large
> >> >>> > >> >     > swath of
> >> >>> > >> >     > >>  > numbers with one registration so they dont have
> to
> >> come
> >> >>> > back
> >> >>> > >> > for
> >> >>> > >> >     > every
> >> >>> > >> >     > >>  > single plugin/header they want to "reserve".
> >> >>> > >> >     > >>  >
> >> >>> > >> >     > >>  >
> >> >>> > >> >     > >>  > On Wed, Nov 9, 2016 at 11:01 AM, Roger Hoover <
> >> >>> > >> >     > roger.hoo...@gmail.com>
> >> >>> > >> >     > >>  > wrote:
> >> >>> > >> >     > >>  >
> >> >>> > >> >     > >>  > > Since some of the debate has been about
> overhead +
> >> >>> > >> > performance, I'm
> >> >>> > >> >     > >>  > > wondering if we have considered a varint
> encoding
> >> (
> >> >>> > >> >     > >>  > > https://developers.google.com/
> >> protocol-buffers/docs/
> >> >>> > >> >     > encoding#varints)
> >> >>> > >> >     > >>  > for
> >> >>> > >> >     > >>  > > the header length field (int32 in the proposal)
> >> and
> >> >>> for
> >> >>> > >> > header
> >> >>> > >> >     > ids? If
> >> >>> > >> >     > >>  > you
> >> >>> > >> >     > >>  > > don't use headers, the overhead would be a
> single
> >> >>> byte
> >> >>> > and
> >> >>> > >> > for each
> >> >>> > >> >     > >>  > header
> >> >>> > >> >     > >>  > > id < 128 would also need only a single byte?
> >> >>> > >> >     > >>  > >
> >> >>> > >> >     > >>  > >
> >> >>> > >> >     > >>  > >
> >> >>> > >> >     > >>  > > On Wed, Nov 9, 2016 at 6:43 AM, radai <
> >> >>> > >> > radai.rosenbl...@gmail.com>
> >> >>> > >> >     > >>  > wrote:
> >> >>> > >> >     > >>  > >
> >> >>> > >> >     > >>  > > > @magnus - and very dangerous (youre
> essentially
> >> >>> > >> > downloading and
> >> >>> > >> >     > >>  > executing
> >> >>> > >> >     > >>  > > > arbitrary code off the internet on your
> servers
> >> ...
> >> >>> > bad
> >> >>> > >> > idea
> >> >>> > >> >     > without
> >> >>> > >> >     > >>  a
> >> >>> > >> >     > >>  > > > sandbox, even with)
> >> >>> > >> >     > >>  > > >
> >> >>> > >> >     > >>  > > > as for it being a purely administrative task
> - i
> >> >>> > >> disagree.
> >> >>> > >> >     > >>  > > >
> >> >>> > >> >     > >>  > > > i wish it would, really, because then my
> earlier
> >> >>> > point on
> >> >>> > >> > the
> >> >>> > >> >     > >>  > complexity
> >> >>> > >> >     > >>  > > of
> >> >>> > >> >     > >>  > > > the remapping process would be invalid, but
> at
> >> >>> > linkedin,
> >> >>> > >> > for
> >> >>> > >> >     > example,
> >> >>> > >> >     > >>  > we
> >> >>> > >> >     > >>  > > > (the team im in) run kafka as a service. we
> dont
> >> >>> > really
> >> >>> > >> > know
> >> >>> > >> >     > what our
> >> >>> > >> >     > >>  > > users
> >> >>> > >> >     > >>  > > > (developing applications that use kafka) are
> up
> >> to
> >> >>> at
> >> >>> > any
> >> >>> > >> > given
> >> >>> > >> >     > >>  moment.
> >> >>> > >> >     > >>  > > it
> >> >>> > >> >     > >>  > > > is very possible (given the existence of
> headers
> >> >>> and a
> >> >>> > >> >     > corresponding
> >> >>> > >> >     > >>  > > plugin
> >> >>> > >> >     > >>  > > > ecosystem) for some application to "equip"
> their
> >> >>> > >> producers
> >> >>> > >> > and
> >> >>> > >> >     > >>  > consumers
> >> >>> > >> >     > >>  > > > with the required plugin without us knowing.
> i
> >> dont
> >> >>> > mean
> >> >>> > >> > to imply
> >> >>> > >> >     > >>  thats
> >> >>> > >> >     > >>  > > > bad, i just want to make the point that its
> not
> >> as
> >> >>> > simple
> >> >>> > >> >     > as keeping it
> >> >>> > >> >     > >>  in
> >> >>> > >> >     > >>  > > > sync across a large-enough organization.
> >> >>> > >> >     > >>  > > >
> >> >>> > >> >     > >>  > > >
> >> >>> > >> >     > >>  > > > On Wed, Nov 9, 2016 at 6:17 AM, Magnus
> Edenhill
> >> <
> >> >>> > >> >     > mag...@edenhill.se>
> >> >>> > >> >     > >>  > > > wrote:
> >> >>> > >> >     > >>  > > >
> >> >>> > >> >     > >>  > > > > I think there is a piece missing in the
> >> Strings
> >> >>> > >> > discussion,
> >> >>> > >> >     > where
> >> >>> > >> >     > >>  > > > > pro-Stringers
> >> >>> > >> >     > >>  > > > > reason that by providing unique string
> >> >>> identifiers
> >> >>> > for
> >> >>> > >> > each
> >> >>> > >> >     > header
> >> >>> > >> >     > >>  > > > > everything will just
> >> >>> > >> >     > >>  > > > > magically work for all parts of the stream
> >> >>> pipeline.
> >> >>> > >> >     > >>  > > > >
> >> >>> > >> >     > >>  > > > > But the strings dont mean anything by
> >> themselves,
> >> >>> > and
> >> >>> > >> > while we
> >> >>> > >> >     > >>  could
> >> >>> > >> >     > >>  > > > > probably envision
> >> >>> > >> >     > >>  > > > > some auto plugin loader that downloads,
> >> compiles,
> >> >>> > links
> >> >>> > >> > and
> >> >>> > >> >     > runs
> >> >>> > >> >     > >>  > > plugins
> >> >>> > >> >     > >>  > > > > on-demand
> >> >>> > >> >     > >>  > > > > as soon as they're seen by a consumer, I
> dont
> >> >>> really
> >> >>> > >> see
> >> >>> > >> > a
> >> >>> > >> >     > use-case
> >> >>> > >> >     > >>  > for
> >> >>> > >> >     > >>  > > > > something
> >> >>> > >> >     > >>  > > > > so dynamic (and fragile) in practice.
> >> >>> > >> >     > >>  > > > >
> >> >>> > >> >     > >>  > > > > In the real world an application will be
> >> >>> configured
> >> >>> > >> with
> >> >>> > >> > a set
> >> >>> > >> >     > of
> >> >>> > >> >     > >>  > > plugins
> >> >>> > >> >     > >>  > > > > to either add (producer)
> >> >>> > >> >     > >>  > > > > or read (consumer) headers.
> >> >>> > >> >     > >>  > > > > This is an administrative task based on
> what
> >> >>> > features a
> >> >>> > >> > client
> >> >>> > >> >     > >>  > > > > needs/provides and results in
> >> >>> > >> >     > >>  > > > > some sort of configuration to enable and
> >> >>> configure
> >> >>> > the
> >> >>> > >> > desired
> >> >>> > >> >     > >>  > plugins.
> >> >>> > >> >     > >>  > > > >
> >> >>> > >> >     > >>  > > > > Since this needs to be kept somewhat in
> sync
> >> >>> across
> >> >>> > an
> >> >>> > >> >     > organisation
> >> >>> > >> >     > >>  > > > (there
> >> >>> > >> >     > >>  > > > > is no point in having producers
> >> >>> > >> >     > >>  > > > > add headers no consumers will read, and
> vice
> >> >>> versa),
> >> >>> > >> the
> >> >>> > >> > added
> >> >>> > >> >     > >>  > > complexity
> >> >>> > >> >     > >>  > > > > of assigning an id namespace
> >> >>> > >> >     > >>  > > > > for each plugin as it is being configured
> >> should
> >> >>> be
> >> >>> > >> > tolerable.
> >> >>> > >> >     > >>  > > > >
> >> >>> > >> >     > >>  > > > >
> >> >>> > >> >     > >>  > > > > /Magnus
> >> >>> > >> >     > >>  > > > >
> >> >>> > >> >     > >>  > > > > 2016-11-09 13:06 GMT+01:00 Michael Pearce <
> >> >>> > >> >     > michael.pea...@ig.com>:
> >> >>> > >> >     > >>  > > > >
> >> >>> > >> >     > >>  > > > > > Just following/catching up on what seems
> to
> >> be
> >> >>> an
> >> >>> > >> > active
> >> >>> > >> >     > night :)
> >> >>> > >> >     > >>  > > > > >
> >> >>> > >> >     > >>  > > > > > @Radai sorry if it may seem obvious but
> what
> >> >>> does
> >> >>> > MD
> >> >>> > >> > stand
> >> >>> > >> >     > for?
> >> >>> > >> >     > >>  > > > > >
> >> >>> > >> >     > >>  > > > > > My take on String vs Int:
> >> >>> > >> >     > >>  > > > > >
> >> >>> > >> >     > >>  > > > > > I will state first I am pro Int (16 or
> 32).
> >> >>> > >> >     > >>  > > > > >
> >> >>> > >> >     > >>  > > > > > I do though playing devil's advocate see a
> >> big
> >> >>> plus
> >> >>> > >> > with the
> >> >>> > >> >     > >>  > argument
> >> >>> > >> >     > >>  > > of
> >> >>> > >> >     > >>  > > > > > String keys, this is around integrating
> >> into an
> >> >>> > >> > existing
> >> >>> > >> >     > >>  > eco-system.
> >> >>> > >> >     > >>  > > > > >
> >> >>> > >> >     > >>  > > > > > As many other systems use String based
> >> headers
> >> >>> > >> (Flume,
> >> >>> > >> > JMS)
> >> >>> > >> >     > it
> >> >>> > >> >     > >>  > makes
> >> >>> > >> >     > >>  > > > it
> >> >>> > >> >     > >>  > > > > > much easier for these to be
> >> >>> > incorporated/integrated
> >> >>> > >> > into.
> >> >>> > >> >     > >>  > > > > >
> >> >>> > >> >     > >>  > > > > > How with Int based headers could we
> provide
> >> a
> >> >>> > >> > way/guidance to
> >> >>> > >> >     > >>  make
> >> >>> > >> >     > >>  > > this
> >> >>> > >> >     > >>  > > > > > integration simple / easy with transition
> >> flows
> >> >>> > over
> >> >>> > >> to
> >> >>> > >> >     > kafka?
> >> >>> > >> >     > >>  > > > > >
> >> >>> > >> >     > >>  > > > > > * tough luck buddy you're on your own
> >> >>> > >> >     > >>  > > > > > * simply hash the string into int code
> and
> >> hope
> >> >>> > for
> >> >>> > >> no
> >> >>> > >> >     > collisions
> >> >>> > >> >     > >>  > > (how
> >> >>> > >> >     > >>  > > > to
> >> >>> > >> >     > >>  > > > > > convert back though?)
> >> >>> > >> >     > >>  > > > > > * http2 style as mentioned by nacho.
> >> >>> > >> >     > >>  > > > > >
> >> >>> > >> >     > >>  > > > > > cheers,
> >> >>> > >> >     > >>  > > > > > Mike
> >> >>> > >> >     > >>  > > > > >
> >> >>> > >> >     > >>  > > > > >
> >> >>> > >> >     > >>  > > > > > ________________________________________
> >> >>> > >> >     > >>  > > > > > From: radai <radai.rosenbl...@gmail.com>
> >> >>> > >> >     > >>  > > > > > Sent: Wednesday, November 9, 2016 8:12 AM
> >> >>> > >> >     > >>  > > > > > To: dev@kafka.apache.org
> >> >>> > >> >     > >>  > > > > > Subject: Re: [DISCUSS] KIP-82 - Add
> Record
> >> >>> Headers
> >> >>> > >> >     > >>  > > > > >
> >> >>> > >> >     > >>  > > > > > thinking about it some more, the best
> way to
> >> >>> > transmit
> >> >>> > >> > the
> >> >>> > >> >     > header
> >> >>> > >> >     > >>  > > > > remapping
> >> >>> > >> >     > >>  > > > > > data to consumers would be to put it in
> the
> >> MD
> >> >>> > >> response
> >> >>> > >> >     > payload,
> >> >>> > >> >     > >>  so
> >> >>> > >> >     > >>  > > > maybe
> >> >>> > >> >     > >>  > > > > > it should be discussed now.
> >> >>> > >> >     > >>  > > > > >
> >> >>> > >> >     > >>  > > > > >
> >> >>> > >> >     > >>  > > > > > On Wed, Nov 9, 2016 at 12:09 AM, radai <
> >> >>> > >> >     > >>  radai.rosenbl...@gmail.com
> >> >>> > >> >     > >>  > >
> >> >>> > >> >     > >>  > > > > wrote:
> >> >>> > >> >     > >>  > > > > >
> >> >>> > >> >     > >>  > > > > > > im not opposed to the idea of namespace
> >> >>> mapping.
> >> >>> > >> all
> >> >>> > >> > im
> >> >>> > >> >     > saying
> >> >>> > >> >     > >>  is
> >> >>> > >> >     > >>  > > > that
> >> >>> > >> >     > >>  > > > > > its
> >> >>> > >> >     > >>  > > > > > > not part of the "mvp" and, since it
> >> requires
> >> >>> no
> >> >>> > >> wire
> >> >>> > >> > format
> >> >>> > >> >     > >>  > change,
> >> >>> > >> >     > >>  > > > can
> >> >>> > >> >     > >>  > > > > > > always be added later.
> >> >>> > >> >     > >>  > > > > > > also, its not as simple as just
> >> configuring
> >> >>> MM
> >> >>> > to
> >> >>> > >> do
> >> >>> > >> > the
> >> >>> > >> >     > >>  > transform:
> >> >>> > >> >     > >>  > > > > lets
> >> >>> > >> >     > >>  > > > > > > say i've implemented large message
> >> support as
> >> >>> > >> > {666,1} and
> >> >>> > >> >     > on
> >> >>> > >> >     > >>  some
> >> >>> > >> >     > >>  > > > > mirror
> >> >>> > >> >     > >>  > > > > > > target cluster its been remapped to
> >> {999,1}.
> >> >>> the
> >> >>> > >> > consumer
> >> >>> > >> >     > >>  plugin
> >> >>> > >> >     > >>  > > code
> >> >>> > >> >     > >>  > > > > > would
> >> >>> > >> >     > >>  > > > > > > also need to be told to look for the
> large
> >> >>> > message
> >> >>> > >> > "part X
> >> >>> > >> >     > of
> >> >>> > >> >     > >>  Y"
> >> >>> > >> >     > >>  > > > header
> >> >>> > >> >     > >>  > > > > > > under {999,1}. doable, but tricky.
> >> >>> > >> >     > >>  > > > > > >
> >> >>> > >> >     > >>  > > > > > > On Tue, Nov 8, 2016 at 10:29 PM, Gwen
> >> >>> Shapira <
> >> >>> > >> >     > >>  g...@confluent.io
> >> >>> > >> >     > >>  > >
> >> >>> > >> >     > >>  > > > > wrote:
> >> >>> > >> >     > >>  > > > > > >
> >> >>> > >> >     > >>  > > > > > >> While you can do whatever you want
> with a
> >> >>> > >> namespace
> >> >>> > >> > and
> >> >>> > >> >     > your
> >> >>> > >> >     > >>  > code,
> >> >>> > >> >     > >>  > > > > > >> what I'd expect is for each app to make
> >> >>> namespaces
> >> >>> > >> >     > configurable...
> >> >>> > >> >     > >>  > > > > > >>
> >> >>> > >> >     > >>  > > > > > >> So if I accidentally used 666 for my
> HR
> >> >>> > >> department,
> >> >>> > >> > and
> >> >>> > >> >     > still
> >> >>> > >> >     > >>  > want
> >> >>> > >> >     > >>  > > > to
> >> >>> > >> >     > >>  > > > > > >> run RadaiApp, I can config
> "namespace=42"
> >> >>> for
> >> >>> > >> > RadaiApp and
> >> >>> > >> >     > >>  > > > everything
> >> >>> > >> >     > >>  > > > > > >> will look normal.
> >> >>> > >> >     > >>  > > > > > >>
> >> >>> > >> >     > >>  > > > > > >> This means you only need to sync usage
> >> >>> inside
> >> >>> > your
> >> >>> > >> > own
> >> >>> > >> >     > >>  > > organization.
> >> >>> > >> >     > >>  > > > > > >> Still hard, but somewhat easier than
> >> syncing
> >> >>> > with
> >> >>> > >> > the
> >> >>> > >> >     > entire
> >> >>> > >> >     > >>  > > world.
> >> >>> > >> >     > >>  > > > > > >>
> >> >>> > >> >     > >>  > > > > > >> On Tue, Nov 8, 2016 at 10:07 PM,
> radai <
> >> >>> > >> >     > >>  > > radai.rosenbl...@gmail.com>
> >> >>> > >> >     > >>  > > > > > >> wrote:
> >> >>> > >> >     > >>  > > > > > >> > and we can start with {namespace,
> id}
> >> and
> >> >>> no
> >> >>> > >> > re-mapping
> >> >>> > >> >     > >>  > support
> >> >>> > >> >     > >>  > > > and
> >> >>> > >> >     > >>  > > > > > >> always
> >> >>> > >> >     > >>  > > > > > >> > add it later on if/when collisions
> >> >>> actually
> >> >>> > >> > happen (i
> >> >>> > >> >     > dont
> >> >>> > >> >     > >>  > think
> >> >>> > >> >     > >>  > > > > > they'd
> >> >>> > >> >     > >>  > > > > > >> be
> >> >>> > >> >     > >>  > > > > > >> > a problem).
> >> >>> > >> >     > >>  > > > > > >> >
> >> >>> > >> >     > >>  > > > > > >> > every interested party (so orgs or
> >> >>> > individuals)
> >> >>> > >> > could
> >> >>> > >> >     > then
> >> >>> > >> >     > >>  > > > register
> >> >>> > >> >     > >>  > > > > a
> >> >>> > >> >     > >>  > > > > > >> > prefix (0 = reserved, 1 = confluent
> ...
> >> >>> 666
> >> >>> > = me
> >> >>> > >> > :-) )
> >> >>> > >> >     > and
> >> >>> > >> >     > >>  do
> >> >>> > >> >     > >>  > > > > whatever
> >> >>> > >> >     > >>  > > > > > >> with
> >> >>> > >> >     > >>  > > > > > >> > the 2nd ID - so once linkedin
> >> registers,
> >> >>> say
> >> >>> > 3,
> >> >>> > >> > then
> >> >>> > >> >     > >>  linkedin
> >> >>> > >> >     > >>  > > devs
> >> >>> > >> >     > >>  > > > > are
> >> >>> > >> >     > >>  > > > > > >> free
> >> >>> > >> >     > >>  > > > > > >> > to use {3, *} with a reasonable
> >> >>> expectation
> >> >>> > not
> >> >>> > >> to
> >> >>> > >> >     > collide
> >> >>> > >> >     > >>  with
> >> >>> > >> >     > >>  > > > > > anything
> >> >>> > >> >     > >>  > > > > > >> > else. further partitioning of that *
> >> >>> becomes
> >> >>> > >> > linkedin's
> >> >>> > >> >     > >>  > problem,
> >> >>> > >> >     > >>  > > > but
> >> >>> > >> >     > >>  > > > > > the
> >> >>> > >> >     > >>  > > > > > >> > "upstream registration" of a
> namespace
> >> >>> only
> >> >>> > has
> >> >>> > >> to
> >> >>> > >> >     > happen
> >> >>> > >> >     > >>  > once.
> >> >>> > >> >     > >>  > > > > > >> >
> >> >>> > >> >     > >>  > > > > > >> > On Tue, Nov 8, 2016 at 9:03 PM,
> James
> >> >>> Cheng <
> >> >>> > >> >     > >>  > > wushuja...@gmail.com
> >> >>> > >> >     > >>  > > > >
> >> >>> > >> >     > >>  > > > > > >> wrote:
> >> >>> > >> >     > >>  > > > > > >> >
> >> >>> > >> >     > >>  > > > > > >> >>
> >> >>> > >> >     > >>  > > > > > >> >>
> >> >>> > >> >     > >>  > > > > > >> >>
> >> >>> > >> >     > >>  > > > > > >> >> > On Nov 8, 2016, at 5:54 PM, Gwen
> >> >>> Shapira <
> >> >>> > >> >     > >>  > g...@confluent.io>
> >> >>> > >> >     > >>  > > > > > wrote:
> >> >>> > >> >     > >>  > > > > > >> >> >
> >> >>> > >> >     > >>  > > > > > >> >> > Thank you so much for this clear
> and
> >> >>> fair
> >> >>> > >> > summary of
> >> >>> > >> >     > the
> >> >>> > >> >     > >>  > > > > arguments.
> >> >>> > >> >     > >>  > > > > > >> >> >
> >> >>> > >> >     > >>  > > > > > >> >> > I'm in favor of ints. Not a
> >> >>> deal-breaker,
> >> >>> > but
> >> >>> > >> > in
> >> >>> > >> >     > favor.
> >> >>> > >> >     > >>  > > > > > >> >> >
> >> >>> > >> >     > >>  > > > > > >> >> > Even more in favor of Magnus's
> >> >>> > decentralized
> >> >>> > >> >     > suggestion
> >> >>> > >> >     > >>  > with
> >> >>> > >> >     > >>  > > > > > Roger's
> >> >>> > >> >     > >>  > > > > > >> >> > tweak: add a namespace for
> headers.
> >> >>> This
> >> >>> > will
> >> >>> > >> > allow
> >> >>> > >> >     > each
> >> >>> > >> >     > >>  > app
> >> >>> > >> >     > >>  > > to
> >> >>> > >> >     > >>  > > > > > just
> >> >>> > >> >     > >>  > > > > > >> >> > use whatever IDs it wants
> >> internally,
> >> >>> and
> >> >>> > >> then
> >> >>> > >> > let
> >> >>> > >> >     > the
> >> >>> > >> >     > >>  > admin
> >> >>> > >> >     > >>  > > > > > >> deploying
> >> >>> > >> >     > >>  > > > > > >> >> > the app figure out an available
> >> >>> namespace
> >> >>> > ID
> >> >>> > >> > for the
> >> >>> > >> >     > app
> >> >>> > >> >     > >>  to
> >> >>> > >> >     > >>  > > > live
> >> >>> > >> >     > >>  > > > > > in.
> >> >>> > >> >     > >>  > > > > > >> >> > So io.confluent.schema-registry
> can
> >> be
> >> >>> > >> > namespace
> >> >>> > >> >     > 0x01 on
> >> >>> > >> >     > >>  my
> >> >>> > >> >     > >>  > > > > > >> deployment
> >> >>> > >> >     > >>  > > > > > >> >> > and 0x57 on yours, and the poor
> guys
> >> >>> > >> > developing the
> >> >>> > >> >     > app
> >> >>> > >> >     > >>  > don't
> >> >>> > >> >     > >>  > > > > need
> >> >>> > >> >     > >>  > > > > > to
> >> >>> > >> >     > >>  > > > > > >> >> > worry about that.
> >> >>> > >> >     > >>  > > > > > >> >> >
> >> >>> > >> >     > >>  > > > > > >> >>
> >> >>> > >> >     > >>  > > > > > >> >> Gwen, if I understand your example
> >> >>> right, an
> >> >>> > >> >     > application
> >> >>> > >> >     > >>  > > deployer
> >> >>> > >> >     > >>  > > > > > might
> >> >>> > >> >     > >>  > > > > > >> >> decide to use 0x01 in one
> deployment,
> >> and
> >> >>> > that
> >> >>> > >> > means
> >> >>> > >> >     > that
> >> >>> > >> >     > >>  > once
> >> >>> > >> >     > >>  > > > the
> >> >>> > >> >     > >>  > > > > > >> message
> >> >>> > >> >     > >>  > > > > > >> >> is written into the broker, it
> will be
> >> >>> > saved on
> >> >>> > >> > the
> >> >>> > >> >     > broker
> >> >>> > >> >     > >>  > with
> >> >>> > >> >     > >>  > > > > that
> >> >>> > >> >     > >>  > > > > > >> >> specific namespace (0x01).
> >> >>> > >> >     > >>  > > > > > >> >>
> >> >>> > >> >     > >>  > > > > > >> >> If you were to mirror that message
> >> into
> >> >>> > another
> >> >>> > >> >     > cluster,
> >> >>> > >> >     > >>  the
> >> >>> > >> >     > >>  > > 0x01
> >> >>> > >> >     > >>  > > > > > would
> >> >>> > >> >     > >>  > > > > > >> >> accompany the message, right? What
> if
> >> the
> >> >>> > >> > deployers of
> >> >>> > >> >     > the
> >> >>> > >> >     > >>  > same
> >> >>> > >> >     > >>  > > > app
> >> >>> > >> >     > >>  > > > > > in
> >> >>> > >> >     > >>  > > > > > >> the
> >> >>> > >> >     > >>  > > > > > >> >> other cluster use 0x57? They won't
> >> >>> > understand
> >> >>> > >> > each
> >> >>> > >> >     > other?
> >> >>> > >> >     > >>  > > > > > >> >>
> >> >>> > >> >     > >>  > > > > > >> >> I'm not sure that's an avoidable
> >> >>> problem. I
> >> >>> > >> > think it
> >> >>> > >> >     > simply
> >> >>> > >> >     > >>  > > means
> >> >>> > >> >     > >>  > > > > > that
> >> >>> > >> >     > >>  > > > > > >> in
> >> >>> > >> >     > >>  > > > > > >> >> order to share data, you have to
> also
> >> >>> have a
> >> >>> > >> > shared
> >> >>> > >> >     > (agreed
> >> >>> > >> >     > >>  > > upon)
> >> >>> > >> >     > >>  > > > > > >> >> understanding of what the
> namespaces
> >> >>> mean.
> >> >>> > >> Which
> >> >>> > >> > I
> >> >>> > >> >     > think
> >> >>> > >> >     > >>  > makes
> >> >>> > >> >     > >>  > > > > sense,
> >> >>> > >> >     > >>  > > > > > >> >> because the alternate (sharing
> >> *nothing*
> >> >>> at
> >> >>> > >> all)
> >> >>> > >> > would
> >> >>> > >> >     > mean
> >> >>> > >> >     > >>  > > that
> >> >>> > >> >     > >>  > > > > > there
> >> >>> > >> >     > >>  > > > > > >> >> would be no way to understand each
> >> other.
> >> >>> > >> >     > >>  > > > > > >> >>
> >> >>> > >> >     > >>  > > > > > >> >> -James
> >> >>> > >> >     > >>  > > > > > >> >>
> >> >>> > >> >     > >>  > > > > > >> >> > Gwen
> >> >>> > >> >     > >>  > > > > > >> >> >
> >> >>> > >> >     > >>  > > > > > >> >> > On Tue, Nov 8, 2016 at 4:23 PM,
> >> radai <
> >> >>> > >> >     > >>  > > > > radai.rosenbl...@gmail.com>
> >> >>> > >> >     > >>  > > > > > >> >> wrote:
> >> >>> > >> >     > >>  > > > > > >> >> >> +1 for sean's document. it
> covers
> >> >>> pretty
> >> >>> > >> much
> >> >>> > >> > all
> >> >>> > >> >     > the
> >> >>> > >> >     > >>  > > > trade-offs
> >> >>> > >> >     > >>  > > > > > and
> >> >>> > >> >     > >>  > > > > > >> >> >> provides concrete figures to
> argue
> >> >>> about
> >> >>> > :-)
> >> >>> > >> >     > >>  > > > > > >> >> >> (nit-picking - used the same
> xkcd
> >> >>> twice,
> >> >>> > >> also
> >> >>> > >> > trove
> >> >>> > >> >     > has
> >> >>> > >> >     > >>  > been
> >> >>> > >> >     > >>  > > > > > >> superseded
> >
