Updated KIP according to Jun's comment and included changes to TMR.
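
For anyone skimming the thread, here is a rough sketch of what "version aware" means for TMR. It is illustrative Python, not the actual Kafka code. The function name serialize_broker, the field layout, and the choice of version 1 as the first TMR version carrying rack are all assumptions made for the example.

```python
import struct
from typing import Optional

# Hypothetical: first TMR version whose broker entries carry a rack field.
RACK_MIN_VERSION = 1


def _encode_string(value: Optional[str]) -> bytes:
    """Length-prefixed UTF-8 string; a length of -1 stands for 'no value'."""
    if value is None:
        return struct.pack(">h", -1)
    data = value.encode("utf-8")
    return struct.pack(">h", len(data)) + data


def serialize_broker(version: int, node_id: int, host: str, port: int,
                     rack: Optional[str] = None) -> bytes:
    """Serialize one broker entry for the version the client requested."""
    out = struct.pack(">i", node_id) + _encode_string(host) + struct.pack(">i", port)
    if version >= RACK_MIN_VERSION:
        # Only clients that asked for the newer version get the extra field.
        out += _encode_string(rack)
    return out


# An old client (version 0) and a new client (version 1) asking about the same broker.
old = serialize_broker(0, 1, "broker1", 9092, rack="rack-a")
new = serialize_broker(1, 1, "broker1", 9092, rack="rack-a")
```

In this sketch a client that sends version 0 receives the same bytes whether or not the broker has a rack configured, which is the backward-compatibility property discussed below.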

On Tue, Jan 5, 2016 at 5:59 PM, Jun Rao <j...@confluent.io> wrote:

> Hi, Allen,
>
> A couple of minor comments on the KIP.
>
> 1. The version of the broker JSON string says 2. It should be 3.
>
> 2. The new version of UpdateMetadataRequest should be 2, instead of 1.
> Could you include the full wire protocol of version 2 of
> UpdateMetadataRequest and highlight the changed part?
>
> Thanks,
>
> Jun
>
> On Tue, Jan 5, 2016 at 3:11 PM, Allen Wang <allenxw...@gmail.com> wrote:
>
> > Jun and I had a chance to discuss it in a meeting, and we agreed to
> > change the TMR in a different patch.
> >
> > I can change the KIP to include rack in TMR. The essential change is to
> > add rack to the BrokerEndPoint class and make TMR version aware.
> >
> >
> >
> > On Tue, Jan 5, 2016 at 10:21 AM, Aditya Auradkar <
> > aaurad...@linkedin.com.invalid> wrote:
> >
> > > Jun/Allen -
> > >
> > > > Did we ever actually agree on whether we should evolve the TMR to
> > > > include rack info or not?
> > > > I don't feel strongly about it, but if it's the right thing to do we
> > > > should probably do it in this KIP (it can be a separate patch). It
> > > > isn't a large change.
> > >
> > > Aditya
> > >
> > > > On Sat, Dec 26, 2015 at 3:01 PM, Allen Wang <allenxw...@gmail.com>
> > > > wrote:
> > > >
> > > > > Added the rolling upgrade instructions to the KIP, similar to those in
> > > > > the 0.9.0 release notes.
> > > >
> > > > > On Wed, Dec 16, 2015 at 11:32 AM, Allen Wang <allenxw...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > The reason TopicMetadataResponse is not included in the KIP is that
> > > > > > it is currently not version aware, so we would need to introduce a
> > > > > > version to it in order to ensure backward compatibility. That seems
> > > > > > like a big change to me. Do we want to couple it with this KIP? Do we
> > > > > > need to further discuss what information to include in the new
> > > > > > version besides rack? For example, should we include the broker
> > > > > > security protocol in TopicMetadataResponse?
> > > > > >
> > > > > > The other option is a separate KIP that makes TopicMetadataResponse
> > > > > > version aware and decides what to include, leaving this KIP focused
> > > > > > on the rack-aware algorithm, admin tools, and related changes to the
> > > > > > inter-broker protocol.
> > > > >
> > > > > Thanks,
> > > > > Allen
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Dec 14, 2015 at 8:30 AM, Jun Rao <j...@confluent.io> wrote:
> > > > >
> > > > >> Allen,
> > > > >>
> > > > >> Thanks for the proposal. A few comments.
> > > > >>
> > > > > >> 1. Since this KIP changes the inter-broker communication protocol
> > > > > >> (UpdateMetadataRequest), we will need to document the upgrade path
> > > > > >> (similar to what's described in
> > > > > >> http://kafka.apache.org/090/documentation.html#upgrade).
> > > > > >>
> > > > > >> 2. It might be useful to include the rack info of the broker in
> > > > > >> TopicMetadataResponse. This can be useful for administrative tasks,
> > > > > >> as well as read affinity in the future.
> > > > >>
> > > > >> Jun
> > > > >>
> > > > >>
> > > > >>
> > > > > >> On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <allenxw...@gmail.com>
> > > > > >> wrote:
> > > > >>
> > > > >> > If there are no more comments I would like to call for a vote.
> > > > >> >
> > > > >> >
> > > > > >> > On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <allenxw...@gmail.com>
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> > > The KIP is updated with more details, including how to handle the
> > > > > >> > > situation where rack information is incomplete.
> > > > > >> > >
> > > > > >> > > In the situation where rack information is incomplete but we want
> > > > > >> > > to continue with the assignment, I have suggested ignoring all rack
> > > > > >> > > information and falling back to the original algorithm. The reason
> > > > > >> > > is explained below.
> > > > > >> > >
> > > > > >> > > The other options are to assume that each broker without a rack
> > > > > >> > > belongs to its own unique rack, or that they all belong to one
> > > > > >> > > "default" rack. Whichever we choose, it is highly likely to result
> > > > > >> > > in an uneven number of brokers per rack, and it is quite possible
> > > > > >> > > that the "made up" racks will have far fewer brokers. As I
> > > > > >> > > explained in the KIP, an uneven number of brokers per rack leads to
> > > > > >> > > an uneven distribution of replicas among brokers (even though the
> > > > > >> > > leader distribution is still even). The brokers in a rack that has
> > > > > >> > > fewer brokers will get more replicas per broker than brokers in
> > > > > >> > > other racks.
> > > > > >> > >
> > > > > >> > > Given this, and given that the replica assignment produced will be
> > > > > >> > > incorrect anyway from a rack-aware point of view, ignoring all rack
> > > > > >> > > information and falling back to the original algorithm is not a bad
> > > > > >> > > choice, since it will at least give a better guarantee of replica
> > > > > >> > > distribution.
> > > > > >> > >
> > > > > >> > > Also, for command line tools, it gives the user a choice if for any
> > > > > >> > > reason they want to ignore rack information and fall back to the
> > > > > >> > > original algorithm.
> > > > >> > >
> > > > >> > >
> > > > > >> > > On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <allenxw...@gmail.com>
> > > > > >> > > wrote:
> > > > > >> > >
> > > > > >> > >> I have been busy with some time-pressing issues for the last few
> > > > > >> > >> days. I will think about how the incomplete rack information will
> > > > > >> > >> affect the balance and update the KIP by early next week.
> > > > >> > >>
> > > > >> > >> Thanks,
> > > > >> > >> Allen
> > > > >> > >>
> > > > >> > >>
> > > > > >> > >> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <n...@confluent.io>
> > > > > >> > >> wrote:
> > > > > >> > >>
> > > > > >> > >>> A few suggestions on improving the KIP:
> > > > > >> > >>>
> > > > > >> > >>> *If some brokers have rack, and some do not, the algorithm will throw
> > > > > >> > >>> an exception. This is to prevent incorrect assignment caused by user
> > > > > >> > >>> error.*
> > > > > >> > >>>
> > > > > >> > >>>
> > > > > >> > >>> In the KIP, can you clearly state the user-facing behavior when some
> > > > > >> > >>> brokers have rack information and some don't? Which actions and
> > > > > >> > >>> requests will error out, and how?
> > > > > >> > >>>
> > > > > >> > >>> *Even distribution of partition leadership among brokers*
> > > > > >> > >>>
> > > > > >> > >>>
> > > > > >> > >>> There is some information about arranging the sorted broker list
> > > > > >> > >>> interlaced with rack ids. Can you describe the changes to the current
> > > > > >> > >>> algorithm in a little more detail? How does this interlacing work if
> > > > > >> > >>> only a subset of brokers have the rack id configured? Does this still
> > > > > >> > >>> work if an uneven number of brokers is assigned to each rack? It
> > > > > >> > >>> might work; I'm looking for more details on the changes, since they
> > > > > >> > >>> will affect the behavior seen by the user - imbalance on either the
> > > > > >> > >>> leaders or data or both.
> > > > >> > >>>
> > > > > >> > >>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <
> > > > > >> > >>> aaurad...@linkedin.com> wrote:
> > > > >> > >>>
> > > > >> > >>> > I think this sounds reasonable. Anyone else have comments?
> > > > >> > >>> >
> > > > >> > >>> > Aditya
> > > > >> > >>> >
> > > > > >> > >>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <allenxw...@gmail.com>
> > > > > >> > >>> > wrote:
> > > > > >> > >>> >
> > > > > >> > >>> > > During the discussion in the hangout, it was mentioned that it
> > > > > >> > >>> > > would be desirable for consumers to know the rack information of
> > > > > >> > >>> > > the brokers so that they can consume from a broker in the same rack
> > > > > >> > >>> > > to reduce latency. As I understand it, this will only be beneficial
> > > > > >> > >>> > > if a consumer can consume from any broker in the ISR, which is not
> > > > > >> > >>> > > possible now.
> > > > > >> > >>> > >
> > > > > >> > >>> > > I suggest we skip the change to TMR. Once the consumer is changed
> > > > > >> > >>> > > to be able to consume from any broker in the ISR, the rack
> > > > > >> > >>> > > information can be added to TMR.
> > > > > >> > >>> > >
> > > > > >> > >>> > > Another thing I want to confirm is the command line behavior. I
> > > > > >> > >>> > > think the desirable default behavior is to fail fast on the command
> > > > > >> > >>> > > line for incomplete rack mapping. The error message can include
> > > > > >> > >>> > > further instructions telling the user to add an extra argument
> > > > > >> > >>> > > (like "--allow-partial-rackinfo") to suppress the error and do an
> > > > > >> > >>> > > imperfect rack-aware assignment. If the default behavior is to
> > > > > >> > >>> > > allow incomplete mapping, the error can still be easily missed.
> > > > > >> > >>> > >
> > > > > >> > >>> > > The affected command line tools are TopicCommand and
> > > > > >> > >>> > > ReassignPartitionsCommand.
> > > > >> > >>> > >
> > > > >> > >>> > > Thanks,
> > > > >> > >>> > > Allen
> > > > >> > >>> > >
> > > > >> > >>> > >
> > > > >> > >>> > >
> > > > >> > >>> > >
> > > > >> > >>> > >
> > > > > >> > >>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <
> > > > > >> > >>> > > aaurad...@linkedin.com> wrote:
> > > > > >> > >>> > >
> > > > > >> > >>> > > > Hi Allen,
> > > > > >> > >>> > > >
> > > > > >> > >>> > > > For TopicMetadataResponse to understand version, you can bump up
> > > > > >> > >>> > > > the request version itself. Based on the version of the request,
> > > > > >> > >>> > > > the response can be appropriately serialized. It shouldn't be a
> > > > > >> > >>> > > > huge change. For example, we went through something similar for
> > > > > >> > >>> > > > ProduceRequest recently (https://reviews.apache.org/r/33378/).
> > > > > >> > >>> > > > I guess the reason protocol information is not included in the
> > > > > >> > >>> > > > TMR is that the topic itself is independent of any particular
> > > > > >> > >>> > > > protocol (SSL vs Plaintext). Having said that, I'm not sure we
> > > > > >> > >>> > > > even need rack information in TMR. What use case were you
> > > > > >> > >>> > > > thinking of initially?
> > > > > >> > >>> > > >
> > > > > >> > >>> > > > For 1 - I'd be fine with adding an option to the command line
> > > > > >> > >>> > > > tools that checks rack assignment, e.g. "--strict-assignment" or
> > > > > >> > >>> > > > something similar.
> > > > >> > >>> > > >
> > > > >> > >>> > > > Aditya
> > > > >> > >>> > > >
> > > > > >> > >>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <allenxw...@gmail.com>
> > > > > >> > >>> > > > wrote:
> > > > > >> > >>> > > >
> > > > > >> > >>> > > > > For 2 and 3, I have updated the KIP. Please take a look. One
> > > > > >> > >>> > > > > thing I have changed is removing the proposal to add rack to
> > > > > >> > >>> > > > > TopicMetadataResponse. The reason is that, unlike
> > > > > >> > >>> > > > > UpdateMetadataRequest, TopicMetadataResponse does not understand
> > > > > >> > >>> > > > > version. I don't see a way to include rack without breaking old
> > > > > >> > >>> > > > > versions of clients. That's probably why the security protocol
> > > > > >> > >>> > > > > is not included in the TopicMetadataResponse either. I think it
> > > > > >> > >>> > > > > will be a much bigger change to include rack in
> > > > > >> > >>> > > > > TopicMetadataResponse.
> > > > > >> > >>> > > > >
> > > > > >> > >>> > > > > For 1, my concern is that doing rack-aware assignment without a
> > > > > >> > >>> > > > > complete broker-to-rack mapping will result in an assignment
> > > > > >> > >>> > > > > that is not rack aware and fails to provide fault tolerance in
> > > > > >> > >>> > > > > the event of a rack outage. This kind of problem will be
> > > > > >> > >>> > > > > difficult to surface. And the cost of this problem is high: you
> > > > > >> > >>> > > > > have to do partition reassignment if you are lucky enough to
> > > > > >> > >>> > > > > spot the problem early on, or face the consequence of data loss
> > > > > >> > >>> > > > > during a real rack outage.
> > > > > >> > >>> > > > >
> > > > > >> > >>> > > > > I do see the concern with fail-fast, as it might also cause data
> > > > > >> > >>> > > > > loss if a producer is not able to produce messages due to topic
> > > > > >> > >>> > > > > creation failure. Is it feasible to treat dynamic topic creation
> > > > > >> > >>> > > > > and command line tools differently? We could allow dynamic topic
> > > > > >> > >>> > > > > creation with incomplete broker-rack mapping and fail fast on
> > > > > >> > >>> > > > > the command line. Another option is to let the user determine
> > > > > >> > >>> > > > > the behavior for the command line: for example, fail fast by
> > > > > >> > >>> > > > > default but allow incomplete broker-rack mapping if another
> > > > > >> > >>> > > > > switch is provided.
> > > > >> > >>> > > > >
> > > > >> > >>> > > > >
> > > > >> > >>> > > > >
> > > > >> > >>> > > > >
> > > > >> > >>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
> > > > >> > >>> > > > > aaurad...@linkedin.com.invalid> wrote:
> > > > >> > >>> > > > >
> > > > >> > >>> > > > > > Hey Allen,
> > > > >> > >>> > > > > >
> > > > > >> > >>> > > > > > 1. If we choose fail-fast topic creation, we will have topic
> > > > > >> > >>> > > > > > creation failures while upgrading the cluster. I really doubt
> > > > > >> > >>> > > > > > we want this behavior. Ideally, this should be invisible to
> > > > > >> > >>> > > > > > clients of a cluster. Currently, each broker is effectively
> > > > > >> > >>> > > > > > its own rack. So we can probably use the rack information
> > > > > >> > >>> > > > > > whenever possible but not make it a hard requirement. To
> > > > > >> > >>> > > > > > extend Gwen's example, one badly configured broker should not
> > > > > >> > >>> > > > > > degrade topic creation for the entire cluster.
> > > > > >> > >>> > > > > >
> > > > > >> > >>> > > > > > 2. Upgrade scenario - Can you add a section on the upgrade
> > > > > >> > >>> > > > > > piece to confirm that old clients will not see errors? I
> > > > > >> > >>> > > > > > believe ZookeeperConsumerConnector reads the Broker objects
> > > > > >> > >>> > > > > > from ZK. I wanted to confirm that this will not cause any
> > > > > >> > >>> > > > > > problems.
> > > > > >> > >>> > > > > >
> > > > > >> > >>> > > > > > 3. Could you elaborate on your proposed changes to the
> > > > > >> > >>> > > > > > UpdateMetadataRequest in the "Public Interfaces" section?
> > > > > >> > >>> > > > > > Personally, I find this format easy to read in terms of wire
> > > > > >> > >>> > > > > > protocol changes:
> > > > > >> > >>> > > > > >
> > > > > >> > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> > > > >> > >>> > > > > >
> > > > >> > >>> > > > > > Aditya
> > > > >> > >>> > > > > >
> > > > > >> > >>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <allenxw...@gmail.com>
> > > > > >> > >>> > > > > > wrote:
> > > > > >> > >>> > > > > >
> > > > > >> > >>> > > > > > > The KIP is updated to include rack as an optional property
> > > > > >> > >>> > > > > > > for brokers. Please take a look and let me know if more
> > > > > >> > >>> > > > > > > details are needed.
> > > > > >> > >>> > > > > > >
> > > > > >> > >>> > > > > > > For the case where some brokers have rack and some do not,
> > > > > >> > >>> > > > > > > the current KIP uses the fail-fast behavior. If there are
> > > > > >> > >>> > > > > > > concerns, we can discuss this further in the email thread or
> > > > > >> > >>> > > > > > > the next hangout.
> > > > >> > >>> > > > > > >
> > > > >> > >>> > > > > > >
> > > > >> > >>> > > > > > >
> > > > > >> > >>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <allenxw...@gmail.com>
> > > > > >> > >>> > > > > > > wrote:
> > > > > >> > >>> > > > > > >
> > > > > >> > >>> > > > > > > > That's a good question. I can think of three actions if the
> > > > > >> > >>> > > > > > > > rack information is incomplete:
> > > > > >> > >>> > > > > > > >
> > > > > >> > >>> > > > > > > > 1. Treat the node without rack as if it is on its own unique
> > > > > >> > >>> > > > > > > > rack
> > > > > >> > >>> > > > > > > > 2. Disregard all rack information and fall back to the
> > > > > >> > >>> > > > > > > > current algorithm
> > > > > >> > >>> > > > > > > > 3. Fail-fast
> > > > > >> > >>> > > > > > > >
> > > > > >> > >>> > > > > > > > Now that I think about it, one and three make more sense.
> > > > > >> > >>> > > > > > > > The reason for fail-fast is that a user's mistake of not
> > > > > >> > >>> > > > > > > > providing the rack may never be found if we tolerate it, and
> > > > > >> > >>> > > > > > > > the assignment may not be rack aware as the user expected,
> > > > > >> > >>> > > > > > > > which creates debugging problems when things fail.
> > > > > >> > >>> > > > > > > >
> > > > > >> > >>> > > > > > > > What do you think? If not fail-fast, is there any way we can
> > > > > >> > >>> > > > > > > > make the user error stand out?
> > > > >> > >>> > > > > > > >
> > > > >> > >>> > > > > > > >
> > > > > >> > >>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <g...@confluent.io>
> > > > > >> > >>> > > > > > > > wrote:
> > > > > >> > >>> > > > > > > >
> > > > > >> > >>> > > > > > > >> Thanks! Just to clarify, when some brokers have rack
> > > > > >> > >>> > > > > > > >> assignment and some don't, do we act like none of them have
> > > > > >> > >>> > > > > > > >> it? Or like those without assignment are in their own rack?
> > > > > >> > >>> > > > > > > >>
> > > > > >> > >>> > > > > > > >> The first scenario is good when first setting up
> > > > > >> > >>> > > > > > > >> rack-awareness, but the second makes more sense for on-going
> > > > > >> > >>> > > > > > > >> maintenance (I can totally see someone adding a node and
> > > > > >> > >>> > > > > > > >> forgetting to set the rack property; we don't want this to
> > > > > >> > >>> > > > > > > >> change behavior for anything except the new node).
> > > > > >> > >>> > > > > > > >>
> > > > > >> > >>> > > > > > > >> What do you think?
> > > > >> > >>> > > > > > > >>
> > > > >> > >>> > > > > > > >> Gwen
> > > > >> > >>> > > > > > > >>
> > > > > >> > >>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <allenxw...@gmail.com>
> > > > > >> > >>> > > > > > > >> wrote:
> > > > > >> > >>> > > > > > > >>
> > > > > >> > >>> > > > > > > >> > For scenario 1:
> > > > > >> > >>> > > > > > > >> >
> > > > > >> > >>> > > > > > > >> > - Add the rack information to the broker property file or
> > > > > >> > >>> > > > > > > >> > dynamically set it in the wrapper code that bootstraps the
> > > > > >> > >>> > > > > > > >> > Kafka server. You would do that for all brokers and restart
> > > > > >> > >>> > > > > > > >> > the brokers one by one.
> > > > > >> > >>> > > > > > > >> >
> > > > > >> > >>> > > > > > > >> > In this scenario, the complete broker-to-rack mapping may
> > > > > >> > >>> > > > > > > >> > not be available until every broker is restarted. During
> > > > > >> > >>> > > > > > > >> > that time we fall back to the default replica assignment
> > > > > >> > >>> > > > > > > >> > algorithm.
> > > > > >> > >>> > > > > > > >> >
> > > > > >> > >>> > > > > > > >> > For scenario 2:
> > > > > >> > >>> > > > > > > >> >
> > > > > >> > >>> > > > > > > >> > - Add the rack information to the broker property file or
> > > > > >> > >>> > > > > > > >> > dynamically set it in the wrapper code and start the broker.
> > > > >> > >>> > > > > > > >> >
> > > > >> > >>> > > > > > > >> >
> > > > > >> > >>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <g...@confluent.io>
> > > > > >> > >>> > > > > > > >> > wrote:
> > > > > >> > >>> > > > > > > >> >
> > > > > >> > >>> > > > > > > >> > > Can you clarify the workflow for the following scenarios:
> > > > > >> > >>> > > > > > > >> > >
> > > > > >> > >>> > > > > > > >> > > 1. I currently have 6 brokers and want to add rack
> > > > > >> > >>> > > > > > > >> > > information for each
> > > > > >> > >>> > > > > > > >> > > 2. I'm adding a new broker and I want to specify which rack
> > > > > >> > >>> > > > > > > >> > > it belongs on while adding it.
> > > > >> > >>> > > > > > > >> > >
> > > > >> > >>> > > > > > > >> > > Thanks!
> > > > >> > >>> > > > > > > >> > >
> > > > > >> > >>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <allenxw...@gmail.com>
> > > > > >> > >>> > > > > > > >> > > wrote:
> > > > > >> > >>> > > > > > > >> > >
> > > > > >> > >>> > > > > > > >> > > > We discussed the KIP in the hangout today. The
> > > > > >> > >>> > > > > > > >> > > > recommendation is to make rack a broker property in
> > > > > >> > >>> > > > > > > >> > > > ZooKeeper. Users with existing rack information stored
> > > > > >> > >>> > > > > > > >> > > > somewhere would need to retrieve the information at broker
> > > > > >> > >>> > > > > > > >> > > > start up and dynamically set the rack property, which can
> > > > > >> > >>> > > > > > > >> > > > be implemented as a wrapper that bootstraps the broker.
> > > > > >> > >>> > > > > > > >> > > > There will be no interface or pluggable implementation to
> > > > > >> > >>> > > > > > > >> > > > retrieve the rack information.
> > > > > >> > >>> > > > > > > >> > > >
> > > > > >> > >>> > > > > > > >> > > > The assumption is that you always need to restart the
> > > > > >> > >>> > > > > > > >> > > > broker to make a change to the rack.
> > > > > >> > >>> > > > > > > >> > > >
> > > > > >> > >>> > > > > > > >> > > > Once the rack becomes a broker property, it will be
> > > > > >> > >>> > > > > > > >> > > > possible to make rack part of the metadata to help the
> > > > > >> > >>> > > > > > > >> > > > consumer choose which in-sync replica to consume from as
> > > > > >> > >>> > > > > > > >> > > > part of a future consumer enhancement.
> > > > > >> > >>> > > > > > > >> > > >
> > > > > >> > >>> > > > > > > >> > > > I will update the KIP.
> > > > >> > >>> > > > > > > >> > > >
> > > > >> > >>> > > > > > > >> > > > Thanks,
> > > > >> > >>> > > > > > > >> > > > Allen
> > > > >> > >>> > > > > > > >> > > >
> > > > >> > >>> > > > > > > >> > > >
> > > > > >> > >>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <allenxw...@gmail.com>
> > > > > >> > >>> > > > > > > >> > > > wrote:
> > > > > >> > >>> > > > > > > >> > > >
> > > > > >> > >>> > > > > > > >> > > > > I attended Tuesday's KIP hangout but this KIP was not
> > > > > >> > >>> > > > > > > >> > > > > discussed due to time constraints.
> > > > > >> > >>> > > > > > > >> > > > >
> > > > > >> > >>> > > > > > > >> > > > > However, after hearing the discussion of KIP-35, I have
> > > > > >> > >>> > > > > > > >> > > > > the feeling that incompatibility (caused by a new broker
> > > > > >> > >>> > > > > > > >> > > > > property) between brokers with different versions will be
> > > > > >> > >>> > > > > > > >> > > > > solved there. In addition, having rack in the broker
> > > > > >> > >>> > > > > > > >> > > > > property as metadata may also help consumers in the
> > > > > >> > >>> > > > > > > >> > > > > future. So I am open to adding a rack property to the
> > > > > >> > >>> > > > > > > >> > > > > broker.
> > > > > >> > >>> > > > > > > >> > > > >
> > > > > >> > >>> > > > > > > >> > > > > Hopefully we can discuss this in the next KIP hangout.
> > > > >> > >>> > > > > > > >> > > > >
> > > > > >> > >>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <allenxw...@gmail.com>
> > > > > >> > >>> > > > > > > >> > > > > wrote:
> > > > > >> > >>> > > > > > > >> > > > >
> > > > > >> > >>> > > > > > > >> > > > >> Can you send me the information on the next KIP hangout?
> > > > > >> > >>> > > > > > > >> > > > >>
> > > > > >> > >>> > > > > > > >> > > > >> Currently the broker-rack mapping is not cached. In
> > > > > >> > >>> > > > > > > >> > > > >> KafkaApis, RackLocator.getRackInfo() is called each time
> > > > > >> > >>> > > > > > > >> > > > >> the mapping is needed for auto topic creation. This will
> > > > > >> > >>> > > > > > > >> > > > >> ensure the latest mapping is used at any time.
> > > > > >> > >>> > > > > > > >> > > > >>
> > > > > >> > >>> > > > > > > >> > > > >> The ability to get the complete mapping makes it simple
> > > > > >> > >>> > > > > > > >> > > > >> to reuse the same interface in command line tools.
> > > > >> > >>> > > > > > > >> > > > >>
> > > > >> > >>> > > > > > > >> > > > >>
> > > > >> > >>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM,
> > Aditya
> > > > >> > >>> Auradkar <
> > > > >> > >>> > > > > > > >> > > > >> aaurad...@linkedin.com.invalid>
> > wrote:
> > > > >> > >>> > > > > > > >> > > > >>
> > > > >> > >>> > > > > > > >> > > > >>> Perhaps we discuss this during the next KIP hangout?
> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > >>> > > > > > > >> > > > >>> I do see that a pluggable rack locator can be useful but I do see a few concerns:
> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > >>> > > > > > > >> > > > >>> - The RackLocator (as described in the document), implies that it can discover rack information for any node in the cluster. How does it deal with rack location changes? For example, if I moved broker id (1) from rack X to Y, I only have to start that broker with a newer rack config. If RackLocator discovers broker -> rack information at start up time, any change to a broker will require bouncing the entire cluster since createTopic requests can be sent to any node in the cluster.
> > > > >> > >>> > > > > > > >> > > > >>> For this reason it may be simpler to have each node be aware of its own rack and persist it in ZK during start up time.
> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > >>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an external service being available to serve rack information.
> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > >>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a couple of other systems deal with zone/rack awareness.
> > > > >> > >>> > > > > > > >> > > > >>> For Cassandra some interesting modes are:
> > > > >> > >>> > > > > > > >> > > > >>> (Property File configuration)
> > > > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> > > > >> > >>> > > > > > > >> > > > >>> (Dynamic inference)
> > > > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > >>> > > > > > > >> > > > >>> Voldemort does a static node -> zone assignment based on configuration.
> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > >>> > > > > > > >> > > > >>> Aditya
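
The first concern in the message above (a broker -> rack mapping read once at start-up goes stale when a broker changes racks) can be illustrated with a minimal sketch. Everything here is hypothetical: `SnapshotRackLocator` and `SharedRegistry` are illustration-only names, not Kafka classes, and `SharedRegistry` merely stands in for ZooKeeper.

```python
# Hypothetical illustration only; none of these names exist in Kafka.


class SnapshotRackLocator:
    """Copies the broker -> rack mapping once, when the node starts."""

    def __init__(self, external_mapping):
        self._mapping = dict(external_mapping)  # frozen copy taken at start-up

    def rack_of(self, broker_id):
        return self._mapping.get(broker_id)


class SharedRegistry:
    """Stand-in for ZooKeeper: each broker writes its own rack at start-up."""

    def __init__(self):
        self._racks = {}

    def register(self, broker_id, rack):
        self._racks[broker_id] = rack

    def rack_of(self, broker_id):
        return self._racks.get(broker_id)


external = {1: "X", 2: "X", 3: "Y"}

# Broker 2 started earlier and took its snapshot of the external mapping.
node2_view = SnapshotRackLocator(external)

# Alternative: every broker registers its own rack in a shared store.
registry = SharedRegistry()
for broker_id, rack in external.items():
    registry.register(broker_id, rack)

# Broker 1 is moved from rack X to rack Y and restarted with the new config.
external[1] = "Y"
registry.register(1, "Y")

stale = node2_view.rack_of(1)  # value read at broker 2's start-up
fresh = registry.rack_of(1)    # value written by broker 1 itself
```

In this sketch only broker 1 had to restart for the shared registry to be correct, while broker 2's snapshot stays wrong until broker 2 is bounced as well.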
> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > >>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <allenxw...@gmail.com> wrote:
> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > >>> > > > > > > >> > > > >>> > I would like to see if we can do both:
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to facilitate migration with existing broker-rack mapping
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>> > - Make rack an optional property for broker. If rack is available from broker, treat it as source of truth. For users with existing broker-rack mapping somewhere else, they can use the pluggable way or they can transfer the mapping to the broker rack property.
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>> > One thing I am not sure is what happens at rolling upgrade when we have rack as a broker property. For brokers with older version of Kafka, will it cause problem for them? If so, is there any workaround? I also think it would be better not to have rack in the controller wire protocol but not sure if it is achievable.
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>> > Thanks,
> > > > >> > >>> > > > > > > >> > > > >>> > Allen
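
The "do both" proposal above amounts to a lookup order: use the rack a broker registered for itself when present, and fall back to a pluggable locator otherwise. A minimal sketch of that order follows, under stated assumptions: `resolve_racks` and its parameters are hypothetical names, and the locator is modelled as a plain callable rather than any real plug-in interface.

```python
from typing import Callable, Dict, Optional


def resolve_racks(
    broker_racks: Dict[int, Optional[str]],
    locator: Optional[Callable[[int], Optional[str]]] = None,
) -> Dict[int, Optional[str]]:
    """Return broker id -> rack.

    broker_racks holds the rack each broker registered for itself
    (None when the broker did not set one, e.g. an older broker).
    The broker's own value is the source of truth; the locator is
    consulted only when that value is missing.
    """
    resolved = {}
    for broker_id, rack in broker_racks.items():
        if rack is None and locator is not None:
            rack = locator(broker_id)
        resolved[broker_id] = rack
    return resolved


# A pre-existing broker-rack mapping kept somewhere else (hypothetical data).
legacy_mapping = {1: "rack-a", 2: "rack-b"}

result = resolve_racks({1: "rack-c", 2: None, 3: None}, legacy_mapping.get)
```

Here broker 1 keeps the rack it registered itself, broker 2 falls back to the legacy mapping, and broker 3 has no rack information from either source.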
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tpal...@gmail.com> wrote:
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable locator. For example, we already have an interface for discovering information about the physical location of servers. I don't relish the idea of having to maintain data in multiple places.
> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > > >> > >>> > > > > > > >> > > > >>> > > -Todd
> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > > >> > >>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:
> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP Allen.
> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a RackLocator class that is pluggable seems to be too complex. The KIP refers to potentially non-ZK storage for the rack info which I don't think is necessary.
> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in zk under /brokers/ids/<broker_id> similar to other broker properties and add a config in KafkaConfig called "rack".
> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, "rack": "abc"}
> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > Aditya
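
As a rough sketch of how a reader of the registration JSON proposed above could treat "rack" as an optional field (so registrations written without it still parse), assuming the layout shown in the message: `parse_broker_rack` is a hypothetical helper, and the port `9092` and the empty `endpoints` list are placeholders for the values elided in the original.

```python
import json
from typing import Optional


def parse_broker_rack(registration: str) -> Optional[str]:
    """Return the "rack" value from a broker registration JSON string.

    Returns None when the field is absent, which is how a registration
    written without rack information would look.
    """
    info = json.loads(registration)
    return info.get("rack")


# Placeholder values: the original message elides the endpoints and the port.
with_rack = json.dumps(
    {"jmx_port": -1, "endpoints": [], "host": "xxx", "port": 9092, "rack": "abc"}
)
without_rack = json.dumps(
    {"jmx_port": -1, "endpoints": [], "host": "xxx", "port": 9092}
)

rack_present = parse_broker_rack(with_rack)
rack_absent = parse_broker_rack(without_rack)
```

Reading the field with `dict.get` rather than indexing is the design choice that keeps the reader tolerant of registrations that predate the new field.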
> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <g...@confluent.io> wrote:
> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > Hi,
> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP for this. This is super important for production deployments of Kafka.
> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > Few questions:
> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"? I'd want to balance between safety (more racks) and network utilization (traffic within a rack uses the high-bandwidth TOR switch). One replica on a different rack and the rest on same rack (if possible) sounds better to me.
> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > 2) Rack-locator class seems overly complex compared to adding a rack.number property to the broker properties file. Why do we want that?
> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > Gwen
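
The two placement policies contrasted in question 1 above can be compared with a small sketch. This is an illustration under stated assumptions, not the assignment algorithm from the KIP: both function names, the `home_rack` parameter, and the sample cluster are hypothetical.

```python
from typing import Dict, List


def spread_over_racks(brokers_by_rack: Dict[str, List[int]],
                      replication_factor: int) -> List[int]:
    """'As many racks as possible': pick brokers round-robin across racks."""
    racks = sorted(brokers_by_rack)
    pools = {rack: list(brokers_by_rack[rack]) for rack in racks}
    chosen: List[int] = []
    while len(chosen) < replication_factor and any(pools.values()):
        for rack in racks:
            if pools[rack] and len(chosen) < replication_factor:
                chosen.append(pools[rack].pop(0))
    return chosen


def one_remote_rest_local(brokers_by_rack: Dict[str, List[int]],
                          replication_factor: int,
                          home_rack: str) -> List[int]:
    """One replica on a different rack, the rest on home_rack if possible."""
    local = list(brokers_by_rack.get(home_rack, []))
    remote = [broker
              for rack in sorted(brokers_by_rack) if rack != home_rack
              for broker in brokers_by_rack[rack]]
    # Preference order: one local, one remote, remaining local, remaining remote.
    ordered = local[:1] + remote[:1] + local[1:] + remote[1:]
    return ordered[:replication_factor]


cluster = {"r1": [1, 2, 3], "r2": [4, 5], "r3": [6]}

spread = spread_over_racks(cluster, 3)               # one broker per rack
mostly_local = one_remote_rest_local(cluster, 3, "r1")  # two on r1, one on r2
```

With three replicas, the first policy touches all three racks, while the second keeps two replicas behind the same top-of-rack switch and places only one elsewhere, which is the safety versus network-utilization trade-off raised in the message.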
> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxw...@gmail.com> wrote:
> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware replica assignment.
> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation provided by the racks in data center and distribute replicas to racks to provide fault tolerance.
> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Thanks,
> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Allen
> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > >> > >>>
> > > > >> > >>> --
> > > > >> > >>> Thanks,
> > > > >> > >>> Neha
