Jun/Allen -

Did we ever actually agree on whether we should evolve the TMR to include
rack info or not?
I don't feel strongly about it, but if it's the right thing to do we
should probably do it in this KIP (it can be a separate patch). It isn't a
large change.
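
To make the question concrete, here is a rough sketch of the kind of version
bump being discussed. It is illustrative only: the field names, the dict-based
encoding and the version numbers are assumptions, not the actual TMR wire
format.

```python
# Hypothetical sketch: the field names and version numbers are assumptions
# for illustration, not Kafka's real TopicMetadataResponse schema.
RACK_AWARE_VERSION = 1  # assumed first request version that carries rack


def serialize_broker(broker, request_version):
    """Build the broker entry of a metadata response for one request version."""
    entry = {
        "node_id": broker["node_id"],
        "host": broker["host"],
        "port": broker["port"],
    }
    # Only clients that asked with the newer version get the extra field,
    # so older clients keep receiving the layout they already understand.
    if request_version >= RACK_AWARE_VERSION:
        entry["rack"] = broker.get("rack")  # None when the broker has no rack
    return entry


broker = {"node_id": 1, "host": "kafka-1", "port": 9092, "rack": "rack-a"}
old_entry = serialize_broker(broker, request_version=0)
new_entry = serialize_broker(broker, request_version=1)
```

In this sketch a client that keeps sending the old request version never sees
the rack field, which is the backward-compatibility property we would need.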

Aditya

On Sat, Dec 26, 2015 at 3:01 PM, Allen Wang <allenxw...@gmail.com> wrote:

> Added the rolling upgrade instructions to the KIP, similar to those in the
> 0.9.0 release notes.
>
> On Wed, Dec 16, 2015 at 11:32 AM, Allen Wang <allenxw...@gmail.com> wrote:
>
> > Hi Jun,
> >
> > The reason that TopicMetadataResponse is not included in the KIP is that
> > it is currently not version aware. So we need to introduce a version to it
> > in order to ensure backward compatibility. That seems to me a big change.
> > Do we want to couple it with this KIP? Do we need to further discuss what
> > information to include in the new version besides rack? For example,
> > should we include the broker security protocol in TopicMetadataResponse?
> >
> > The other option is to make it a separate KIP to make
> > TopicMetadataResponse version aware and decide what to include, and make
> > this KIP focus on the rack aware algorithm, admin tools and related
> > changes to the inter-broker protocol.
> >
> > Thanks,
> > Allen
> >
> >
> >
> >
> > On Mon, Dec 14, 2015 at 8:30 AM, Jun Rao <j...@confluent.io> wrote:
> >
> >> Allen,
> >>
> >> Thanks for the proposal. A few comments.
> >>
> >> 1. Since this KIP changes the inter broker communication protocol
> >> (UpdateMetadataRequest), we will need to document the upgrade path
> >> (similar to what's described in
> >> http://kafka.apache.org/090/documentation.html#upgrade).
> >>
> >> 2. It might be useful to include the rack info of the broker in
> >> TopicMetadataResponse. This can be useful for administrative tasks, as
> >> well as read affinity in the future.
> >>
> >> Jun
> >>
> >>
> >>
> >> On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <allenxw...@gmail.com>
> wrote:
> >>
> >> > If there are no more comments I would like to call for a vote.
> >> >
> >> >
> >> > On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <allenxw...@gmail.com>
> >> wrote:
> >> >
> >> > > KIP is updated with more details and how to handle the situation
> >> > > where rack information is incomplete.
> >> > >
> >> > > In the situation where rack information is incomplete, but we want
> >> > > to continue with the assignment, I have suggested ignoring all rack
> >> > > information and falling back to the original algorithm. The reason
> >> > > is explained below:
> >> > >
> >> > > The other options are to assume that each broker without a rack
> >> > > belongs to its own unique rack, or that they all belong to one
> >> > > "default" rack. Whichever way we choose, it is highly likely to
> >> > > result in an uneven number of brokers in racks, and it is quite
> >> > > possible that the "made up" racks will have far fewer brokers. As I
> >> > > explained in the KIP, an uneven number of brokers in racks will lead
> >> > > to uneven distribution of replicas among brokers (even though the
> >> > > leader distribution is still even). The brokers in the rack that has
> >> > > fewer brokers will get more replicas per broker than brokers in
> >> > > other racks.
> >> > >
> >> > > Given this fact, and that the replica assignment produced will be
> >> > > incorrect anyway from a rack aware point of view, ignoring all rack
> >> > > information and falling back to the original algorithm is not a bad
> >> > > choice since it will at least have a better guarantee of replica
> >> > > distribution.
> >> > >
> >> > > Also, for command line tools it gives the user a choice if for any
> >> > > reason they want to ignore rack information and fall back to the
> >> > > original algorithm.
> >> > >
> >> > >
> >> > > On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <allenxw...@gmail.com>
> >> > wrote:
> >> > >
> >> > >> I have been busy with some time-pressing issues for the last few
> >> > >> days. I will think about how the incomplete rack information will
> >> > >> affect the balance and update the KIP by early next week.
> >> > >>
> >> > >> Thanks,
> >> > >> Allen
> >> > >>
> >> > >>
> >> > >> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <n...@confluent.io>
> >> > wrote:
> >> > >>
> >> > >>> A few suggestions on improving the KIP:
> >> > >>>
> >> > >>> *If some brokers have rack, and some do not, the algorithm will
> >> > >>> throw an exception. This is to prevent incorrect assignment caused
> >> > >>> by user error.*
> >> > >>>
> >> > >>>
> >> > >>> In the KIP, can you clearly state the user-facing behavior when
> >> > >>> some brokers have rack information and some don't? Which actions
> >> > >>> and requests will error out and how?
> >> > >>>
> >> > >>> *Even distribution of partition leadership among brokers*
> >> > >>>
> >> > >>>
> >> > >>> There is some information about arranging the sorted broker list
> >> > >>> interlaced with rack ids. Can you describe the changes to the
> >> > >>> current algorithm in a little more detail? How does this
> >> > >>> interlacing work if only a subset of brokers have the rack id
> >> > >>> configured? Does this still work if an uneven number of brokers is
> >> > >>> assigned to each rack? It might work; I'm looking for more details
> >> > >>> on the changes, since they will affect the behavior seen by the
> >> > >>> user: imbalance on either the leaders or data or both.
> >> > >>>
> >> > >>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <
> >> > aaurad...@linkedin.com>
> >> > >>> wrote:
> >> > >>>
> >> > >>> > I think this sounds reasonable. Anyone else have comments?
> >> > >>> >
> >> > >>> > Aditya
> >> > >>> >
> >> > >>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <
> allenxw...@gmail.com
> >> >
> >> > >>> wrote:
> >> > >>> >
> >> > >>> > > During the discussion in the hangout, it was mentioned that it
> >> > >>> > > would be desirable for consumers to know the rack information
> >> > >>> > > of the brokers so that they can consume from the broker in the
> >> > >>> > > same rack to reduce latency. As I understand it, this will only
> >> > >>> > > be beneficial if the consumer can consume from any broker in
> >> > >>> > > the ISR, which is not possible now.
> >> > >>> > >
> >> > >>> > > I suggest we skip the change to TMR. Once the consumer is
> >> > >>> > > changed to be able to consume from any broker in the ISR, the
> >> > >>> > > rack information can be added to TMR.
> >> > >>> > >
> >> > >>> > > Another thing I want to confirm is the command line behavior. I
> >> > >>> > > think the desirable default behavior is to fail fast on the
> >> > >>> > > command line for incomplete rack mapping. The error message can
> >> > >>> > > include further instruction that tells the user to add an extra
> >> > >>> > > argument (like "--allow-partial-rackinfo") to suppress the
> >> > >>> > > error and do an imperfect rack aware assignment. If the default
> >> > >>> > > behavior is to allow incomplete mapping, the error can still be
> >> > >>> > > easily missed.
> >> > >>> > >
> >> > >>> > > The affected command line tools are TopicCommand and
> >> > >>> > > ReassignPartitionsCommand.
> >> > >>> > >
> >> > >>> > > Thanks,
> >> > >>> > > Allen
> >> > >>> > >
> >> > >>> > >
> >> > >>> > >
> >> > >>> > >
> >> > >>> > >
> >> > >>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <
> >> > >>> > aaurad...@linkedin.com>
> >> > >>> > > wrote:
> >> > >>> > >
> >> > >>> > > > Hi Allen,
> >> > >>> > > >
> >> > >>> > > > For TopicMetadataResponse to understand version, you can bump
> >> > >>> > > > up the request version itself. Based on the version of the
> >> > >>> > > > request, the response can be appropriately serialized. It
> >> > >>> > > > shouldn't be a huge change. For example, we went through
> >> > >>> > > > something similar for ProduceRequest recently
> >> > >>> > > > (https://reviews.apache.org/r/33378/).
> >> > >>> > > > I guess the reason protocol information is not included in
> >> > >>> > > > the TMR is that the topic itself is independent of any
> >> > >>> > > > particular protocol (SSL vs Plaintext). Having said that, I'm
> >> > >>> > > > not sure we even need rack information in TMR. What use case
> >> > >>> > > > were you thinking of initially?
> >> > >>> > > >
> >> > >>> > > > For 1 - I'd be fine with adding an option to the command line
> >> > >>> > > > tools that checks rack assignment, e.g. "--strict-assignment"
> >> > >>> > > > or something similar.
> >> > >>> > > >
> >> > >>> > > > Aditya
> >> > >>> > > >
> >> > >>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <
> >> > allenxw...@gmail.com>
> >> > >>> > > wrote:
> >> > >>> > > >
> >> > >>> > > > > For 2 and 3, I have updated the KIP. Please take a look.
> >> > >>> > > > > One thing I have changed is removing the proposal to add
> >> > >>> > > > > rack to TopicMetadataResponse. The reason is that unlike
> >> > >>> > > > > UpdateMetadataRequest, TopicMetadataResponse does not
> >> > >>> > > > > understand version. I don't see a way to include rack
> >> > >>> > > > > without breaking old versions of clients. That's probably
> >> > >>> > > > > why the security protocol is not included in the
> >> > >>> > > > > TopicMetadataResponse either. I think it will be a much
> >> > >>> > > > > bigger change to include rack in TopicMetadataResponse.
> >> > >>> > > > >
> >> > >>> > > > > For 1, my concern is that doing rack aware assignment
> >> > >>> > > > > without a complete broker to rack mapping will result in an
> >> > >>> > > > > assignment that is not rack aware and fails to provide
> >> > >>> > > > > fault tolerance in the event of a rack outage. This kind of
> >> > >>> > > > > problem will be difficult to surface. And the cost of this
> >> > >>> > > > > problem is high: you have to do partition reassignment if
> >> > >>> > > > > you are lucky enough to spot the problem early on, or face
> >> > >>> > > > > the consequence of data loss during a real rack outage.
> >> > >>> > > > >
> >> > >>> > > > > I do see the concern with fail-fast, as it might also cause
> >> > >>> > > > > data loss if the producer is not able to produce the
> >> > >>> > > > > message due to topic creation failure. Is it feasible to
> >> > >>> > > > > treat dynamic topic creation and command line tools
> >> > >>> > > > > differently? We would allow dynamic topic creation with
> >> > >>> > > > > incomplete broker-rack mapping and fail fast in the command
> >> > >>> > > > > line. Another option is to let the user determine the
> >> > >>> > > > > behavior for the command line. For example, by default fail
> >> > >>> > > > > fast in the command line but allow incomplete broker-rack
> >> > >>> > > > > mapping if another switch is provided.
> >> > >>> > > > >
> >> > >>> > > > >
> >> > >>> > > > >
> >> > >>> > > > >
> >> > >>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
> >> > >>> > > > > aaurad...@linkedin.com.invalid> wrote:
> >> > >>> > > > >
> >> > >>> > > > > > Hey Allen,
> >> > >>> > > > > >
> >> > >>> > > > > > 1. If we choose fail fast topic creation, we will have
> >> > >>> > > > > > topic creation failures while upgrading the cluster. I
> >> > >>> > > > > > really doubt we want this behavior. Ideally, this should
> >> > >>> > > > > > be invisible to clients of a cluster. Currently, each
> >> > >>> > > > > > broker is effectively its own rack. So we probably can
> >> > >>> > > > > > use the rack information whenever possible but not make
> >> > >>> > > > > > it a hard requirement. To extend Gwen's example, one
> >> > >>> > > > > > badly configured broker should not degrade topic
> >> > >>> > > > > > creation for the entire cluster.
> >> > >>> > > > > >
> >> > >>> > > > > > 2. Upgrade scenario - Can you add a section on the
> >> > >>> > > > > > upgrade piece to confirm that old clients will not see
> >> > >>> > > > > > errors? I believe ZookeeperConsumerConnector reads the
> >> > >>> > > > > > Broker objects from ZK. I wanted to confirm that this
> >> > >>> > > > > > will not cause any problems.
> >> > >>> > > > > >
> >> > >>> > > > > > 3. Could you elaborate on your proposed changes to the
> >> > >>> > > > > > UpdateMetadataRequest in the "Public Interfaces" section?
> >> > >>> > > > > > Personally, I find this format easy to read in terms of
> >> > >>> > > > > > wire protocol changes:
> >> > >>> > > > > >
> >> > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> >> > >>> > > > > >
> >> > >>> > > > > > Aditya
> >> > >>> > > > > >
> >> > >>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <
> >> > >>> allenxw...@gmail.com>
> >> > >>> > > > > wrote:
> >> > >>> > > > > >
> >> > >>> > > > > > > KIP is updated to include rack as an optional
> >> > >>> > > > > > > property for broker. Please take a look and let me
> >> > >>> > > > > > > know if more details are needed.
> >> > >>> > > > > > >
> >> > >>> > > > > > > For the case where some brokers have rack and some do
> >> > >>> > > > > > > not, the current KIP uses the fail-fast behavior. If
> >> > >>> > > > > > > there are concerns, we can further discuss this in
> >> > >>> > > > > > > the email thread or next hangout.
> >> > >>> > > > > > >
> >> > >>> > > > > > >
> >> > >>> > > > > > >
> >> > >>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <
> >> > >>> > allenxw...@gmail.com
> >> > >>> > > >
> >> > >>> > > > > > wrote:
> >> > >>> > > > > > >
> >> > >>> > > > > > > > That's a good question. I can think of three
> >> > >>> > > > > > > > actions if the rack information is incomplete:
> >> > >>> > > > > > > >
> >> > >>> > > > > > > > 1. Treat the node without rack as if it is on its
> >> > >>> > > > > > > > own unique rack
> >> > >>> > > > > > > > 2. Disregard all rack information and fall back to
> >> > >>> > > > > > > > the current algorithm
> >> > >>> > > > > > > > 3. Fail-fast
> >> > >>> > > > > > > >
> >> > >>> > > > > > > > Now that I think about it, one and three make more
> >> > >>> > > > > > > > sense. The reason for fail-fast is that the user's
> >> > >>> > > > > > > > mistake of not providing the rack may never be found
> >> > >>> > > > > > > > if we tolerate it, and the assignment may not be
> >> > >>> > > > > > > > rack aware as the user expected, which makes
> >> > >>> > > > > > > > debugging hard when things fail.
> >> > >>> > > > > > > >
> >> > >>> > > > > > > > What do you think? If not fail-fast, is there any
> >> > >>> > > > > > > > way we can make the user error stand out?
> >> > >>> > > > > > > >
> >> > >>> > > > > > > >
> >> > >>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <
> >> > >>> > > g...@confluent.io>
> >> > >>> > > > > > > wrote:
> >> > >>> > > > > > > >
> >> > >>> > > > > > > >> Thanks! Just to clarify, when some brokers have rack
> >> > >>> > > > > > > >> assignment and some don't, do we act like none of
> >> > >>> > > > > > > >> them have it? Or like those without assignment are
> >> > >>> > > > > > > >> in their own rack?
> >> > >>> > > > > > > >>
> >> > >>> > > > > > > >> The first scenario is good when first setting up
> >> > >>> > > > > > > >> rack-awareness, but the second makes more sense for
> >> > >>> > > > > > > >> on-going maintenance (I can totally see someone
> >> > >>> > > > > > > >> adding a node and forgetting to set the rack
> >> > >>> > > > > > > >> property; we don't want this to change behavior for
> >> > >>> > > > > > > >> anything except the new node).
> >> > >>> > > > > > > >>
> >> > >>> > > > > > > >> What do you think?
> >> > >>> > > > > > > >>
> >> > >>> > > > > > > >> Gwen
> >> > >>> > > > > > > >>
> >> > >>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <
> >> > >>> > > > allenxw...@gmail.com>
> >> > >>> > > > > > > >> wrote:
> >> > >>> > > > > > > >>
> >> > >>> > > > > > > >> > For scenario 1:
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> > - Add the rack information to the broker property
> >> > >>> > > > > > > >> > file or dynamically set it in the wrapper code to
> >> > >>> > > > > > > >> > bootstrap the Kafka server. You would do that for
> >> > >>> > > > > > > >> > all brokers and restart the brokers one by one.
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> > In this scenario, the complete broker to rack
> >> > >>> > > > > > > >> > mapping may not be available until every broker
> >> > >>> > > > > > > >> > is restarted. During that time we fall back to
> >> > >>> > > > > > > >> > the default replica assignment algorithm.
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> > For scenario 2:
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> > - Add the rack information to the broker property
> >> > >>> > > > > > > >> > file or dynamically set it in the wrapper code
> >> > >>> > > > > > > >> > and start the broker.
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <
> >> > >>> > > > g...@confluent.io>
> >> > >>> > > > > > > >> wrote:
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> > > Can you clarify the workflow for the following
> >> > >>> scenarios:
> >> > >>> > > > > > > >> > >
> >> > >>> > > > > > > >> > > 1. I currently have 6 brokers and want to add rack
> >> > >>> > > > > > > >> > > information for each
> >> > >>> > > > > > > >> > > 2. I'm adding a new broker and I want to specify
> >> > >>> > > > > > > >> > > which rack it belongs on while adding it.
> >> > >>> > > > > > > >> > >
> >> > >>> > > > > > > >> > > Thanks!
> >> > >>> > > > > > > >> > >
> >> > >>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <
> >> > >>> > > > > allenxw...@gmail.com
> >> > >>> > > > > > >
> >> > >>> > > > > > > >> > wrote:
> >> > >>> > > > > > > >> > >
> >> > >>> > > > > > > >> > > > We discussed the KIP in the hangout today.
> >> > >>> > > > > > > >> > > > The recommendation is to make rack a broker
> >> > >>> > > > > > > >> > > > property in ZooKeeper. Users with existing
> >> > >>> > > > > > > >> > > > rack information stored somewhere would need
> >> > >>> > > > > > > >> > > > to retrieve the information at broker start
> >> > >>> > > > > > > >> > > > up and dynamically set the rack property,
> >> > >>> > > > > > > >> > > > which can be implemented as a wrapper to
> >> > >>> > > > > > > >> > > > bootstrap the broker. There will be no
> >> > >>> > > > > > > >> > > > interface or pluggable implementation to
> >> > >>> > > > > > > >> > > > retrieve the rack information.
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > > > The assumption is that you always need to
> >> > >>> > > > > > > >> > > > restart the broker to make a change to the
> >> > >>> > > > > > > >> > > > rack.
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > > > Once the rack becomes a broker property, it
> >> > >>> > > > > > > >> > > > will be possible to make rack part of the
> >> > >>> > > > > > > >> > > > meta data to help the consumer choose which
> >> > >>> > > > > > > >> > > > in sync replica to consume from as part of
> >> > >>> > > > > > > >> > > > the future consumer enhancement.
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > > > I will update the KIP.
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > > > Thanks,
> >> > >>> > > > > > > >> > > > Allen
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <
> >> > >>> > > > > > allenxw...@gmail.com>
> >> > >>> > > > > > > >> > wrote:
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > > > > I attended Tuesday's KIP hangout but this
> >> > >>> > > > > > > >> > > > > KIP was not discussed due to time
> >> > >>> > > > > > > >> > > > > constraints.
> >> > >>> > > > > > > >> > > > >
> >> > >>> > > > > > > >> > > > > However, after hearing the discussion of
> >> > >>> > > > > > > >> > > > > KIP-35, I have the feeling that
> >> > >>> > > > > > > >> > > > > incompatibility (caused by a new broker
> >> > >>> > > > > > > >> > > > > property) between brokers with different
> >> > >>> > > > > > > >> > > > > versions will be solved there. In addition,
> >> > >>> > > > > > > >> > > > > having rack in the broker property as meta
> >> > >>> > > > > > > >> > > > > data may also help consumers in the future.
> >> > >>> > > > > > > >> > > > > So I am open to adding a rack property to
> >> > >>> > > > > > > >> > > > > the broker.
> >> > >>> > > > > > > >> > > > >
> >> > >>> > > > > > > >> > > > > Hopefully we can discuss this in the next
> >> > >>> > > > > > > >> > > > > KIP hangout.
> >> > >>> > > > > > > >> > > > >
> >> > >>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen
> Wang <
> >> > >>> > > > > > > allenxw...@gmail.com
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> > > > wrote:
> >> > >>> > > > > > > >> > > > >
> >> > >>> > > > > > > >> > > > >> Can you send me the information on the
> >> > >>> > > > > > > >> > > > >> next KIP hangout?
> >> > >>> > > > > > > >> > > > >>
> >> > >>> > > > > > > >> > > > >> Currently the broker-rack mapping is not
> >> > >>> > > > > > > >> > > > >> cached. In KafkaApis,
> >> > >>> > > > > > > >> > > > >> RackLocator.getRackInfo() is called each
> >> > >>> > > > > > > >> > > > >> time the mapping is needed for auto topic
> >> > >>> > > > > > > >> > > > >> creation. This will ensure the latest
> >> > >>> > > > > > > >> > > > >> mapping is used at any time.
> >> > >>> > > > > > > >> > > > >>
> >> > >>> > > > > > > >> > > > >> The ability to get the complete mapping
> >> > >>> > > > > > > >> > > > >> makes it simple to reuse the same
> >> > >>> > > > > > > >> > > > >> interface in command line tools.
> >> > >>> > > > > > > >> > > > >>
> >> > >>> > > > > > > >> > > > >>
> >> > >>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya
> >> > >>> Auradkar <
> >> > >>> > > > > > > >> > > > >> aaurad...@linkedin.com.invalid> wrote:
> >> > >>> > > > > > > >> > > > >>
> >> > >>> > > > > > > >> > > > >>> Perhaps we discuss this during the next
> KIP
> >> > >>> hangout?
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> I do see that a pluggable rack locator
> >> > >>> > > > > > > >> > > > >>> can be useful but I do see a few
> >> > >>> > > > > > > >> > > > >>> concerns:
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> - The RackLocator (as described in the
> >> > >>> > > > > > > >> > > > >>> document) implies that it can discover
> >> > >>> > > > > > > >> > > > >>> rack information for any node in the
> >> > >>> > > > > > > >> > > > >>> cluster. How does it deal with rack
> >> > >>> > > > > > > >> > > > >>> location changes? For example, if I
> >> > >>> > > > > > > >> > > > >>> moved broker id (1) from rack X to Y, I
> >> > >>> > > > > > > >> > > > >>> only have to start that broker with a
> >> > >>> > > > > > > >> > > > >>> newer rack config. If RackLocator
> >> > >>> > > > > > > >> > > > >>> discovers broker -> rack information at
> >> > >>> > > > > > > >> > > > >>> start up time, any change to a broker
> >> > >>> > > > > > > >> > > > >>> will require bouncing the entire
> >> > >>> > > > > > > >> > > > >>> cluster, since createTopic requests can
> >> > >>> > > > > > > >> > > > >>> be sent to any node in the cluster.
> >> > >>> > > > > > > >> > > > >>> For this reason it may be simpler to
> >> > >>> > > > > > > >> > > > >>> have each node be aware of its own rack
> >> > >>> > > > > > > >> > > > >>> and persist it in ZK during start up
> >> > >>> > > > > > > >> > > > >>> time.
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an
> >> > >>> > > > > > > >> > > > >>> external service being available to
> >> > >>> > > > > > > >> > > > >>> serve rack information.
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a
> >> > >>> > > > > > > >> > > > >>> couple of other systems deal with
> >> > >>> > > > > > > >> > > > >>> zone/rack awareness.
> >> > >>> > > > > > > >> > > > >>> For Cassandra some interesting modes are:
> >> > >>> > > > > > > >> > > > >>> (Property File configuration)
> >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> >> > >>> > > > > > > >> > > > >>> (Dynamic inference)
> >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> Voldemort does a static node -> zone
> >> > >>> > > > > > > >> > > > >>> assignment based on configuration.
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> Aditya
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen
> >> Wang <
> >> > >>> > > > > > > >> allenxw...@gmail.com
> >> > >>> > > > > > > >> > >
> >> > >>> > > > > > > >> > > > >>> wrote:
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> > I would like to see if we can do both:
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to
> >> > >>> > > > > > > >> > > > >>> > facilitate migration with existing
> >> > >>> > > > > > > >> > > > >>> > broker-rack mapping
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> > - Make rack an optional property for
> >> > >>> > > > > > > >> > > > >>> > broker. If rack is available from the
> >> > >>> > > > > > > >> > > > >>> > broker, treat it as the source of
> >> > >>> > > > > > > >> > > > >>> > truth. Users with an existing
> >> > >>> > > > > > > >> > > > >>> > broker-rack mapping somewhere else can
> >> > >>> > > > > > > >> > > > >>> > use the pluggable way or they can
> >> > >>> > > > > > > >> > > > >>> > transfer the mapping to the broker rack
> >> > >>> > > > > > > >> > > > >>> > property.
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> > One thing I am not sure about is what
> >> > >>> > > > > > > >> > > > >>> > happens at rolling upgrade when we
> >> > >>> > > > > > > >> > > > >>> > have rack as a broker property. For
> >> > >>> > > > > > > >> > > > >>> > brokers with an older version of
> >> > >>> > > > > > > >> > > > >>> > Kafka, will it cause problems for
> >> > >>> > > > > > > >> > > > >>> > them? If so, is there any workaround?
> >> > >>> > > > > > > >> > > > >>> > I also think it would be better not to
> >> > >>> > > > > > > >> > > > >>> > have rack in the controller wire
> >> > >>> > > > > > > >> > > > >>> > protocol, but I am not sure if that is
> >> > >>> > > > > > > >> > > > >>> > achievable.
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> > Thanks,
> >> > >>> > > > > > > >> > > > >>> > Allen
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd
> >> > Palino <
> >> > >>> > > > > > > >> tpal...@gmail.com>
> >> > >>> > > > > > > >> > > > >>> wrote:
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> > > I tend to like the idea of a
> >> > >>> > > > > > > >> > > > >>> > > pluggable locator. For example, we
> >> > >>> > > > > > > >> > > > >>> > > already have an interface for
> >> > >>> > > > > > > >> > > > >>> > > discovering information about the
> >> > >>> > > > > > > >> > > > >>> > > physical location of servers. I
> >> > >>> > > > > > > >> > > > >>> > > don't relish the idea of having to
> >> > >>> > > > > > > >> > > > >>> > > maintain data in multiple places.
> >> > >>> > > > > > > >> > > > >>> > >
> >> > >>> > > > > > > >> > > > >>> > > -Todd
> >> > >>> > > > > > > >> > > > >>> > >
> >> > >>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:
> >> > >>> > > > > > > >> > > > >>> > >
> >> > >>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP Allen.
> >> > >>> > > > > > > >> > > > >>> > > >
> >> > >>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a RackLocator class that is pluggable
> >> > >>> > > > > > > >> > > > >>> > > > seems to be too complex. The KIP refers to potentially non-ZK storage
> >> > >>> > > > > > > >> > > > >>> > > > for the rack info which I don't think is necessary.
> >> > >>> > > > > > > >> > > > >>> > > >
> >> > >>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in zk under /brokers/ids/<broker_id>
> >> > >>> > > > > > > >> > > > >>> > > > similar to other broker properties and add a config in KafkaConfig
> >> > >>> > > > > > > >> > > > >>> > > > called "rack".
> >> > >>> > > > > > > >> > > > >>> > > >
> >> > >>> > > > > > > >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, "rack": "abc"}
> >> > >>> > > > > > > >> > > > >>> > > >
> >> > >>> > > > > > > >> > > > >>> > > > Aditya
> >> > >>> > > > > > > >> > > > >>> > > >
> >> > >>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <g...@confluent.io> wrote:
> >> > >>> > > > > > > >> > > > >>> > > >
> >> > >>> > > > > > > >> > > > >>> > > > > Hi,
> >> > >>> > > > > > > >> > > > >>> > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP for this. This is super important
> >> > >>> > > > > > > >> > > > >>> > > > > for production deployments of Kafka.
> >> > >>> > > > > > > >> > > > >>> > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > A few questions:
> >> > >>> > > > > > > >> > > > >>> > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"? I'd want to
> >> > >>> > > > > > > >> > > > >>> > > > > balance between safety (more racks) and network utilization (traffic
> >> > >>> > > > > > > >> > > > >>> > > > > within a rack uses the high-bandwidth TOR switch). One replica on a
> >> > >>> > > > > > > >> > > > >>> > > > > different rack and the rest on the same rack (if possible) sounds
> >> > >>> > > > > > > >> > > > >>> > > > > better to me.
> >> > >>> > > > > > > >> > > > >>> > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > 2) The rack-locator class seems overly complex compared to adding a
> >> > >>> > > > > > > >> > > > >>> > > > > rack.number property to the broker properties file. Why do we want
> >> > >>> > > > > > > >> > > > >>> > > > > that?
> >> > >>> > > > > > > >> > > > >>> > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > Gwen
> >> > >>> > > > > > > >> > > > >>> > > > >
> >> > >>> > > > > > > >> > > > >>> > > > >
> >> > >>> > > > > > > >> > > > >>> > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxw...@gmail.com> wrote:
> >> > >>> > > > > > > >> > > > >>> > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
> >> > >>> > > > > > > >> > > > >>> > > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware replica assignment.
> >> > >>> > > > > > > >> > > > >>> > > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> >> > >>> > > > > > > >> > > > >>> > > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation provided by the racks in data
> >> > >>> > > > > > > >> > > > >>> > > > > > center and distribute replicas to racks to provide fault tolerance.
> >> > >>> > > > > > > >> > > > >>> > > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
> >> > >>> > > > > > > >> > > > >>> > > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > > Thanks,
> >> > >>> > > > > > > >> > > > >>> > > > > > Allen
> >> > >>> > > > > > > >> > > > >>> > > > > >
> >> > >>> > > > > > > >> > > > >>> > > > >
> >> > >>> > > > > > > >> > > > >>> > > >
> >> > >>> > > > > > > >> > > > >>> > >
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>
> >> > >>> > > > > > > >> > > > >>
> >> > >>> > > > > > > >> > > > >
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > >
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >>
> >> > >>> > > > > > > >
> >> > >>> > > > > > > >
> >> > >>> > > > > > >
> >> > >>> > > > > >
> >> > >>> > > > >
> >> > >>> > > >
> >> > >>> > >
> >> > >>> >
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>> --
> >> > >>> Thanks,
> >> > >>> Neha
> >> > >>>
> >> > >>
> >> > >>
> >> > >
> >> >
> >>
> >
> >
>
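
The broker registration idea quoted above (a "rack" field added to the JSON under /brokers/ids/<broker_id>) could be read roughly as follows. This is only a minimal sketch, not code from Kafka or from the KIP: the host, port, endpoint, and rack values are made up for illustration, and read_rack and group_by_rack are hypothetical helper names. It shows one way a reader could treat the field as optional, so that registrations written without it still parse.

```python
import json

# Hypothetical registration payload modelled on the example in the thread;
# the host, port, endpoint, and rack values are invented for illustration.
registration = json.dumps({
    "jmx_port": -1,
    "endpoints": ["PLAINTEXT://broker1.example.com:9092"],
    "host": "broker1.example.com",
    "port": 9092,
    "rack": "abc",
})


def read_rack(payload):
    """Return the broker's rack, or None when the field is absent."""
    return json.loads(payload).get("rack")


def group_by_rack(payloads):
    """Map rack -> list of broker ids; brokers without a rack go under None."""
    racks = {}
    for broker_id, payload in payloads.items():
        racks.setdefault(read_rack(payload), []).append(broker_id)
    return racks
```

Treating the field as optional is one possible answer to the rolling-upgrade question raised in the thread; whether older brokers actually tolerate the extra field depends on how they parse the registration, which this sketch does not model.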
