Added the rolling upgrade instruction in the KIP, similar to those in 0.9.0
release notes.

On Wed, Dec 16, 2015 at 11:32 AM, Allen Wang <allenxw...@gmail.com> wrote:

> Hi Jun,
>
> The reason that TopicMetadataResponse is not included in the KIP is that
> it currently is not version aware . So we need to introduce version to it
> in order to make sure backward compatibility. It seems to me a big change.
> Do we want to couple it with this KIP? Do we need to further discuss what
> information to include in the new version besides rack? For example, should
> we include broker security protocol in TopicMetadataResponse?
>
> The other option is to make it a separate KIP to make
> TopicMetadataResponse version aware and decide what to include, and make
> this KIP focus on the rack aware algorithm, admin tools  and related
> changes to inter-broker protocol .
>
> Thanks,
> Allen
>
>
>
>
> On Mon, Dec 14, 2015 at 8:30 AM, Jun Rao <j...@confluent.io> wrote:
>
>> Allen,
>>
>> Thanks for the proposal. A few comments.
>>
>> 1. Since this KIP changes the inter broker communication protocol
>> (UpdateMetadataRequest), we will need to document the upgrade path
>> (similar
>> to what's described in
>> http://kafka.apache.org/090/documentation.html#upgrade).
>>
>> 2. It might be useful to include the rack info of the broker in
>> TopicMetadataResponse. This can be useful for administrative tasks, as
>> well
>> as read affinity in the future.
>>
>> Jun
>>
>>
>>
>> On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <allenxw...@gmail.com> wrote:
>>
>> > If there are no more comments I would like to call for a vote.
>> >
>> >
>> > On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <allenxw...@gmail.com>
>> wrote:
>> >
>> > > KIP is updated with more details and how to handle the situation where
>> > > rack information is incomplete.
>> > >
>> > > In the situation where rack information is incomplete, but we want to
>> > > continue with the assignment, I have suggested to ignore all rack
>> > > information and fallback to original algorithm. The reason is
>> explained
>> > > below:
>> > >
>> > > The other options are to assume that the broker without the rack
>> belong
>> > to
>> > > its own unique rack, or they belong to one "default" rack. Either way
>> we
>> > > choose, it is highly likely to result in uneven number of brokers in
>> > racks,
>> > > and it is quite possible that the "made up" racks will have much fewer
>> > > number of brokers. As I explained in the KIP, uneven number of
>> brokers in
>> > > racks will lead to uneven distribution of replicas among brokers (even
>> > > though the leader distribution is still even). The brokers in the rack
>> > that
>> > > has fewer number of brokers will get more replicas per broker than
>> > brokers
>> > > in other racks.
>> > >
>> > > Given this fact and the replica assignment produced will be incorrect
>> > > anyway from rack aware point of view, ignoring all rack information
>> and
>> > > fallback to the original algorithm is not a bad choice since it will
>> at
>> > > least have a better guarantee of replica distribution.
>> > >
>> > > Also for command line tools it gives user a choice if for any reason
>> they
>> > > want to ignore rack information and fallback to the original
>> algorithm.
>> > >
>> > >
>> > > On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <allenxw...@gmail.com>
>> > wrote:
>> > >
>> > >> I am busy with some time pressing issues for the last few days. I
>> will
>> > >> think about how the incomplete rack information will affect the
>> balance
>> > and
>> > >> update the KIP by early next week.
>> > >>
>> > >> Thanks,
>> > >> Allen
>> > >>
>> > >>
>> > >> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <n...@confluent.io>
>> > wrote:
>> > >>
>> > >>> Few suggestions on improving the KIP
>> > >>>
>> > >>> *If some brokers have rack, and some do not, the algorithm will
>> thrown
>> > an
>> > >>> > exception. This is to prevent incorrect assignment caused by user
>> > >>> error.*
>> > >>>
>> > >>>
>> > >>> In the KIP, can you clearly state the user-facing behavior when some
>> > >>> brokers have rack information and some don't. Which actions and
>> > requests
>> > >>> will error out and how?
>> > >>>
>> > >>> *Even distribution of partition leadership among brokers*
>> > >>>
>> > >>>
>> > >>> There is some information about arranging the sorted broker list
>> > >>> interlaced
>> > >>> with rack ids. Can you describe the changes to the current algorithm
>> > in a
>> > >>> little more detail? How does this interlacing work if only a subset
>> of
>> > >>> brokers have the rack id configured? Does this still work if uneven
>> #
>> > of
>> > >>> brokers are assigned to each rack? It might work, I'm looking for
>> more
>> > >>> details on the changes, since it will affect the behavior seen by
>> the
>> > >>> user
>> > >>> - imbalance on either the leaders or data or both.
>> > >>>
>> > >>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <
>> > aaurad...@linkedin.com>
>> > >>> wrote:
>> > >>>
>> > >>> > I think this sounds reasonable. Anyone else have comments?
>> > >>> >
>> > >>> > Aditya
>> > >>> >
>> > >>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <allenxw...@gmail.com
>> >
>> > >>> wrote:
>> > >>> >
>> > >>> > > During the discussion in the hangout, it was mentioned that it
>> > would
>> > >>> be
>> > >>> > > desirable that consumers know the rack information of the
>> brokers
>> > so
>> > >>> that
>> > >>> > > they can consume from the broker in the same rack to reduce
>> > latency.
>> > >>> As I
>> > >>> > > understand this will only be beneficial if consumer can consume
>> > from
>> > >>> any
>> > >>> > > broker in ISR, which is not possible now.
>> > >>> > >
>> > >>> > > I suggest we skip the change to TMR. Once the change is made to
>> > >>> consumer
>> > >>> > to
>> > >>> > > be able to consume from any broker in ISR, the rack information
>> can
>> > >>> be
>> > >>> > > added to TMR.
>> > >>> > >
>> > >>> > > Another thing I want to confirm is  command line behavior. I
>> think
>> > >>> the
>> > >>> > > desirable default behavior is to fail fast on command line for
>> > >>> incomplete
>> > >>> > > rack mapping. The error message can include further instruction
>> > that
>> > >>> > tells
>> > >>> > > the user to add an extra argument (like
>> "--allow-partial-rackinfo")
>> > >>> to
>> > >>> > > suppress the error and do an imperfect rack aware assignment. If
>> > the
>> > >>> > > default behavior is to allow incomplete mapping, the error can
>> > still
>> > >>> be
>> > >>> > > easily missed.
>> > >>> > >
>> > >>> > > The affected command line tools are TopicCommand and
>> > >>> > > ReassignPartitionsCommand.
>> > >>> > >
>> > >>> > > Thanks,
>> > >>> > > Allen
>> > >>> > >
>> > >>> > >
>> > >>> > >
>> > >>> > >
>> > >>> > >
>> > >>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <
>> > >>> > aaurad...@linkedin.com>
>> > >>> > > wrote:
>> > >>> > >
>> > >>> > > > Hi Allen,
>> > >>> > > >
>> > >>> > > > For TopicMetadataResponse to understand version, you can bump
>> up
>> > >>> the
>> > >>> > > > request version itself. Based on the version of the request,
>> the
>> > >>> > response
>> > >>> > > > can be appropriately serialized. It shouldn't be a huge
>> change.
>> > For
>> > >>> > > > example: We went through something similar for ProduceRequest
>> > >>> recently
>> > >>> > (
>> > >>> > > > https://reviews.apache.org/r/33378/)
>> > >>> > > > I guess the reason protocol information is not included in the
>> > TMR
>> > >>> is
>> > >>> > > > because the topic itself is independent of any particular
>> > protocol
>> > >>> (SSL
>> > >>> > > vs
>> > >>> > > > Plaintext). Having said that, I'm not sure we even need rack
>> > >>> > information
>> > >>> > > in
>> > >>> > > > TMR. What usecase were you thinking of initially?
>> > >>> > > >
>> > >>> > > > For 1 - I'd be fine with adding an option to the command line
>> > tools
>> > >>> > that
>> > >>> > > > check rack assignment. For e.g. "--strict-assignment" or
>> > something
>> > >>> > > similar.
>> > >>> > > >
>> > >>> > > > Aditya
>> > >>> > > >
>> > >>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <
>> > allenxw...@gmail.com>
>> > >>> > > wrote:
>> > >>> > > >
>> > >>> > > > > For 2 and 3, I have updated the KIP. Please take a look. One
>> > >>> thing I
>> > >>> > > have
>> > >>> > > > > changed is removing the proposal to add rack to
>> > >>> > TopicMetadataResponse.
>> > >>> > > > The
>> > >>> > > > > reason is that unlike UpdateMetadataRequest,
>> > >>> TopicMetadataResponse
>> > >>> > does
>> > >>> > > > not
>> > >>> > > > > understand version. I don't see a way to include rack
>> without
>> > >>> > breaking
>> > >>> > > > old
>> > >>> > > > > version of clients. That's probably why secure protocol is
>> not
>> > >>> > included
>> > >>> > > > in
>> > >>> > > > > the TopicMetadataResponse either. I think it will be a much
>> > >>> bigger
>> > >>> > > change
>> > >>> > > > > to include rack in TopicMetadataResponse.
>> > >>> > > > >
>> > >>> > > > > For 1, my concern is that doing rack aware assignment
>> without
>> > >>> > complete
>> > >>> > > > > broker to rack mapping will result in assignment that is not
>> > rack
>> > >>> > aware
>> > >>> > > > and
>> > >>> > > > > fail to provide fault tolerance in the event of rack outage.
>> > This
>> > >>> > kind
>> > >>> > > of
>> > >>> > > > > problem will be difficult to surface. And the cost of this
>> > >>> problem is
>> > >>> > > > high:
>> > >>> > > > > you have to do partition reassignment if you are lucky to
>> spot
>> > >>> the
>> > >>> > > > problem
>> > >>> > > > > early on or face the consequence of data loss during real
>> rack
>> > >>> > outage.
>> > >>> > > > >
>> > >>> > > > > I do see the concern of fail-fast as it might also cause
>> data
>> > >>> loss if
>> > >>> > > > > producer is not able produce the message due to topic
>> creation
>> > >>> > failure.
>> > >>> > > > Is
>> > >>> > > > > it feasible to treat dynamic topic creation and command
>> tools
>> > >>> > > > differently?
>> > >>> > > > > We allow dynamic topic creation with incomplete broker-rack
>> > >>> mapping
>> > >>> > and
>> > >>> > > > > fail fast in command line. Another option is to let user
>> > >>> determine
>> > >>> > the
>> > >>> > > > > behavior for command line. For example, by default fail
>> fast in
>> > >>> > command
>> > >>> > > > > line but allow incomplete broker-rack mapping if another
>> switch
>> > >>> is
>> > >>> > > > > provided.
>> > >>> > > > >
>> > >>> > > > >
>> > >>> > > > >
>> > >>> > > > >
>> > >>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
>> > >>> > > > > aaurad...@linkedin.com.invalid> wrote:
>> > >>> > > > >
>> > >>> > > > > > Hey Allen,
>> > >>> > > > > >
>> > >>> > > > > > 1. If we choose fail fast topic creation, we will have
>> topic
>> > >>> > creation
>> > >>> > > > > > failures while upgrading the cluster. I really doubt we
>> want
>> > >>> this
>> > >>> > > > > behavior.
>> > >>> > > > > > Ideally, this should be invisible to clients of a cluster.
>> > >>> > Currently,
>> > >>> > > > > each
>> > >>> > > > > > broker is effectively its own rack. So we probably can use
>> > the
>> > >>> rack
>> > >>> > > > > > information whenever possible but not make it a hard
>> > >>> requirement.
>> > >>> > To
>> > >>> > > > > extend
>> > >>> > > > > > Gwen's example, one badly configured broker should not
>> > degrade
>> > >>> > topic
>> > >>> > > > > > creation for the entire cluster.
>> > >>> > > > > >
>> > >>> > > > > > 2. Upgrade scenario - Can you add a section on the upgrade
>> > >>> piece to
>> > >>> > > > > confirm
>> > >>> > > > > > that old clients will not see errors? I believe
>> > >>> > > > > ZookeeperConsumerConnector
>> > >>> > > > > > reads the Broker objects from ZK. I wanted to confirm that
>> > this
>> > >>> > will
>> > >>> > > > not
>> > >>> > > > > > cause any problems.
>> > >>> > > > > >
>> > >>> > > > > > 3. Could you elaborate your proposed changes to the
>> > >>> > > > UpdateMetadataRequest
>> > >>> > > > > > in the "Public Interfaces" section? Personally, I find
>> this
>> > >>> format
>> > >>> > > easy
>> > >>> > > > > to
>> > >>> > > > > > read in terms of wire protocol changes:
>> > >>> > > > > >
>> > >>> > > > > >
>> > >>> > > > >
>> > >>> > > >
>> > >>> > >
>> > >>> >
>> > >>>
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
>> > >>> > > > > >
>> > >>> > > > > > Aditya
>> > >>> > > > > >
>> > >>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <
>> > >>> allenxw...@gmail.com>
>> > >>> > > > > wrote:
>> > >>> > > > > >
>> > >>> > > > > > > KIP is updated include rack as an optional property for
>> > >>> broker.
>> > >>> > > > Please
>> > >>> > > > > > take
>> > >>> > > > > > > a look and let me know if more details are needed.
>> > >>> > > > > > >
>> > >>> > > > > > > For the case where some brokers have rack and some do
>> not,
>> > >>> the
>> > >>> > > > current
>> > >>> > > > > > KIP
>> > >>> > > > > > > uses the fail-fast behavior. If there are concerns, we
>> can
>> > >>> > further
>> > >>> > > > > > discuss
>> > >>> > > > > > > this in the email thread or next hangout.
>> > >>> > > > > > >
>> > >>> > > > > > >
>> > >>> > > > > > >
>> > >>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <
>> > >>> > allenxw...@gmail.com
>> > >>> > > >
>> > >>> > > > > > wrote:
>> > >>> > > > > > >
>> > >>> > > > > > > > That's a good question. I can think of three actions
>> if
>> > the
>> > >>> > rack
>> > >>> > > > > > > > information is incomplete:
>> > >>> > > > > > > >
>> > >>> > > > > > > > 1. Treat the node without rack as if it is on its
>> unique
>> > >>> rack
>> > >>> > > > > > > > 2. Disregard all rack information and fallback to
>> current
>> > >>> > > algorithm
>> > >>> > > > > > > > 3. Fail-fast
>> > >>> > > > > > > >
>> > >>> > > > > > > > Now I think about it, one and three make more sense.
>> The
>> > >>> reason
>> > >>> > > for
>> > >>> > > > > > > > fail-fast is that user mistake for not providing the
>> rack
>> > >>> may
>> > >>> > > never
>> > >>> > > > > be
>> > >>> > > > > > > > found if we tolerate that and the assignment may not
>> be
>> > >>> rack
>> > >>> > > aware
>> > >>> > > > as
>> > >>> > > > > > the
>> > >>> > > > > > > > user has expected and this creates debug problems when
>> > >>> things
>> > >>> > > fail.
>> > >>> > > > > > > >
>> > >>> > > > > > > > What do you think? If not fail-fast, is there anyway
>> we
>> > can
>> > >>> > make
>> > >>> > > > the
>> > >>> > > > > > user
>> > >>> > > > > > > > error standing out?
>> > >>> > > > > > > >
>> > >>> > > > > > > >
>> > >>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <
>> > >>> > > g...@confluent.io>
>> > >>> > > > > > > wrote:
>> > >>> > > > > > > >
>> > >>> > > > > > > >> Thanks! Just to clarify, when some brokers have rack
>> > >>> > assignment
>> > >>> > > > and
>> > >>> > > > > > some
>> > >>> > > > > > > >> don't, do we act like none of them have it? or like
>> > those
>> > >>> > > without
>> > >>> > > > > > > >> assignment are in their own rack?
>> > >>> > > > > > > >>
>> > >>> > > > > > > >> The first scenario is good when first setting up
>> > >>> > rack-awareness,
>> > >>> > > > but
>> > >>> > > > > > the
>> > >>> > > > > > > >> second makes more sense for on-going maintenance (I
>> can
>> > >>> > totally
>> > >>> > > > see
>> > >>> > > > > > > >> someone
>> > >>> > > > > > > >> adding a node and forgetting to set the rack
>> property,
>> > we
>> > >>> > don't
>> > >>> > > > want
>> > >>> > > > > > > this
>> > >>> > > > > > > >> to change behavior for anything except the new node).
>> > >>> > > > > > > >>
>> > >>> > > > > > > >> What do you think?
>> > >>> > > > > > > >>
>> > >>> > > > > > > >> Gwen
>> > >>> > > > > > > >>
>> > >>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <
>> > >>> > > > allenxw...@gmail.com>
>> > >>> > > > > > > >> wrote:
>> > >>> > > > > > > >>
>> > >>> > > > > > > >> > For scenario 1:
>> > >>> > > > > > > >> >
>> > >>> > > > > > > >> > - Add the rack information to broker property file
>> or
>> > >>> > > > dynamically
>> > >>> > > > > > set
>> > >>> > > > > > > >> it in
>> > >>> > > > > > > >> > the wrapper code to bootstrap Kafka server. You
>> would
>> > do
>> > >>> > that
>> > >>> > > > for
>> > >>> > > > > > all
>> > >>> > > > > > > >> > brokers and restart the brokers one by one.
>> > >>> > > > > > > >> >
>> > >>> > > > > > > >> > In this scenario, the complete broker to rack
>> mapping
>> > >>> may
>> > >>> > not
>> > >>> > > be
>> > >>> > > > > > > >> available
>> > >>> > > > > > > >> > until every broker is restarted. During that time
>> we
>> > >>> fall
>> > >>> > back
>> > >>> > > > to
>> > >>> > > > > > > >> default
>> > >>> > > > > > > >> > replica assignment algorithm.
>> > >>> > > > > > > >> >
>> > >>> > > > > > > >> > For scenario 2:
>> > >>> > > > > > > >> >
>> > >>> > > > > > > >> > - Add the rack information to broker property file
>> or
>> > >>> > > > dynamically
>> > >>> > > > > > set
>> > >>> > > > > > > >> it in
>> > >>> > > > > > > >> > the wrapper code and start the broker.
>> > >>> > > > > > > >> >
>> > >>> > > > > > > >> >
>> > >>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <
>> > >>> > > > g...@confluent.io>
>> > >>> > > > > > > >> wrote:
>> > >>> > > > > > > >> >
>> > >>> > > > > > > >> > > Can you clarify the workflow for the following
>> > >>> scenarios:
>> > >>> > > > > > > >> > >
>> > >>> > > > > > > >> > > 1. I currently have 6 brokers and want to add
>> rack
>> > >>> > > information
>> > >>> > > > > for
>> > >>> > > > > > > >> each
>> > >>> > > > > > > >> > > 2. I'm adding a new broker and I want to specify
>> > which
>> > >>> > rack
>> > >>> > > it
>> > >>> > > > > > > >> belongs on
>> > >>> > > > > > > >> > > while adding it.
>> > >>> > > > > > > >> > >
>> > >>> > > > > > > >> > > Thanks!
>> > >>> > > > > > > >> > >
>> > >>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <
>> > >>> > > > > allenxw...@gmail.com
>> > >>> > > > > > >
>> > >>> > > > > > > >> > wrote:
>> > >>> > > > > > > >> > >
>> > >>> > > > > > > >> > > > We discussed the KIP in the hangout today. The
>> > >>> > > > recommendation
>> > >>> > > > > is
>> > >>> > > > > > > to
>> > >>> > > > > > > >> > make
>> > >>> > > > > > > >> > > > rack as a broker property in ZooKeeper. For
>> users
>> > >>> with
>> > >>> > > > > existing
>> > >>> > > > > > > rack
>> > >>> > > > > > > >> > > > information stored somewhere, they would need
>> to
>> > >>> > retrieve
>> > >>> > > > the
>> > >>> > > > > > > >> > information
>> > >>> > > > > > > >> > > > at broker start up and dynamically set the rack
>> > >>> > property,
>> > >>> > > > > which
>> > >>> > > > > > > can
>> > >>> > > > > > > >> be
>> > >>> > > > > > > >> > > > implemented as a wrapper to bootstrap broker.
>> > There
>> > >>> will
>> > >>> > > be
>> > >>> > > > no
>> > >>> > > > > > > >> > interface
>> > >>> > > > > > > >> > > or
>> > >>> > > > > > > >> > > > pluggable implementation to retrieve the rack
>> > >>> > information.
>> > >>> > > > > > > >> > > >
>> > >>> > > > > > > >> > > > The assumption is that you always need to
>> restart
>> > >>> the
>> > >>> > > broker
>> > >>> > > > > to
>> > >>> > > > > > > >> make a
>> > >>> > > > > > > >> > > > change to the rack.
>> > >>> > > > > > > >> > > >
>> > >>> > > > > > > >> > > > Once the rack becomes a broker property, it
>> will
>> > be
>> > >>> > > possible
>> > >>> > > > > to
>> > >>> > > > > > > make
>> > >>> > > > > > > >> > rack
>> > >>> > > > > > > >> > > > part of the meta data to help the consumer
>> choose
>> > >>> which
>> > >>> > in
>> > >>> > > > > sync
>> > >>> > > > > > > >> replica
>> > >>> > > > > > > >> > > to
>> > >>> > > > > > > >> > > > consume from as part of the future consumer
>> > >>> enhancement.
>> > >>> > > > > > > >> > > >
>> > >>> > > > > > > >> > > > I will update the KIP.
>> > >>> > > > > > > >> > > >
>> > >>> > > > > > > >> > > > Thanks,
>> > >>> > > > > > > >> > > > Allen
>> > >>> > > > > > > >> > > >
>> > >>> > > > > > > >> > > >
>> > >>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <
>> > >>> > > > > > allenxw...@gmail.com>
>> > >>> > > > > > > >> > wrote:
>> > >>> > > > > > > >> > > >
>> > >>> > > > > > > >> > > > > I attended Tuesday's KIP hangout but this KIP
>> > was
>> > >>> not
>> > >>> > > > > > discussed
>> > >>> > > > > > > >> due
>> > >>> > > > > > > >> > to
>> > >>> > > > > > > >> > > > > time constraint.
>> > >>> > > > > > > >> > > > >
>> > >>> > > > > > > >> > > > > However, after hearing discussion of KIP-35,
>> I
>> > >>> have
>> > >>> > the
>> > >>> > > > > > feeling
>> > >>> > > > > > > >> that
>> > >>> > > > > > > >> > > > > incompatibility (caused by new broker
>> property)
>> > >>> > between
>> > >>> > > > > > brokers
>> > >>> > > > > > > >> with
>> > >>> > > > > > > >> > > > > different versions  will be solved there. In
>> > >>> addition,
>> > >>> > > > > having
>> > >>> > > > > > > >> stack
>> > >>> > > > > > > >> > in
>> > >>> > > > > > > >> > > > > broker property as meta data may also help
>> > >>> consumers
>> > >>> > in
>> > >>> > > > the
>> > >>> > > > > > > >> future.
>> > >>> > > > > > > >> > So
>> > >>> > > > > > > >> > > I
>> > >>> > > > > > > >> > > > am
>> > >>> > > > > > > >> > > > > open to adding stack property to broker.
>> > >>> > > > > > > >> > > > >
>> > >>> > > > > > > >> > > > > Hopefully we can discuss this in the next KIP
>> > >>> hangout.
>> > >>> > > > > > > >> > > > >
>> > >>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <
>> > >>> > > > > > > allenxw...@gmail.com
>> > >>> > > > > > > >> >
>> > >>> > > > > > > >> > > > wrote:
>> > >>> > > > > > > >> > > > >
>> > >>> > > > > > > >> > > > >> Can you send me the information on the next
>> KIP
>> > >>> > > hangout?
>> > >>> > > > > > > >> > > > >>
>> > >>> > > > > > > >> > > > >> Currently the broker-rack mapping is not
>> > cached.
>> > >>> In
>> > >>> > > > > > KafkaApis,
>> > >>> > > > > > > >> > > > >> RackLocator.getRackInfo() is called each
>> time
>> > the
>> > >>> > > mapping
>> > >>> > > > > is
>> > >>> > > > > > > >> needed
>> > >>> > > > > > > >> > > for
>> > >>> > > > > > > >> > > > >> auto topic creation. This will ensure latest
>> > >>> mapping
>> > >>> > is
>> > >>> > > > > used
>> > >>> > > > > > at
>> > >>> > > > > > > >> any
>> > >>> > > > > > > >> > > > time.
>> > >>> > > > > > > >> > > > >>
>> > >>> > > > > > > >> > > > >> The ability to get the complete mapping
>> makes
>> > it
>> > >>> > simple
>> > >>> > > > to
>> > >>> > > > > > > reuse
>> > >>> > > > > > > >> the
>> > >>> > > > > > > >> > > > same
>> > >>> > > > > > > >> > > > >> interface in command line tools.
>> > >>> > > > > > > >> > > > >>
>> > >>> > > > > > > >> > > > >>
>> > >>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya
>> > >>> Auradkar <
>> > >>> > > > > > > >> > > > >> aaurad...@linkedin.com.invalid> wrote:
>> > >>> > > > > > > >> > > > >>
>> > >>> > > > > > > >> > > > >>> Perhaps we discuss this during the next KIP
>> > >>> hangout?
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>> I do see that a pluggable rack locator can
>> be
>> > >>> useful
>> > >>> > > > but I
>> > >>> > > > > > do
>> > >>> > > > > > > >> see a
>> > >>> > > > > > > >> > > few
>> > >>> > > > > > > >> > > > >>> concerns:
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>> - The RackLocator (as described in the
>> > >>> document),
>> > >>> > > > implies
>> > >>> > > > > > that
>> > >>> > > > > > > >> it
>> > >>> > > > > > > >> > can
>> > >>> > > > > > > >> > > > >>> discover rack information for any node in
>> the
>> > >>> > cluster.
>> > >>> > > > How
>> > >>> > > > > > > does
>> > >>> > > > > > > >> it
>> > >>> > > > > > > >> > > deal
>> > >>> > > > > > > >> > > > >>> with rack location changes? For example,
>> if I
>> > >>> moved
>> > >>> > > > broker
>> > >>> > > > > > id
>> > >>> > > > > > > >> (1)
>> > >>> > > > > > > >> > > from
>> > >>> > > > > > > >> > > > >>> rack
>> > >>> > > > > > > >> > > > >>> X to Y, I only have to start that broker
>> with
>> > a
>> > >>> > newer
>> > >>> > > > rack
>> > >>> > > > > > > >> config.
>> > >>> > > > > > > >> > If
>> > >>> > > > > > > >> > > > >>> RackLocator discovers broker -> rack
>> > >>> information at
>> > >>> > > > start
>> > >>> > > > > up
>> > >>> > > > > > > >> time,
>> > >>> > > > > > > >> > > any
>> > >>> > > > > > > >> > > > >>> change to a broker will require bouncing
>> the
>> > >>> entire
>> > >>> > > > > cluster
>> > >>> > > > > > > >> since
>> > >>> > > > > > > >> > > > >>> createTopic requests can be sent to any
>> node
>> > in
>> > >>> the
>> > >>> > > > > cluster.
>> > >>> > > > > > > >> > > > >>> For this reason it may be simpler to have
>> each
>> > >>> node
>> > >>> > be
>> > >>> > > > > aware
>> > >>> > > > > > > of
>> > >>> > > > > > > >> its
>> > >>> > > > > > > >> > > own
>> > >>> > > > > > > >> > > > >>> rack and persist it in ZK during start up
>> > time.
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an
>> > external
>> > >>> > > service
>> > >>> > > > > > being
>> > >>> > > > > > > >> > > available
>> > >>> > > > > > > >> > > > >>> to
>> > >>> > > > > > > >> > > > >>> serve rack information.
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a couple
>> of
>> > >>> other
>> > >>> > > > > systems
>> > >>> > > > > > > deal
>> > >>> > > > > > > >> > with
>> > >>> > > > > > > >> > > > >>> zone/rack awareness.
>> > >>> > > > > > > >> > > > >>> For Cassandra some interesting modes are:
>> > >>> > > > > > > >> > > > >>> (Property File configuration)
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > >
>> > >>> > > > > > > >> > >
>> > >>> > > > > > > >> >
>> > >>> > > > > > > >>
>> > >>> > > > > > >
>> > >>> > > > > >
>> > >>> > > > >
>> > >>> > > >
>> > >>> > >
>> > >>> >
>> > >>>
>> >
>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
>> > >>> > > > > > > >> > > > >>> (Dynamic inference)
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > >
>> > >>> > > > > > > >> > >
>> > >>> > > > > > > >> >
>> > >>> > > > > > > >>
>> > >>> > > > > > >
>> > >>> > > > > >
>> > >>> > > > >
>> > >>> > > >
>> > >>> > >
>> > >>> >
>> > >>>
>> >
>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>> Voldemort does a static node -> zone
>> > assignment
>> > >>> > based
>> > >>> > > on
>> > >>> > > > > > > >> > > configuration.
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>> Aditya
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen
>> Wang <
>> > >>> > > > > > > >> allenxw...@gmail.com
>> > >>> > > > > > > >> > >
>> > >>> > > > > > > >> > > > >>> wrote:
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>> > I would like to see if we can do both:
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to
>> facilitate
>> > >>> > migration
>> > >>> > > > > with
>> > >>> > > > > > > >> > existing
>> > >>> > > > > > > >> > > > >>> > broker-rack mapping
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>> > - Make rack an optional property for
>> broker.
>> > >>> If
>> > >>> > rack
>> > >>> > > > is
>> > >>> > > > > > > >> available
>> > >>> > > > > > > >> > > > from
>> > >>> > > > > > > >> > > > >>> > broker, treat it as source of truth. For
>> > users
>> > >>> > with
>> > >>> > > > > > existing
>> > >>> > > > > > > >> > > > >>> broker-rack
>> > >>> > > > > > > >> > > > >>> > mapping somewhere else, they can use the
>> > >>> pluggable
>> > >>> > > way
>> > >>> > > > > or
>> > >>> > > > > > > they
>> > >>> > > > > > > >> > can
>> > >>> > > > > > > >> > > > >>> transfer
>> > >>> > > > > > > >> > > > >>> > the mapping to the broker rack property.
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>> > One thing I am not sure is what happens
>> at
>> > >>> rolling
>> > >>> > > > > upgrade
>> > >>> > > > > > > >> when
>> > >>> > > > > > > >> > we
>> > >>> > > > > > > >> > > > have
>> > >>> > > > > > > >> > > > >>> > rack as a broker property. For brokers
>> with
>> > >>> older
>> > >>> > > > > version
>> > >>> > > > > > of
>> > >>> > > > > > > >> > Kafka,
>> > >>> > > > > > > >> > > > >>> will it
>> > >>> > > > > > > >> > > > >>> > cause problem for them? If so, is there
>> any
>> > >>> > > > workaround?
>> > >>> > > > > I
>> > >>> > > > > > > also
>> > >>> > > > > > > >> > > think
>> > >>> > > > > > > >> > > > it
>> > >>> > > > > > > >> > > > >>> > would be better not to have rack in the
>> > >>> controller
>> > >>> > > > wire
>> > >>> > > > > > > >> protocol
>> > >>> > > > > > > >> > > but
>> > >>> > > > > > > >> > > > >>> not
>> > >>> > > > > > > >> > > > >>> > sure if it is achievable.
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>> > Thanks,
>> > >>> > > > > > > >> > > > >>> > Allen
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd
>> > Palino <
>> > >>> > > > > > > >> tpal...@gmail.com>
>> > >>> > > > > > > >> > > > >>> wrote:
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable
>> > >>> locator.
>> > >>> > > For
>> > >>> > > > > > > >> example, we
>> > >>> > > > > > > >> > > > >>> already
>> > >>> > > > > > > >> > > > >>> > > have an interface for discovering
>> > >>> information
>> > >>> > > about
>> > >>> > > > > the
>> > >>> > > > > > > >> > physical
>> > >>> > > > > > > >> > > > >>> location
>> > >>> > > > > > > >> > > > >>> > > of servers. I don't relish the idea of
>> > >>> having to
>> > >>> > > > > > maintain
>> > >>> > > > > > > >> data
>> > >>> > > > > > > >> > in
>> > >>> > > > > > > >> > > > >>> > multiple
>> > >>> > > > > > > >> > > > >>> > > places.
>> > >>> > > > > > > >> > > > >>> > >
>> > >>> > > > > > > >> > > > >>> > > -Todd
>> > >>> > > > > > > >> > > > >>> > >
>> > >>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya
>> > >>> > Auradkar <
>> > >>> > > > > > > >> > > > >>> > > aaurad...@linkedin.com.invalid> wrote:
>> > >>> > > > > > > >> > > > >>> > >
>> > >>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP Allen.
>> > >>> > > > > > > >> > > > >>> > > >
>> > >>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a
>> > >>> RackLocator
>> > >>> > > class
>> > >>> > > > > that
>> > >>> > > > > > > is
>> > >>> > > > > > > >> > > > pluggable
>> > >>> > > > > > > >> > > > >>> > seems
>> > >>> > > > > > > >> > > > >>> > > > to be too complex. The KIP refers to
>> > >>> > potentially
>> > >>> > > > > > non-ZK
>> > >>> > > > > > > >> > storage
>> > >>> > > > > > > >> > > > >>> for the
>> > >>> > > > > > > >> > > > >>> > > > rack info which I don't think is
>> > >>> necessary.
>> > >>> > > > > > > >> > > > >>> > > >
>> > >>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in
>> zk
>> > >>> under
>> > >>> > > > > > > >> > > > >>> /brokers/ids/<broker_id>
>> > >>> > > > > > > >> > > > >>> > > > similar to other broker properties
>> and
>> > >>> add a
>> > >>> > > > config
>> > >>> > > > > in
>> > >>> > > > > > > >> > > > KafkaConfig
>> > >>> > > > > > > >> > > > >>> > called
>> > >>> > > > > > > >> > > > >>> > > > "rack".
>> > >>> > > > > > > >> > > > >>> > > >
>> > >>> > > > > > >
>> {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,
>> > >>> > > > > > > >> > > "rack":
>> > >>> > > > > > > >> > > > >>> > "abc"}
>> > >>> > > > > > > >> > > > >>> > > >
>> > >>> > > > > > > >> > > > >>> > > > Aditya
>> > >>> > > > > > > >> > > > >>> > > >
>> > >>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen
>> > >>> Shapira
>> > >>> > <
>> > >>> > > > > > > >> > > g...@confluent.io
>> > >>> > > > > > > >> > > > >
>> > >>> > > > > > > >> > > > >>> > wrote:
>> > >>> > > > > > > >> > > > >>> > > >
>> > >>> > > > > > > >> > > > >>> > > > > Hi,
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP
>> > for
>> > >>> > this.
>> > >>> > > > This
>> > >>> > > > > > is
>> > >>> > > > > > > >> super
>> > >>> > > > > > > >> > > > >>> important
>> > >>> > > > > > > >> > > > >>> > > for
>> > >>> > > > > > > >> > > > >>> > > > > production deployments of Kafka.
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > > > Few questions:
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many
>> racks
>> > as
>> > >>> > > > > possible"?
>> > >>> > > > > > > I'd
>> > >>> > > > > > > >> > want
>> > >>> > > > > > > >> > > to
>> > >>> > > > > > > >> > > > >>> > balance
>> > >>> > > > > > > >> > > > >>> > > > > between safety (more racks) and
>> > network
>> > >>> > > > > utilization
>> > >>> > > > > > > >> > (traffic
>> > >>> > > > > > > >> > > > >>> within a
>> > >>> > > > > > > >> > > > >>> > > > rack
>> > >>> > > > > > > >> > > > >>> > > > > uses the high-bandwidth TOR
>> switch).
>> > One
>> > >>> > > replica
>> > >>> > > > > on
>> > >>> > > > > > a
>> > >>> > > > > > > >> > > different
>> > >>> > > > > > > >> > > > >>> rack
>> > >>> > > > > > > >> > > > >>> > > and
>> > >>> > > > > > > >> > > > >>> > > > > the rest on same rack (if possible)
>> > >>> sounds
>> > >>> > > > better
>> > >>> > > > > to
>> > >>> > > > > > > me.
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > > > 2) Rack-locator class seems overly
>> > >>> complex
>> > >>> > > > > compared
>> > >>> > > > > > to
>> > >>> > > > > > > >> > > adding a
>> > >>> > > > > > > >> > > > >>> > > > rack.number
>> > >>> > > > > > > >> > > > >>> > > > > property to the broker properties
>> > file.
>> > >>> Why
>> > >>> > do
>> > >>> > > > we
>> > >>> > > > > > want
>> > >>> > > > > > > >> > that?
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > > > Gwen
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM,
>> > Allen
>> > >>> > Wang <
>> > >>> > > > > > > >> > > > >>> allenxw...@gmail.com>
>> > >>> > > > > > > >> > > > >>> > > > wrote:
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
>> > >>> > > > > > > >> > > > >>> > > > > >
>> > >>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack
>> aware
>> > >>> > replica
>> > >>> > > > > > > >> assignment.
>> > >>> > > > > > > >> > > > >>> > > > > >
>> > >>> > > > > > > >> > > > >>> > > > > >
>> > >>> > > > > > > >> > > > >>> > > > > >
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > >
>> > >>> > > > > > > >> > > > >>> > >
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > >
>> > >>> > > > > > > >> > >
>> > >>> > > > > > > >> >
>> > >>> > > > > > > >>
>> > >>> > > > > > >
>> > >>> > > > > >
>> > >>> > > > >
>> > >>> > > >
>> > >>> > >
>> > >>> >
>> > >>>
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
>> > >>> > > > > > > >> > > > >>> > > > > >
>> > >>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the
>> isolation
>> > >>> > > provided
>> > >>> > > > by
>> > >>> > > > > > the
>> > >>> > > > > > > >> > racks
>> > >>> > > > > > > >> > > in
>> > >>> > > > > > > >> > > > >>> data
>> > >>> > > > > > > >> > > > >>> > > > center
>> > >>> > > > > > > >> > > > >>> > > > > > and distribute replicas to racks
>> to
>> > >>> > provide
>> > >>> > > > > fault
>> > >>> > > > > > > >> > > tolerance.
>> > >>> > > > > > > >> > > > >>> > > > > >
>> > >>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
>> > >>> > > > > > > >> > > > >>> > > > > >
>> > >>> > > > > > > >> > > > >>> > > > > > Thanks,
>> > >>> > > > > > > >> > > > >>> > > > > > Allen
>> > >>> > > > > > > >> > > > >>> > > > > >
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > >
>> > >>> > > > > > > >> > > > >>> > >
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>
>> > >>> > > > > > > >> > > > >>
>> > >>> > > > > > > >> > > > >
>> > >>> > > > > > > >> > > >
>> > >>> > > > > > > >> > >
>> > >>> > > > > > > >> >
>> > >>> > > > > > > >>
>> > >>> > > > > > > >
>> > >>> > > > > > > >
>> > >>> > > > > > >
>> > >>> > > > > >
>> > >>> > > > >
>> > >>> > > >
>> > >>> > >
>> > >>> >
>> > >>>
>> > >>>
>> > >>>
>> > >>> --
>> > >>> Thanks,
>> > >>> Neha
>> > >>>
>> > >>
>> > >>
>> > >
>> >
>>
>
>

Reply via email to