The KIP is updated to include rack as an optional property for the broker.
Please take a look and let me know if more details are needed.
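
For illustration, assuming the property ends up named "broker.rack" as in
the KIP, a broker's server.properties would contain something like:

  broker.id=1
  broker.rack=us-east-1a

A broker that omits the property simply registers without rack information.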

For the case where some brokers have a rack and some do not, the current KIP
uses fail-fast behavior. If there are concerns, we can further discuss this
in the email thread or at the next hangout.
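
As a rough sketch of the fail-fast check in Scala (the names here are
illustrative, not the actual KIP code):

  case class BrokerInfo(id: Int, rack: Option[String])

  // Reject rack-aware assignment when rack information is only partial.
  def validateRackInfo(brokers: Seq[BrokerInfo]): Unit = {
    val (withRack, withoutRack) = brokers.partition(_.rack.isDefined)
    if (withRack.nonEmpty && withoutRack.nonEmpty)
      throw new IllegalArgumentException(
        "Brokers without rack information: " +
          withoutRack.map(_.id).mkString(", "))
  }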



On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <allenxw...@gmail.com> wrote:

> That's a good question. I can think of three actions if the rack
> information is incomplete:
>
> 1. Treat the node without rack as if it is on its own unique rack (see
> the sketch after this list)
> 2. Disregard all rack information and fallback to current algorithm
> 3. Fail-fast
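>
> To illustrate option 1 in Scala, a broker lacking a rack gets a synthetic
> rack of its own, so it never shares a rack with any other broker (names
> are illustrative only):
>
>   def effectiveRack(id: Int, rack: Option[String]): String =
>     rack.getOrElse("unique-rack-" + id)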
>
> Now that I think about it, options one and three make more sense. The
> reason for fail-fast is that a user mistake of not providing the rack may
> never be discovered if we tolerate it; the assignment may not be rack
> aware as the user expected, which creates debugging problems when things
> fail.
>
> What do you think? If not fail-fast, is there any way we can make the
> user error stand out?
>
>
> On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <g...@confluent.io> wrote:
>
>> Thanks! Just to clarify: when some brokers have rack assignment and some
>> don't, do we act like none of them have it? Or like those without
>> assignment are in their own rack?
>>
>> The first scenario is good when first setting up rack awareness, but the
>> second makes more sense for ongoing maintenance (I can totally see
>> someone adding a node and forgetting to set the rack property; we don't
>> want this to change behavior for anything except the new node).
>>
>> What do you think?
>>
>> Gwen
>>
>> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <allenxw...@gmail.com>
>> wrote:
>>
>> > For scenario 1:
>> >
>> > - Add the rack information to the broker property file, or dynamically
>> > set it in the wrapper code that bootstraps the Kafka server. You would
>> > do that for all brokers and restart the brokers one by one.
>> >
>> > In this scenario, the complete broker-to-rack mapping may not be
>> > available until every broker is restarted. During that time we fall
>> > back to the default replica assignment algorithm.
>> >
>> > For scenario 2:
>> >
>> > - Add the rack information to the broker property file, or dynamically
>> > set it in the wrapper code, and start the broker.
>> >
>> >
>> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <g...@confluent.io>
>> > wrote:
>> >
>> > > Can you clarify the workflow for the following scenarios:
>> > >
>> > > 1. I currently have 6 brokers and want to add rack information for
>> > > each.
>> > > 2. I'm adding a new broker and I want to specify which rack it
>> > > belongs to while adding it.
>> > >
>> > > Thanks!
>> > >
>> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <allenxw...@gmail.com>
>> > > wrote:
>> > >
>> > > > We discussed the KIP in the hangout today. The recommendation is
>> > > > to make rack a broker property in ZooKeeper. Users with existing
>> > > > rack information stored somewhere would need to retrieve the
>> > > > information at broker startup and dynamically set the rack
>> > > > property, which can be implemented as a wrapper that bootstraps
>> > > > the broker. There will be no interface or pluggable implementation
>> > > > to retrieve the rack information.
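>> > > >
>> > > > An illustrative sketch of such a wrapper (assuming the property is
>> > > > named "broker.rack" and the 0.9-era KafkaServerStartable entry
>> > > > point; the rack lookup is a stand-in for whatever inventory source
>> > > > you have):
>> > > >
>> > > >   import java.io.FileInputStream
>> > > >   import java.util.Properties
>> > > >
>> > > >   object RackAwareStarter {
>> > > >     def main(args: Array[String]): Unit = {
>> > > >       val props = new Properties()
>> > > >       props.load(new FileInputStream(args(0)))
>> > > >       // Resolve the rack before handing the config to Kafka.
>> > > >       props.put("broker.rack", lookupRack())
>> > > >       kafka.server.KafkaServerStartable.fromProps(props).startup()
>> > > >     }
>> > > >
>> > > >     // Stand-in: read the rack from an environment variable.
>> > > >     def lookupRack(): String =
>> > > >       sys.env.getOrElse("BROKER_RACK", "unknown")
>> > > >   }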
>> > > >
>> > > > The assumption is that you always need to restart the broker to
>> > > > make a change to the rack.
>> > > >
>> > > > Once the rack becomes a broker property, it will be possible to
>> > > > make rack part of the metadata to help the consumer choose which
>> > > > in-sync replica to consume from, as part of a future consumer
>> > > > enhancement.
>> > > >
>> > > > I will update the KIP.
>> > > >
>> > > > Thanks,
>> > > > Allen
>> > > >
>> > > >
>> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <allenxw...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > I attended Tuesday's KIP hangout but this KIP was not discussed
>> > > > > due to time constraints.
>> > > > >
>> > > > > However, after hearing the discussion of KIP-35, I have the
>> > > > > feeling that the incompatibility (caused by the new broker
>> > > > > property) between brokers with different versions will be solved
>> > > > > there. In addition, having rack in the broker property as
>> > > > > metadata may also help consumers in the future. So I am open to
>> > > > > adding the rack property to the broker.
>> > > > >
>> > > > > Hopefully we can discuss this in the next KIP hangout.
>> > > > >
>> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang
>> > > > > <allenxw...@gmail.com> wrote:
>> > > > >
>> > > > >> Can you send me the information on the next KIP hangout?
>> > > > >>
>> > > > >> Currently the broker-rack mapping is not cached. In KafkaApis,
>> > > > >> RackLocator.getRackInfo() is called each time the mapping is
>> > > > >> needed for auto topic creation. This ensures the latest mapping
>> > > > >> is used at any time.
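>> > > > >>
>> > > > >> Roughly, the interface shape is something like the following
>> > > > >> sketch (the exact signature in the KIP draft may differ):
>> > > > >>
>> > > > >>   trait RackLocator {
>> > > > >>     // complete mapping of broker id -> rack
>> > > > >>     def getRackInfo(): Map[Int, String]
>> > > > >>   }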
>> > > > >>
>> > > > >> The ability to get the complete mapping makes it simple to
>> > > > >> reuse the same interface in command line tools.
>> > > > >>
>> > > > >>
>> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <
>> > > > >> aaurad...@linkedin.com.invalid> wrote:
>> > > > >>
>> > > > >>> Perhaps we can discuss this during the next KIP hangout?
>> > > > >>>
>> > > > >>> I do see that a pluggable rack locator can be useful, but I
>> > > > >>> also see a few concerns:
>> > > > >>>
>> > > > >>> - The RackLocator (as described in the document) implies that
>> > > > >>> it can discover rack information for any node in the cluster.
>> > > > >>> How does it deal with rack location changes? For example, if I
>> > > > >>> moved broker id (1) from rack X to Y, I only have to start
>> > > > >>> that broker with a newer rack config. If RackLocator discovers
>> > > > >>> broker -> rack information at startup time, any change to a
>> > > > >>> broker will require bouncing the entire cluster, since
>> > > > >>> createTopic requests can be sent to any node in the cluster.
>> > > > >>> For this reason it may be simpler to have each node be aware
>> > > > >>> of its own rack and persist it in ZK during startup.
>> > > > >>>
>> > > > >>> - A pluggable RackLocator relies on an external service being
>> > > > >>> available to serve rack information.
>> > > > >>>
>> > > > >>> Out of curiosity, I looked up how a couple of other systems
>> > > > >>> deal with zone/rack awareness.
>> > > > >>> For Cassandra some interesting modes are:
>> > > > >>> (Property File configuration)
>> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
>> > > > >>> (Dynamic inference)
>> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
>> > > > >>>
>> > > > >>> Voldemort does a static node -> zone assignment based on
>> > > > >>> configuration.
>> > > > >>>
>> > > > >>> Aditya
>> > > > >>>
>> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang
>> > > > >>> <allenxw...@gmail.com> wrote:
>> > > > >>>
>> > > > >>> > I would like to see if we can do both:
>> > > > >>> >
>> > > > >>> > - Make RackLocator pluggable to facilitate migration with an
>> > > > >>> > existing broker-rack mapping.
>> > > > >>> >
>> > > > >>> > - Make rack an optional property for the broker. If the rack
>> > > > >>> > is available from the broker, treat it as the source of
>> > > > >>> > truth. Users with an existing broker-rack mapping somewhere
>> > > > >>> > else can use the pluggable way, or they can transfer the
>> > > > >>> > mapping to the broker rack property.
>> > > > >>> >
>> > > > >>> > One thing I am not sure about is what happens during a
>> > > > >>> > rolling upgrade when we have rack as a broker property. For
>> > > > >>> > brokers with an older version of Kafka, will it cause
>> > > > >>> > problems for them? If so, is there any workaround? I also
>> > > > >>> > think it would be better not to have rack in the controller
>> > > > >>> > wire protocol, but I am not sure if that is achievable.
>> > > > >>> >
>> > > > >>> > Thanks,
>> > > > >>> > Allen
>> > > > >>> >
>> > > > >>> >
>> > > > >>> >
>> > > > >>> >
>> > > > >>> >
>> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino
>> > > > >>> > <tpal...@gmail.com> wrote:
>> > > > >>> >
>> > > > >>> > > I tend to like the idea of a pluggable locator. For
>> > > > >>> > > example, we already have an interface for discovering
>> > > > >>> > > information about the physical location of servers. I
>> > > > >>> > > don't relish the idea of having to maintain data in
>> > > > >>> > > multiple places.
>> > > > >>> > >
>> > > > >>> > > -Todd
>> > > > >>> > >
>> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
>> > > > >>> > > aaurad...@linkedin.com.invalid> wrote:
>> > > > >>> > >
>> > > > >>> > > > Thanks for starting this KIP Allen.
>> > > > >>> > > >
>> > > > >>> > > > I agree with Gwen that having a RackLocator class that
>> > > > >>> > > > is pluggable seems to be too complex. The KIP refers to
>> > > > >>> > > > potentially non-ZK storage for the rack info, which I
>> > > > >>> > > > don't think is necessary.
>> > > > >>> > > >
>> > > > >>> > > > Perhaps we can persist this info in ZK under
>> > > > >>> > > > /brokers/ids/<broker_id>, similar to other broker
>> > > > >>> > > > properties, and add a config in KafkaConfig called
>> > > > >>> > > > "rack":
>> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,"rack":"abc"}
>> > > > >>> > > >
>> > > > >>> > > > Aditya
>> > > > >>> > > >
>> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira
>> > > > >>> > > > <g...@confluent.io> wrote:
>> > > > >>> > > >
>> > > > >>> > > > > Hi,
>> > > > >>> > > > >
>> > > > >>> > > > > First, thanks for putting out a KIP for this. This is
>> > > > >>> > > > > super important for production deployments of Kafka.
>> > > > >>> > > > >
>> > > > >>> > > > > Few questions:
>> > > > >>> > > > >
>> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"?
>> > > > >>> > > > > I'd want to balance between safety (more racks) and
>> > > > >>> > > > > network utilization (traffic within a rack uses the
>> > > > >>> > > > > high-bandwidth TOR switch). One replica on a different
>> > > > >>> > > > > rack and the rest on the same rack (if possible)
>> > > > >>> > > > > sounds better to me.
>> > > > >>> > > > >
>> > > > >>> > > > > 2) The rack-locator class seems overly complex
>> > > > >>> > > > > compared to adding a rack.number property to the
>> > > > >>> > > > > broker properties file. Why do we want that?
>> > > > >>> > > > >
>> > > > >>> > > > > Gwen
>> > > > >>> > > > >
>> > > > >>> > > > >
>> > > > >>> > > > >
>> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang
>> > > > >>> > > > > <allenxw...@gmail.com> wrote:
>> > > > >>> > > > >
>> > > > >>> > > > > > Hello Kafka Developers,
>> > > > >>> > > > > >
>> > > > >>> > > > > > I just created KIP-36 for rack aware replica
>> > > > >>> > > > > > assignment.
>> > > > >>> > > > > >
>> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
>> > > > >>> > > > > >
>> > > > >>> > > > > > The goal is to utilize the isolation provided by
>> > > > >>> > > > > > the racks in the data center and distribute replicas
>> > > > >>> > > > > > across racks to provide fault tolerance.
>> > > > >>> > > > > >
>> > > > >>> > > > > > Comments are welcome.
>> > > > >>> > > > > >
>> > > > >>> > > > > > Thanks,
>> > > > >>> > > > > > Allen
>> > > > >>> > > > > >
>> > > > >>> > > > >
>> > > > >>> > > >
>> > > > >>> > >
>> > > > >>> >
>> > > > >>>
>> > > > >>
>> > > > >>
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
