Hey Allen,

1. If we choose fail-fast topic creation, we will see topic creation
failures while upgrading the cluster. I really doubt we want this behavior;
ideally, the upgrade should be invisible to clients of the cluster.
Currently, each broker is effectively its own rack, so we can probably use
the rack information whenever it is available rather than making it a hard
requirement. To extend Gwen's example, one badly configured broker should
not degrade topic creation for the entire cluster.
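
For illustration, here is a minimal sketch of the soft fallback I have in
mind (BrokerInfo, effectiveRack, and rackMap are hypothetical names, not
code from the KIP):

object RackFallbackSketch {
  // Minimal model of a broker registration; rack is optional.
  case class BrokerInfo(id: Int, rack: Option[String])

  // A broker that registered no rack is treated as the sole occupant of a
  // synthetic rack of its own, so one misconfigured broker never blocks
  // topic creation for the whole cluster.
  def effectiveRack(b: BrokerInfo): String =
    b.rack.getOrElse(s"__unique_rack_${b.id}")

  def rackMap(brokers: Seq[BrokerInfo]): Map[Int, String] =
    brokers.map(b => b.id -> effectiveRack(b)).toMap
}

The assignment algorithm could log a warning whenever a synthetic rack is
used, which would also help the user error stand out, as Allen asked below.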

2. Upgrade scenario - Can you add a section on the upgrade path to confirm
that old clients will not see errors? I believe ZookeeperConsumerConnector
reads the Broker objects from ZK, and I want to confirm that this will not
cause any problems.
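
To make the concern concrete, here is a sketch of why an extra optional
field should be harmless to a client that reads only the keys it knows.
The JSON shape follows the registration format I proposed earlier in this
thread (quoted below); the parsing code is illustrative only and is not the
actual ZookeeperConsumerConnector code:

import scala.util.parsing.json.JSON

object BrokerJsonSketch {
  // Hypothetical registration under /brokers/ids/<broker_id>, with the
  // proposed optional "rack" key added.
  val registration =
    """{"jmx_port":-1,"host":"broker1","port":9092,"rack":"abc"}"""

  // An old client that extracts only "host" and "port" never inspects
  // "rack", so the new key is invisible to it.
  def hostAndPort(json: String): Option[(String, Int)] =
    JSON.parseFull(json).collect {
      case m: Map[String @unchecked, Any @unchecked] =>
        (m("host").asInstanceOf[String], m("port").asInstanceOf[Double].toInt)
    }
}

The section should confirm that old clients parse leniently like this
rather than rejecting unknown fields.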

3. Could you elaborate on your proposed changes to the UpdateMetadataRequest
in the "Public Interfaces" section? Personally, I find this format easy to
read in terms of wire protocol changes:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
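
To make the ask concrete, one possible shape in that notation (purely
illustrative; the exact field name, nullability, and version bump are for
you to pin down in the KIP):

UpdateMetadataRequest (new version) => ... [live_broker]
  live_broker => id [endpoint] rack
    id => INT32
    endpoint => port host security_protocol_type
    rack => NULLABLE_STRING  (null when the broker has no rack configured)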

Aditya

On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <allenxw...@gmail.com> wrote:

> The KIP is updated to include rack as an optional property for the broker.
> Please take a look and let me know if more details are needed.
>
> For the case where some brokers have rack and some do not, the current KIP
> uses the fail-fast behavior. If there are concerns, we can further discuss
> this in the email thread or next hangout.
>
>
>
> On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <allenxw...@gmail.com> wrote:
>
> > That's a good question. I can think of three actions if the rack
> > information is incomplete:
> >
> > 1. Treat the node without rack as if it is on its unique rack
> > 2. Disregard all rack information and fallback to current algorithm
> > 3. Fail-fast
> >
> > Now that I think about it, options 1 and 3 make more sense. The reason for
> > fail-fast is that if we tolerate a missing rack, the user's mistake may
> > never be found, and the assignment may not be rack aware as the user
> > expected, which creates debugging problems when things fail.
> >
> > What do you think? If not fail-fast, is there any way we can make the user
> > error stand out?
> >
> >
> > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <g...@confluent.io> wrote:
> >
> >> Thanks! Just to clarify, when some brokers have rack assignment and some
> >> don't, do we act like none of them have it? or like those without
> >> assignment are in their own rack?
> >>
> >> The first scenario is good when first setting up rack-awareness, but the
> >> second makes more sense for on-going maintenance (I can totally see someone
> >> adding a node and forgetting to set the rack property; we don't want this
> >> to change behavior for anything except the new node).
> >>
> >> What do you think?
> >>
> >> Gwen
> >>
> >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <allenxw...@gmail.com> wrote:
> >>
> >> > For scenario 1:
> >> >
> >> > - Add the rack information to the broker property file or dynamically set
> >> > it in the wrapper code that bootstraps the Kafka server. You would do
> >> > that for all brokers and restart the brokers one by one.
> >> >
> >> > In this scenario, the complete broker-to-rack mapping may not be
> >> > available until every broker is restarted. During that time we fall back
> >> > to the default replica assignment algorithm.
> >> >
> >> > For scenario 2:
> >> >
> >> > - Add the rack information to the broker property file or dynamically set
> >> > it in the wrapper code and start the broker.
> >> >
> >> >
> >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <g...@confluent.io> wrote:
> >> >
> >> > > Can you clarify the workflow for the following scenarios:
> >> > >
> >> > > 1. I currently have 6 brokers and want to add rack information for each
> >> > > 2. I'm adding a new broker and I want to specify which rack it belongs
> >> > > to while adding it.
> >> > >
> >> > > Thanks!
> >> > >
> >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <allenxw...@gmail.com> wrote:
> >> > >
> >> > > > We discussed the KIP in the hangout today. The recommendation is to
> >> > > > make rack a broker property in ZooKeeper. Users with existing rack
> >> > > > information stored somewhere else would need to retrieve the
> >> > > > information at broker start-up and dynamically set the rack property,
> >> > > > which can be implemented as a wrapper that bootstraps the broker.
> >> > > > There will be no interface or pluggable implementation to retrieve
> >> > > > the rack information.
> >> > > >
> >> > > > The assumption is that you always need to restart the broker to make
> >> > > > a change to the rack.
> >> > > >
> >> > > > Once the rack becomes a broker property, it will be possible to make
> >> > > > rack part of the metadata to help the consumer choose which in-sync
> >> > > > replica to consume from, as part of the future consumer enhancement.
> >> > > >
> >> > > > I will update the KIP.
> >> > > >
> >> > > > Thanks,
> >> > > > Allen
> >> > > >
> >> > > >
> >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <allenxw...@gmail.com> wrote:
> >> > > >
> >> > > > > I attended Tuesday's KIP hangout, but this KIP was not discussed
> >> > > > > due to time constraints.
> >> > > > >
> >> > > > > However, after hearing the discussion of KIP-35, I have the feeling
> >> > > > > that the incompatibility (caused by the new broker property) between
> >> > > > > brokers with different versions will be solved there. In addition,
> >> > > > > having rack in the broker properties as metadata may also help
> >> > > > > consumers in the future. So I am open to adding the rack property
> >> > > > > to the broker.
> >> > > > >
> >> > > > > Hopefully we can discuss this in the next KIP hangout.
> >> > > > >
> >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <allenxw...@gmail.com> wrote:
> >> > > > >
> >> > > > >> Can you send me the information on the next KIP hangout?
> >> > > > >>
> >> > > > >> Currently the broker-rack mapping is not cached. In KafkaApis,
> >> > > > >> RackLocator.getRackInfo() is called each time the mapping is needed
> >> > > > >> for auto topic creation. This ensures that the latest mapping is
> >> > > > >> used at any time.
> >> > > > >>
> >> > > > >> The ability to get the complete mapping makes it simple to reuse
> >> > > > >> the same interface in command line tools.
> >> > > > >>
> >> > > > >>
> >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <
> >> > > > >> aaurad...@linkedin.com.invalid> wrote:
> >> > > > >>
> >> > > > >>> Perhaps we can discuss this during the next KIP hangout?
> >> > > > >>>
> >> > > > >>> I do see that a pluggable rack locator can be useful, but I have
> >> > > > >>> a few concerns:
> >> > > > >>>
> >> > > > >>> - The RackLocator (as described in the document) implies that it
> >> > > > >>> can discover rack information for any node in the cluster. How
> >> > > > >>> does it deal with rack location changes? For example, if I moved
> >> > > > >>> broker id (1) from rack X to Y, I only have to start that broker
> >> > > > >>> with a newer rack config. If RackLocator discovers broker -> rack
> >> > > > >>> information at start-up time, any change to a broker will require
> >> > > > >>> bouncing the entire cluster, since createTopic requests can be
> >> > > > >>> sent to any node in the cluster. For this reason it may be simpler
> >> > > > >>> to have each node be aware of its own rack and persist it in ZK
> >> > > > >>> during start-up.
> >> > > > >>>
> >> > > > >>> - A pluggable RackLocator relies on an external service being
> >> > > > >>> available to serve rack information.
> >> > > > >>>
> >> > > > >>> Out of curiosity, I looked up how a couple of other systems deal
> >> > > > >>> with zone/rack awareness.
> >> > > > >>> For Cassandra, some interesting modes are:
> >> > > > >>> (Property file configuration)
> >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> >> > > > >>> (Dynamic inference)
> >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> >> > > > >>>
> >> > > > >>> Voldemort does a static node -> zone assignment based on
> >> > > > >>> configuration.
> >> > > > >>>
> >> > > > >>> Aditya
> >> > > > >>>
> >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <allenxw...@gmail.com>
> >> > > > >>> wrote:
> >> > > > >>>
> >> > > > >>> > I would like to see if we can do both:
> >> > > > >>> >
> >> > > > >>> > - Make RackLocator pluggable to facilitate migration for users
> >> > > > >>> > with an existing broker-rack mapping.
> >> > > > >>> >
> >> > > > >>> > - Make rack an optional property for the broker. If rack is
> >> > > > >>> > available from the broker, treat it as the source of truth.
> >> > > > >>> > Users with an existing broker-rack mapping somewhere else can
> >> > > > >>> > use the pluggable way or transfer the mapping to the broker
> >> > > > >>> > rack property.
> >> > > > >>> >
> >> > > > >>> > One thing I am not sure about is what happens at rolling upgrade
> >> > > > >>> > time when we have rack as a broker property. For brokers with an
> >> > > > >>> > older version of Kafka, will it cause problems? If so, is there
> >> > > > >>> > any workaround? I also think it would be better not to have rack
> >> > > > >>> > in the controller wire protocol, but I am not sure if that is
> >> > > > >>> > achievable.
> >> > > > >>> >
> >> > > > >>> > Thanks,
> >> > > > >>> > Allen
> >> > > > >>> >
> >> > > > >>> >
> >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tpal...@gmail.com>
> >> > > > >>> > wrote:
> >> > > > >>> >
> >> > > > >>> > > I tend to like the idea of a pluggable locator. For example,
> >> > > > >>> > > we already have an interface for discovering information about
> >> > > > >>> > > the physical location of servers. I don't relish the idea of
> >> > > > >>> > > having to maintain data in multiple places.
> >> > > > >>> > >
> >> > > > >>> > > -Todd
> >> > > > >>> > >
> >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
> >> > > > >>> > > aaurad...@linkedin.com.invalid> wrote:
> >> > > > >>> > >
> >> > > > >>> > > > Thanks for starting this KIP, Allen.
> >> > > > >>> > > >
> >> > > > >>> > > > I agree with Gwen that having a RackLocator class that is
> >> > > > >>> > > > pluggable seems to be too complex. The KIP refers to
> >> > > > >>> > > > potentially non-ZK storage for the rack info, which I don't
> >> > > > >>> > > > think is necessary.
> >> > > > >>> > > >
> >> > > > >>> > > > Perhaps we can persist this info in zk under
> >> > > > >>> > > > /brokers/ids/<broker_id>, similar to other broker
> >> > > > >>> > > > properties, and add a config in KafkaConfig called "rack":
> >> > > > >>> > > >
> >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,
> >> > > > >>> > > > "rack": "abc"}
> >> > > > >>> > > >
> >> > > > >>> > > > Aditya
> >> > > > >>> > > >
> >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <g...@confluent.io>
> >> > > > >>> > > > wrote:
> >> > > > >>> > > >
> >> > > > >>> > > > > Hi,
> >> > > > >>> > > > >
> >> > > > >>> > > > > First, thanks for putting out a KIP for this. This is
> >> > > > >>> > > > > super important for production deployments of Kafka.
> >> > > > >>> > > > >
> >> > > > >>> > > > > Few questions:
> >> > > > >>> > > > >
> >> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"? I'd
> >> > > > >>> > > > > want to balance between safety (more racks) and network
> >> > > > >>> > > > > utilization (traffic within a rack uses the high-bandwidth
> >> > > > >>> > > > > TOR switch). One replica on a different rack and the rest
> >> > > > >>> > > > > on the same rack (if possible) sounds better to me.
> >> > > > >>> > > > >
> >> > > > >>> > > > > 2) The rack-locator class seems overly complex compared to
> >> > > > >>> > > > > adding a rack.number property to the broker properties
> >> > > > >>> > > > > file. Why do we want that?
> >> > > > >>> > > > >
> >> > > > >>> > > > > Gwen
> >> > > > >>> > > > >
> >> > > > >>> > > > >
> >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxw...@gmail.com>
> >> > > > >>> > > > > wrote:
> >> > > > >>> > > > >
> >> > > > >>> > > > > > Hello Kafka Developers,
> >> > > > >>> > > > > >
> >> > > > >>> > > > > > I just created KIP-36 for rack aware replica assignment:
> >> > > > >>> > > > > >
> >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> >> > > > >>> > > > > >
> >> > > > >>> > > > > > The goal is to utilize the isolation provided by the
> >> > > > >>> > > > > > racks in the data center and distribute replicas across
> >> > > > >>> > > > > > racks to provide fault tolerance.
> >> > > > >>> > > > > >
> >> > > > >>> > > > > > Comments are welcome.
> >> > > > >>> > > > > >
> >> > > > >>> > > > > > Thanks,
> >> > > > >>> > > > > > Allen
> >> > > > >>> > > > > >
> >> > > > >>> > > > >
> >> > > > >>> > > >
> >> > > > >>> > >
> >> > > > >>> >
> >> > > > >>>
> >> > > > >>
> >> > > > >>
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
