Hey Allen,

1. If we choose fail-fast topic creation, we will see topic creation failures while upgrading the cluster, and I really doubt we want that behavior. Ideally, the upgrade should be invisible to clients of the cluster. Currently, each broker is effectively its own rack, so we can probably use the rack information whenever possible without making it a hard requirement. To extend Gwen's example, one badly configured broker should not degrade topic creation for the entire cluster.
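That lenient behavior can be sketched as follows (a hypothetical helper, not code from the KIP): brokers that report a rack keep it, and each broker without one is placed on its own synthetic rack, so a single misconfigured broker only affects itself.

```python
def effective_rack_map(broker_racks):
    """broker_racks: dict of broker_id -> rack name, or None if unset."""
    mapping = {}
    for broker_id, rack in broker_racks.items():
        if rack is not None:
            mapping[broker_id] = rack
        else:
            # No rack configured: treat the broker as alone on its own
            # synthetic rack instead of failing topic creation.
            mapping[broker_id] = "__no-rack-%d" % broker_id
    return mapping
```

With `{0: "r1", 1: "r1", 2: None}` this yields a distinct synthetic rack for broker 2, so replica assignment still runs rack-aware for the correctly configured brokers instead of treating the gap as fatal.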
2. Upgrade scenario - can you add a section on the upgrade piece to confirm that old clients will not see errors? I believe ZookeeperConsumerConnector reads the Broker objects from ZK, and I want to confirm that this will not cause any problems.

3. Could you elaborate on your proposed changes to the UpdateMetadataRequest in the "Public Interfaces" section? Personally, I find this format easy to read in terms of wire protocol changes:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest

Aditya

On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <allenxw...@gmail.com> wrote:

> The KIP is updated to include rack as an optional property for the broker. Please take a look and let me know if more details are needed.
>
> For the case where some brokers have rack information and some do not, the current KIP uses the fail-fast behavior. If there are concerns, we can discuss this further in the email thread or at the next hangout.
>
> On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <allenxw...@gmail.com> wrote:
>
>> That's a good question. I can think of three actions if the rack information is incomplete:
>>
>> 1. Treat a node without a rack as if it is on its own unique rack
>> 2. Disregard all rack information and fall back to the current algorithm
>> 3. Fail fast
>>
>> Now that I think about it, options one and three make more sense. The argument for fail-fast is that a user's mistake of not providing the rack might never be discovered if we tolerate it; the assignment would then not be rack aware as the user expected, which creates debugging problems when things fail.
>>
>> What do you think? If not fail-fast, is there any way we can make the user error stand out?
>>
>> On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <g...@confluent.io> wrote:
>>
>>> Thanks!
>>> Just to clarify: when some brokers have a rack assignment and some don't, do we act as if none of them have it, or as if those without an assignment are in their own rack?
>>>
>>> The first scenario is good when first setting up rack awareness, but the second makes more sense for ongoing maintenance (I can totally see someone adding a node and forgetting to set the rack property; we don't want this to change behavior for anything except the new node).
>>>
>>> What do you think?
>>>
>>> Gwen
>>>
>>> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <allenxw...@gmail.com> wrote:
>>>
>>>> For scenario 1:
>>>>
>>>> - Add the rack information to the broker property file, or dynamically set it in the wrapper code that bootstraps the Kafka server. You would do that for all brokers and restart the brokers one by one.
>>>>
>>>> In this scenario, the complete broker-to-rack mapping may not be available until every broker is restarted. During that time we fall back to the default replica assignment algorithm.
>>>>
>>>> For scenario 2:
>>>>
>>>> - Add the rack information to the broker property file, or dynamically set it in the wrapper code, and start the broker.
>>>>
>>>> On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <g...@confluent.io> wrote:
>>>>
>>>>> Can you clarify the workflow for the following scenarios:
>>>>>
>>>>> 1. I currently have 6 brokers and want to add rack information for each.
>>>>> 2. I'm adding a new broker and want to specify which rack it belongs on while adding it.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <allenxw...@gmail.com> wrote:
>>>>>
>>>>>> We discussed the KIP in the hangout today. The recommendation is to make rack a broker property in ZooKeeper.
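Both workflows above come down to setting one property before (re)starting each broker. A sketch of the configuration entry, assuming the property ends up named broker.rack (the name was still open at this point in the thread):

```properties
# server.properties (property name assumed, not final in this thread)
broker.id=1
broker.rack=us-east-1a
```

For scenario 1 this line is added to every broker's config during a rolling restart; for scenario 2 it is set before the new broker is started for the first time.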
>>>>>> For users with existing rack information stored somewhere else, they would need to retrieve the information at broker start-up and dynamically set the rack property, which can be implemented as a wrapper that bootstraps the broker. There will be no interface or pluggable implementation to retrieve the rack information.
>>>>>>
>>>>>> The assumption is that you always need to restart the broker to make a change to the rack.
>>>>>>
>>>>>> Once rack becomes a broker property, it will be possible to make it part of the metadata, to help the consumer choose which in-sync replica to consume from as part of the future consumer enhancement.
>>>>>>
>>>>>> I will update the KIP.
>>>>>>
>>>>>> Thanks,
>>>>>> Allen
>>>>>>
>>>>>> On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <allenxw...@gmail.com> wrote:
>>>>>>
>>>>>>> I attended Tuesday's KIP hangout, but this KIP was not discussed due to time constraints.
>>>>>>>
>>>>>>> However, after hearing the discussion of KIP-35, I have the feeling that the incompatibility (caused by a new broker property) between brokers with different versions will be solved there. In addition, having rack in the broker properties as metadata may also help consumers in the future. So I am open to adding a rack property to the broker.
>>>>>>>
>>>>>>> Hopefully we can discuss this in the next KIP hangout.
>>>>>>>
>>>>>>> On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <allenxw...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Can you send me the information on the next KIP hangout?
>>>>>>>>
>>>>>>>> Currently the broker-rack mapping is not cached.
>>>>>>>> In KafkaApis, RackLocator.getRackInfo() is called each time the mapping is needed for auto topic creation. This ensures the latest mapping is used at any time.
>>>>>>>>
>>>>>>>> The ability to get the complete mapping makes it simple to reuse the same interface in command line tools.
>>>>>>>>
>>>>>>>> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> Perhaps we can discuss this during the next KIP hangout?
>>>>>>>>>
>>>>>>>>> I do see that a pluggable rack locator can be useful, but I have a few concerns:
>>>>>>>>>
>>>>>>>>> - The RackLocator (as described in the document) implies that it can discover rack information for any node in the cluster. How does it deal with rack location changes? For example, if I move broker id 1 from rack X to rack Y, I only have to start that broker with a newer rack config. But if RackLocator discovers broker -> rack information at start-up time, any change to a broker will require bouncing the entire cluster, since createTopic requests can be sent to any node in the cluster. For this reason it may be simpler to have each node be aware of its own rack and persist it in ZK at start-up time.
>>>>>>>>>
>>>>>>>>> - A pluggable RackLocator relies on an external service being available to serve rack information.
>>>>>>>>>
>>>>>>>>> Out of curiosity, I looked up how a couple of other systems deal with zone/rack awareness.
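The pluggable contract being debated could look roughly like this (Python used for illustration; the names are adapted from the RackLocator.getRackInfo() call mentioned above, and the final KIP ultimately dropped this interface in favor of a per-broker rack property):

```python
class RackLocator:
    """Illustrative stand-in for the pluggable locator under discussion."""

    def get_rack_info(self):
        """Return the complete broker_id -> rack mapping.

        Called on every auto topic creation rather than cached, so the
        latest mapping is always used -- at the cost of depending on an
        external service being reachable at that moment.
        """
        raise NotImplementedError

class StaticRackLocator(RackLocator):
    """Trivial implementation backed by a fixed in-memory mapping."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)

    def get_rack_info(self):
        # Return a copy so callers cannot mutate the locator's state.
        return dict(self._mapping)
```

A static implementation like this sidesteps the external-service concern but inherits the other problem Aditya raises: a rack change still means redistributing new configuration to every node that might serve a createTopic request.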
>>>>>>>>> For Cassandra, some interesting modes are:
>>>>>>>>>
>>>>>>>>> (Property file configuration)
>>>>>>>>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
>>>>>>>>>
>>>>>>>>> (Dynamic inference)
>>>>>>>>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
>>>>>>>>>
>>>>>>>>> Voldemort does a static node -> zone assignment based on configuration.
>>>>>>>>>
>>>>>>>>> Aditya
>>>>>>>>>
>>>>>>>>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <allenxw...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I would like to see if we can do both:
>>>>>>>>>>
>>>>>>>>>> - Make RackLocator pluggable to facilitate migration for users with an existing broker-rack mapping.
>>>>>>>>>>
>>>>>>>>>> - Make rack an optional property for the broker. If rack is available from the broker, treat it as the source of truth. Users with an existing broker-rack mapping somewhere else can use the pluggable way, or transfer the mapping to the broker rack property.
>>>>>>>>>>
>>>>>>>>>> One thing I am not sure about is what happens during a rolling upgrade when we have rack as a broker property. For brokers with an older version of Kafka, will it cause problems for them? If so, is there any workaround? I also think it would be better not to have rack in the controller wire protocol, but I am not sure if that is achievable.
>>>>>>>>>> Thanks,
>>>>>>>>>> Allen
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tpal...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I tend to like the idea of a pluggable locator. For example, we already have an interface for discovering information about the physical location of servers. I don't relish the idea of having to maintain data in multiple places.
>>>>>>>>>>>
>>>>>>>>>>> -Todd
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for starting this KIP, Allen.
>>>>>>>>>>>>
>>>>>>>>>>>> I agree with Gwen that having a pluggable RackLocator class seems too complex. The KIP refers to potentially non-ZK storage for the rack info, which I don't think is necessary.
>>>>>>>>>>>>
>>>>>>>>>>>> Perhaps we can persist this info in ZK under /brokers/ids/<broker_id>, similar to other broker properties, and add a config in KafkaConfig called "rack":
>>>>>>>>>>>>
>>>>>>>>>>>> {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,"rack":"abc"}
>>>>>>>>>>>>
>>>>>>>>>>>> Aditya
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <g...@confluent.io> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> First, thanks for putting out a KIP for this.
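Aditya's proposed znode layout above can be sketched as follows; rack is optional so that brokers without the config register exactly as they do today (the helper name is hypothetical, and the endpoints list is elided as in his example):

```python
import json

def broker_registration(host, port, rack=None):
    """Build the JSON a broker would store under /brokers/ids/<broker_id>."""
    reg = {"jmx_port": -1, "endpoints": [], "host": host, "port": port}
    if rack is not None:
        # Only present when the broker was configured with a rack,
        # keeping the znode format backward compatible.
        reg["rack"] = rack
    return json.dumps(reg)
```

Keeping the field optional matters for the rolling-upgrade concern raised earlier in the thread: an old broker's registration simply lacks the key, rather than becoming unreadable.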
>>>>>>>>>>>>> This is super important for production deployments of Kafka.
>>>>>>>>>>>>>
>>>>>>>>>>>>> A few questions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Are we sure we want "as many racks as possible"? I'd want to balance safety (more racks) against network utilization (traffic within a rack uses the high-bandwidth top-of-rack switch). One replica on a different rack and the rest on the same rack (if possible) sounds better to me.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2) The rack-locator class seems overly complex compared to adding a rack.number property to the broker properties file. Why do we want that?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Gwen
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxw...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello Kafka Developers,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I just created KIP-36 for rack aware replica assignment:
>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The goal is to utilize the isolation provided by the racks in a data center and distribute replicas across racks to provide fault tolerance.
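The "as many racks as possible" goal can be illustrated with a simplified sketch (not the KIP's actual algorithm): interleave brokers across racks, then take consecutive brokers from that list for each partition, so consecutive replicas land on different racks whenever the replication factor does not exceed the rack count.

```python
from itertools import zip_longest

def assign_replicas(brokers_by_rack, num_partitions, replication_factor):
    """brokers_by_rack: dict of rack name -> list of broker ids."""
    racks = sorted(brokers_by_rack)
    # Interleave: first broker of each rack, then second of each, etc.
    columns = zip_longest(*(brokers_by_rack[r] for r in racks))
    ordered = [b for col in columns for b in col if b is not None]
    assignment = {}
    for p in range(num_partitions):
        # Shift the starting broker per partition so leadership spreads out.
        assignment[p] = [ordered[(p + i) % len(ordered)]
                         for i in range(replication_factor)]
    return assignment
```

This sketch assumes racks of roughly equal size; with very uneven racks the modulo wrap can place two replicas on the same rack. It also makes Gwen's trade-off concrete: spreading replicas over all racks maximizes fault isolation but pushes replication traffic onto inter-rack links, whereas limiting a partition to two racks would keep most of that traffic on the top-of-rack switch.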
>>>>>>>>>>>>>> Comments are welcome.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Allen