The KIP is updated to include rack as an optional property for the broker. Please take a look and let me know if more details are needed.
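To make this concrete, here is a rough sketch of what setting the property could look like. The exact property name is not final, and the registration format follows Aditya's suggestion quoted below:

    # server.properties -- sketch only, exact property name TBD
    broker.id=1
    rack=abc

On startup the broker would then register itself in ZooKeeper under /brokers/ids/1 with the rack included:

    {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,"rack":"abc"}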
For the case where some brokers have rack and some do not, the current KIP uses the fail-fast behavior. If there are concerns, we can further discuss this in the email thread or the next hangout.

On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <allenxw...@gmail.com> wrote:

> That's a good question. I can think of three actions if the rack
> information is incomplete:
>
> 1. Treat the node without rack as if it is on its own unique rack
> 2. Disregard all rack information and fall back to the current algorithm
> 3. Fail fast
>
> Now that I think about it, one and three make more sense. The reason for
> fail-fast is that a user mistake of not providing the rack may never be
> found if we tolerate it, the assignment may not be rack aware as the user
> expected, and this creates debugging problems when things fail.
>
> What do you think? If not fail-fast, is there any way we can make the
> user error stand out?
>
>
> On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <g...@confluent.io> wrote:
>
>> Thanks! Just to clarify, when some brokers have rack assignment and some
>> don't, do we act like none of them have it? Or like those without
>> assignment are in their own rack?
>>
>> The first scenario is good when first setting up rack awareness, but the
>> second makes more sense for ongoing maintenance (I can totally see
>> someone adding a node and forgetting to set the rack property; we don't
>> want this to change behavior for anything except the new node).
>>
>> What do you think?
>>
>> Gwen
>>
>> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <allenxw...@gmail.com>
>> wrote:
>>
>> > For scenario 1:
>> >
>> > - Add the rack information to the broker property file, or dynamically
>> > set it in the wrapper code that bootstraps the Kafka server. You would
>> > do that for all brokers and restart the brokers one by one.
>> >
>> > In this scenario, the complete broker-to-rack mapping may not be
>> > available until every broker is restarted. During that time we fall
>> > back to the default replica assignment algorithm.
>> >
>> > For scenario 2:
>> >
>> > - Add the rack information to the broker property file, or dynamically
>> > set it in the wrapper code, and start the broker.
>> >
>> >
>> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <g...@confluent.io>
>> > wrote:
>> >
>> > > Can you clarify the workflow for the following scenarios:
>> > >
>> > > 1. I currently have 6 brokers and want to add rack information for
>> > > each.
>> > > 2. I'm adding a new broker and I want to specify which rack it
>> > > belongs on while adding it.
>> > >
>> > > Thanks!
>> > >
>> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <allenxw...@gmail.com>
>> > > wrote:
>> > >
>> > > > We discussed the KIP in the hangout today. The recommendation is
>> > > > to make rack a broker property in ZooKeeper. Users with existing
>> > > > rack information stored somewhere else would need to retrieve the
>> > > > information at broker startup and dynamically set the rack
>> > > > property, which can be implemented as a wrapper that bootstraps
>> > > > the broker. There will be no interface or pluggable implementation
>> > > > to retrieve the rack information.
>> > > >
>> > > > The assumption is that you always need to restart the broker to
>> > > > make a change to the rack.
>> > > >
>> > > > Once the rack becomes a broker property, it will be possible to
>> > > > make rack part of the metadata to help the consumer choose which
>> > > > in-sync replica to consume from, as part of the future consumer
>> > > > enhancement.
>> > > >
>> > > > I will update the KIP.
>> > > >
>> > > > Thanks,
>> > > > Allen
>> > > >
>> > > >
>> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <allenxw...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > I attended Tuesday's KIP hangout, but this KIP was not discussed
>> > > > > due to time constraints.
>> > > > >
>> > > > > However, after hearing the discussion of KIP-35, I have the
>> > > > > feeling that the incompatibility (caused by a new broker
>> > > > > property) between brokers with different versions will be solved
>> > > > > there. In addition, having rack in the broker properties as
>> > > > > metadata may also help consumers in the future. So I am open to
>> > > > > adding the rack property to the broker.
>> > > > >
>> > > > > Hopefully we can discuss this in the next KIP hangout.
>> > > > >
>> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang
>> > > > > <allenxw...@gmail.com> wrote:
>> > > > >
>> > > > >> Can you send me the information on the next KIP hangout?
>> > > > >>
>> > > > >> Currently the broker-rack mapping is not cached. In KafkaApis,
>> > > > >> RackLocator.getRackInfo() is called each time the mapping is
>> > > > >> needed for auto topic creation. This ensures the latest mapping
>> > > > >> is used at any time.
>> > > > >>
>> > > > >> The ability to get the complete mapping makes it simple to
>> > > > >> reuse the same interface in command line tools.
>> > > > >>
>> > > > >>
>> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <
>> > > > >> aaurad...@linkedin.com.invalid> wrote:
>> > > > >>
>> > > > >>> Perhaps we can discuss this during the next KIP hangout?
>> > > > >>>
>> > > > >>> I do see that a pluggable rack locator can be useful, but I
>> > > > >>> have a few concerns:
>> > > > >>>
>> > > > >>> - The RackLocator (as described in the document) implies that
>> > > > >>> it can discover rack information for any node in the cluster.
>> > > > >>> How does it deal with rack location changes? For example, if I
>> > > > >>> moved broker id (1) from rack X to Y, I only have to start
>> > > > >>> that broker with a newer rack config. If RackLocator discovers
>> > > > >>> broker -> rack information at startup time, any change to a
>> > > > >>> broker will require bouncing the entire cluster, since
>> > > > >>> createTopic requests can be sent to any node in the cluster.
>> > > > >>> For this reason it may be simpler to have each node be aware
>> > > > >>> of its own rack and persist it in ZK at startup time.
>> > > > >>>
>> > > > >>> - A pluggable RackLocator relies on an external service being
>> > > > >>> available to serve rack information.
>> > > > >>>
>> > > > >>> Out of curiosity, I looked up how a couple of other systems
>> > > > >>> deal with zone/rack awareness. For Cassandra some interesting
>> > > > >>> modes are:
>> > > > >>> (Property file configuration)
>> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
>> > > > >>> (Dynamic inference)
>> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
>> > > > >>>
>> > > > >>> Voldemort does a static node -> zone assignment based on
>> > > > >>> configuration.
>> > > > >>>
>> > > > >>> Aditya
>> > > > >>>
>> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang
>> > > > >>> <allenxw...@gmail.com> wrote:
>> > > > >>>
>> > > > >>> > I would like to see if we can do both:
>> > > > >>> >
>> > > > >>> > - Make RackLocator pluggable to facilitate migration with an
>> > > > >>> > existing broker-rack mapping.
>> > > > >>> >
>> > > > >>> > - Make rack an optional property for the broker. If rack is
>> > > > >>> > available from the broker, treat it as the source of truth.
>> > > > >>> > Users with an existing broker-rack mapping somewhere else
>> > > > >>> > can use the pluggable way, or they can transfer the mapping
>> > > > >>> > to the broker rack property.
>> > > > >>> >
>> > > > >>> > One thing I am not sure about is what happens at rolling
>> > > > >>> > upgrade when we have rack as a broker property. For brokers
>> > > > >>> > with an older version of Kafka, will it cause problems for
>> > > > >>> > them? If so, is there any workaround? I also think it would
>> > > > >>> > be better not to have rack in the controller wire protocol,
>> > > > >>> > but I am not sure if that is achievable.
>> > > > >>> >
>> > > > >>> > Thanks,
>> > > > >>> > Allen
>> > > > >>> >
>> > > > >>> >
>> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino
>> > > > >>> > <tpal...@gmail.com> wrote:
>> > > > >>> >
>> > > > >>> > > I tend to like the idea of a pluggable locator. For
>> > > > >>> > > example, we already have an interface for discovering
>> > > > >>> > > information about the physical location of servers. I
>> > > > >>> > > don't relish the idea of having to maintain data in
>> > > > >>> > > multiple places.
>> > > > >>> > >
>> > > > >>> > > -Todd
>> > > > >>> > >
>> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
>> > > > >>> > > aaurad...@linkedin.com.invalid> wrote:
>> > > > >>> > >
>> > > > >>> > > > Thanks for starting this KIP, Allen.
>> > > > >>> > > >
>> > > > >>> > > > I agree with Gwen that having a RackLocator class that
>> > > > >>> > > > is pluggable seems too complex. The KIP refers to
>> > > > >>> > > > potentially non-ZK storage for the rack info, which I
>> > > > >>> > > > don't think is necessary.
>> > > > >>> > > >
>> > > > >>> > > > Perhaps we can persist this info in ZK under
>> > > > >>> > > > /brokers/ids/<broker_id>, similar to other broker
>> > > > >>> > > > properties, and add a config in KafkaConfig called
>> > > > >>> > > > "rack":
>> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,"rack":"abc"}
>> > > > >>> > > >
>> > > > >>> > > > Aditya
>> > > > >>> > > >
>> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira
>> > > > >>> > > > <g...@confluent.io> wrote:
>> > > > >>> > > >
>> > > > >>> > > > > Hi,
>> > > > >>> > > > >
>> > > > >>> > > > > First, thanks for putting out a KIP for this. This is
>> > > > >>> > > > > super important for production deployments of Kafka.
>> > > > >>> > > > >
>> > > > >>> > > > > A few questions:
>> > > > >>> > > > >
>> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"?
>> > > > >>> > > > > I'd want to balance between safety (more racks) and
>> > > > >>> > > > > network utilization (traffic within a rack uses the
>> > > > >>> > > > > high-bandwidth TOR switch). One replica on a different
>> > > > >>> > > > > rack and the rest on the same rack (if possible)
>> > > > >>> > > > > sounds better to me.
>> > > > >>> > > > >
>> > > > >>> > > > > 2) The rack-locator class seems overly complex
>> > > > >>> > > > > compared to adding a rack.number property to the
>> > > > >>> > > > > broker properties file. Why do we want that?
>> > > > >>> > > > >
>> > > > >>> > > > > Gwen
>> > > > >>> > > > >
>> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang
>> > > > >>> > > > > <allenxw...@gmail.com> wrote:
>> > > > >>> > > > >
>> > > > >>> > > > > > Hello Kafka Developers,
>> > > > >>> > > > > >
>> > > > >>> > > > > > I just created KIP-36 for rack-aware replica
>> > > > >>> > > > > > assignment:
>> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
>> > > > >>> > > > > >
>> > > > >>> > > > > > The goal is to utilize the isolation provided by the
>> > > > >>> > > > > > racks in the data center and distribute replicas
>> > > > >>> > > > > > across racks to provide fault tolerance.
>> > > > >>> > > > > >
>> > > > >>> > > > > > Comments are welcome.
>> > > > >>> > > > > >
>> > > > >>> > > > > > Thanks,
>> > > > >>> > > > > > Allen
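For concreteness, below is a rough Scala sketch of the fail-fast check described at the top of this message. The class and method names are illustrative only, not the final KIP-36 API:

    // Sketch only: fail fast when rack information is incomplete,
    // i.e. some brokers declare a rack and some do not.
    case class BrokerMetadata(id: Int, rack: Option[String])

    def validateRackInfo(brokers: Seq[BrokerMetadata]): Unit = {
      val (withRack, withoutRack) = brokers.partition(_.rack.isDefined)
      // Mixed rack information is treated as a user error: failing fast
      // surfaces a forgotten rack property instead of silently producing
      // an assignment that is not rack aware.
      if (withRack.nonEmpty && withoutRack.nonEmpty)
        throw new IllegalArgumentException(
          "Not all brokers have rack information. Brokers without rack: " +
            withoutRack.map(_.id).mkString(", "))
    }

The replica-assignment path would call a check like this once before computing an assignment, so a forgotten rack property fails at topic creation time rather than going unnoticed.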