If there are no more comments, I would like to call for a vote.
On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <allenxw...@gmail.com> wrote:

> KIP is updated with more details and with how to handle the situation where rack information is incomplete.
>
> In the situation where rack information is incomplete but we want to continue with the assignment, I have suggested ignoring all rack information and falling back to the original algorithm. The reason is explained below.
>
> The other options are to assume that a broker without rack information belongs to its own unique rack, or that all such brokers belong to one "default" rack. Either way we choose, it is highly likely to result in an uneven number of brokers per rack, and it is quite possible that the "made up" racks will have far fewer brokers. As I explained in the KIP, an uneven number of brokers per rack leads to an uneven distribution of replicas among brokers (even though the leader distribution is still even): brokers in a rack with fewer brokers get more replicas per broker than brokers in other racks.
>
> Given this fact, and given that the resulting replica assignment would be incorrect from a rack-aware point of view anyway, ignoring all rack information and falling back to the original algorithm is not a bad choice, since it at least gives a better guarantee of replica distribution.
>
> It also gives users of the command line tools a choice if, for any reason, they want to ignore rack information and fall back to the original algorithm.
>
> On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <allenxw...@gmail.com> wrote:
>
>> I have been busy with some time-pressing issues over the last few days. I will think about how incomplete rack information affects the balance and update the KIP by early next week.
>>
>> Thanks,
>> Allen
>>
>> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <n...@confluent.io> wrote:
>>
>>> A few suggestions on improving the KIP:
>>>
>>> *If some brokers have rack, and some do not, the algorithm will throw an exception. This is to prevent incorrect assignment caused by user error.*
>>>
>>> In the KIP, can you clearly state the user-facing behavior when some brokers have rack information and some don't? Which actions and requests will error out, and how?
>>>
>>> *Even distribution of partition leadership among brokers*
>>>
>>> There is some information about arranging the sorted broker list interlaced with rack ids. Can you describe the changes to the current algorithm in a little more detail? How does this interlacing work if only a subset of brokers have a rack id configured? Does it still work if an uneven number of brokers is assigned to each rack? It might work; I'm looking for more details on the changes, since they will affect the behavior seen by the user - imbalance on the leaders, the data, or both.
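
[To make the "interlaced" broker list concrete: a minimal sketch, assuming the approach is to sort the brokers within each rack and then take one broker from each rack in turn, so that adjacent positions in the resulting list fall on different racks. The names are made up for illustration; this is not the KIP's code.]

    // Scala sketch: round-robin across racks to interlace the broker list.
    def interlaceByRack(rackOfBroker: Map[Int, String]): Seq[Int] = {
      // group broker ids by rack, sorted within each rack
      val brokersByRack = rackOfBroker.groupBy(_._2).mapValues(_.keys.toSeq.sorted)
      val rackQueues = brokersByRack.values.map(ids => scala.collection.mutable.Queue(ids: _*)).toSeq
      val interlaced = scala.collection.mutable.ArrayBuffer[Int]()
      // take one broker from every non-empty rack queue per pass; with an
      // uneven broker count per rack, the tail of the list degenerates to
      // brokers from the larger racks only
      while (rackQueues.exists(_.nonEmpty))
        rackQueues.foreach(q => if (q.nonEmpty) interlaced += q.dequeue())
      interlaced
    }

[Assigning replicas by walking such a list keeps leaders evenly spread while placing consecutive replicas on different racks; the tail behavior under uneven racks is exactly the detail Neha asks the KIP to spell out.]
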
>>>
>>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <aaurad...@linkedin.com> wrote:
>>>
>>>> I think this sounds reasonable. Anyone else have comments?
>>>>
>>>> Aditya
>>>>
>>>> On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <allenxw...@gmail.com> wrote:
>>>>
>>>>> During the discussion in the hangout, it was mentioned that it would be desirable for consumers to know the rack information of the brokers, so that they can consume from a broker in the same rack to reduce latency. As I understand it, this is only beneficial if the consumer can consume from any broker in the ISR, which is not possible now.
>>>>>
>>>>> I suggest we skip the change to TMR. Once the consumer is changed to be able to consume from any broker in the ISR, the rack information can be added to TMR.
>>>>>
>>>>> Another thing I want to confirm is the command line behavior. I think the desirable default is to fail fast on the command line for an incomplete rack mapping. The error message can include instructions telling the user to add an extra argument (like "--allow-partial-rackinfo") to suppress the error and do an imperfect rack-aware assignment. If the default behavior is to allow an incomplete mapping, the error can still be easily missed.
>>>>>
>>>>> The affected command line tools are TopicCommand and ReassignPartitionsCommand.
>>>>>
>>>>> Thanks,
>>>>> Allen
>>>>>
>>>>> On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <aaurad...@linkedin.com> wrote:
>>>>>
>>>>>> Hi Allen,
>>>>>>
>>>>>> For TopicMetadataResponse to understand versions, you can bump up the request version itself. Based on the version of the request, the response can be serialized appropriately. It shouldn't be a huge change; we went through something similar for ProduceRequest recently (https://reviews.apache.org/r/33378/). I guess the reason protocol information is not included in the TMR is that the topic itself is independent of any particular protocol (SSL vs. plaintext). Having said that, I'm not sure we even need rack information in TMR. What use case were you thinking of initially?
>>>>>>
>>>>>> For 1 - I'd be fine with adding an option to the command line tools that checks rack assignment, e.g. "--strict-assignment" or something similar.
>>>>>>
>>>>>> Aditya
>>>>>>
>>>>>> On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <allenxw...@gmail.com> wrote:
>>>>>>
>>>>>>> For 2 and 3, I have updated the KIP; please take a look. One thing I have changed is removing the proposal to add rack to TopicMetadataResponse. The reason is that, unlike UpdateMetadataRequest, TopicMetadataResponse does not understand versions, and I don't see a way to include rack without breaking old versions of clients. That's probably why the security protocol is not included in TopicMetadataResponse either. Including rack in TopicMetadataResponse would be a much bigger change.
>>>>>>>
>>>>>>> For 1, my concern is that doing rack-aware assignment without a complete broker-to-rack mapping will produce an assignment that is not actually rack aware and fails to provide fault tolerance in the event of a rack outage. This kind of problem is difficult to surface, and its cost is high: you have to do a partition reassignment if you are lucky enough to spot the problem early, or face data loss during a real rack outage.
>>>>>>>
>>>>>>> I do see the concern about fail-fast, as it might also cause data loss if the producer cannot produce messages because topic creation failed. Is it feasible to treat dynamic topic creation and the command line tools differently? We could allow dynamic topic creation with an incomplete broker-rack mapping but fail fast on the command line. Another option is to let the user determine the command line behavior: fail fast by default, but allow an incomplete broker-rack mapping if an extra switch is provided.
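
[On the TopicMetadataResponse/UpdateMetadataRequest versioning point above: a minimal sketch of version-gated serialization, in the spirit of the ProduceRequest change Aditya cites. The field layout and the version number carrying rack are assumptions for illustration, not the actual wire format.]

    // Scala sketch: write the rack field only for request versions that
    // define it, so older clients keep receiving the format they expect.
    import java.nio.ByteBuffer

    def writeBroker(buf: ByteBuffer, version: Short, id: Int,
                    host: String, port: Int, rack: Option[String]): Unit = {
      buf.putInt(id)
      writeShortString(buf, Some(host))
      buf.putInt(port)
      if (version >= 1)   // hypothetical first version that knows about rack
        writeShortString(buf, rack)
    }

    // length-prefixed string in Kafka's short-string style; -1 encodes absence
    def writeShortString(buf: ByteBuffer, s: Option[String]): Unit = s match {
      case Some(v) =>
        val bytes = v.getBytes("UTF-8")
        buf.putShort(bytes.length.toShort)
        buf.put(bytes)
      case None =>
        buf.putShort(-1)
    }
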
>>>>>>>
>>>>>>> On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:
>>>>>>>
>>>>>>>> Hey Allen,
>>>>>>>>
>>>>>>>> 1. If we choose fail-fast topic creation, we will have topic creation failures while upgrading the cluster. I really doubt we want that behavior; ideally this should be invisible to clients of a cluster. Currently, each broker is effectively its own rack, so we can probably use the rack information whenever possible without making it a hard requirement. To extend Gwen's example, one badly configured broker should not degrade topic creation for the entire cluster.
>>>>>>>>
>>>>>>>> 2. Upgrade scenario: can you add a section on the upgrade piece to confirm that old clients will not see errors? I believe ZookeeperConsumerConnector reads the Broker objects from ZK; I want to confirm that this will not cause any problems.
>>>>>>>>
>>>>>>>> 3. Could you elaborate on your proposed changes to UpdateMetadataRequest in the "Public Interfaces" section? Personally, I find this format easy to read in terms of wire protocol changes:
>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
>>>>>>>>
>>>>>>>> Aditya
>>>>>>>>
>>>>>>>> On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <allenxw...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The KIP is updated to include rack as an optional property for the broker. Please take a look and let me know if more details are needed.
>>>>>>>>>
>>>>>>>>> For the case where some brokers have rack information and some do not, the current KIP uses the fail-fast behavior. If there are concerns, we can discuss this further in the email thread or the next hangout.
>>>>>>>>>
>>>>>>>>> On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <allenxw...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> That's a good question. I can think of three actions if the rack information is incomplete:
>>>>>>>>>>
>>>>>>>>>> 1. Treat a broker without a rack as if it is on its own unique rack
>>>>>>>>>> 2. Disregard all rack information and fall back to the current algorithm
>>>>>>>>>> 3. Fail fast
>>>>>>>>>>
>>>>>>>>>> Now that I think about it, options 1 and 3 make more sense. The argument for fail-fast is that the user's mistake of not providing a rack may never be discovered if we tolerate it; the assignment would not be rack aware as the user expects, which creates debugging problems when things fail.
>>>>>>>>>>
>>>>>>>>>> What do you think? If not fail-fast, is there any way we can make the user error stand out?
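
[Allen's three options, sketched as code for clarity; a hypothetical illustration, not the KIP's implementation:]

    // Scala sketch of the three treatments of an incomplete rack mapping.
    sealed trait IncompleteRackPolicy
    case object UniqueRackPerBroker extends IncompleteRackPolicy // option 1
    case object IgnoreAllRacks extends IncompleteRackPolicy      // option 2
    case object FailFast extends IncompleteRackPolicy            // option 3

    def effectiveRacks(brokers: Seq[Int], known: Map[Int, String],
                       policy: IncompleteRackPolicy): Map[Int, String] = {
      val missing = brokers.filterNot(known.contains)
      if (missing.isEmpty) known
      else policy match {
        case UniqueRackPerBroker =>
          known ++ missing.map(id => id -> s"unique-rack-$id")
        case IgnoreAllRacks =>
          Map.empty // callers fall back to the rack-unaware algorithm
        case FailFast =>
          throw new IllegalArgumentException(
            s"Brokers without rack information: ${missing.mkString(", ")}")
      }
    }

[Under the command line proposal earlier in the thread, the tools would default to FailFast and switch to IgnoreAllRacks when something like "--allow-partial-rackinfo" is passed.]
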
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <g...@confluent.io> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks! Just to clarify: when some brokers have a rack assignment and some don't, do we act as if none of them have it, or as if those without an assignment are on their own racks?
>>>>>>>>>>>
>>>>>>>>>>> The first scenario is good when first setting up rack awareness, but the second makes more sense for ongoing maintenance (I can totally see someone adding a node and forgetting to set the rack property; we don't want that to change behavior for anything except the new node).
>>>>>>>>>>>
>>>>>>>>>>> What do you think?
>>>>>>>>>>>
>>>>>>>>>>> Gwen
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <allenxw...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> For scenario 1: add the rack information to each broker's property file, or set it dynamically in the wrapper code that bootstraps the Kafka server. You would do that for all brokers and restart them one by one.
>>>>>>>>>>>>
>>>>>>>>>>>> In this scenario, the complete broker-to-rack mapping may not be available until every broker has been restarted. During that time, we fall back to the default replica assignment algorithm.
>>>>>>>>>>>>
>>>>>>>>>>>> For scenario 2: add the rack information to the broker's property file, or set it dynamically in the wrapper code, and start the broker.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <g...@confluent.io> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Can you clarify the workflow for the following scenarios:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. I currently have 6 brokers and want to add rack information for each.
>>>>>>>>>>>>> 2. I'm adding a new broker and want to specify which rack it belongs on while adding it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
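
[A sketch of the "wrapper" approach described above: look up this broker's rack from wherever it is currently stored and inject it into the server properties before startup. The property name, the lookup source, and the startup call are assumptions for illustration (fromProps matches the 0.9-era API as far as I know):]

    // Scala sketch: set the rack property at bootstrap, then start Kafka.
    import java.io.FileInputStream
    import java.util.Properties

    object RackAwareBootstrap {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.load(new FileInputStream(args(0))) // the usual server.properties
        // hypothetical lookup: an environment variable here, but a local file
        // or an internal inventory service would work the same way
        val rack = sys.env.getOrElse("BROKER_RACK",
          sys.error("no rack configured for this host"))
        props.setProperty("rack", rack)
        kafka.server.KafkaServerStartable.fromProps(props).startup()
      }
    }
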
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <allenxw...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> We discussed the KIP in the hangout today. The recommendation is to make rack a broker property in ZooKeeper. Users with existing rack information stored elsewhere would need to retrieve it at broker startup and set the rack property dynamically, which can be implemented as a wrapper that bootstraps the broker. There will be no interface or pluggable implementation for retrieving the rack information.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The assumption is that you always need to restart a broker to change its rack.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Once rack becomes a broker property, it will be possible to make it part of the metadata to help the consumer choose which in-sync replica to consume from, as part of the future consumer enhancement.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I will update the KIP.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Allen
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <allenxw...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I attended Tuesday's KIP hangout, but this KIP was not discussed due to time constraints.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> However, after hearing the discussion of KIP-35, I have the feeling that the incompatibility (caused by a new broker property) between brokers with different versions will be solved there. In addition, having rack as a broker property in the metadata may also help consumers in the future. So I am open to adding a rack property to the broker.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hopefully we can discuss this in the next KIP hangout.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <allenxw...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can you send me the information on the next KIP hangout?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Currently the broker-rack mapping is not cached: in KafkaApis, RackLocator.getRackInfo() is called each time the mapping is needed for auto topic creation. This ensures the latest mapping is used at all times.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The ability to get the complete mapping also makes it simple to reuse the same interface in the command line tools.
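
[The RackLocator mentioned above, reconstructed as a sketch from this discussion; the actual signature lives in the KIP draft, so treat the names as approximate:]

    // Scala sketch of the pluggable locator interface under discussion.
    trait RackLocator {
      /** Complete broker id -> rack mapping. Fetched on every use (e.g. auto
        * topic creation in KafkaApis), so the latest mapping is always seen. */
      def getRackInfo(): Map[Int, String]
    }
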
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Perhaps we can discuss this during the next KIP hangout?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I do see that a pluggable rack locator can be useful, but I have a few concerns:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - The RackLocator (as described in the document) implies that it can discover rack information for any node in the cluster. How does it deal with rack location changes? For example, if I move broker id 1 from rack X to rack Y, I should only have to start that broker with a newer rack config. If RackLocator discovers broker -> rack information at startup time, any change to a broker requires bouncing the entire cluster, since createTopic requests can be sent to any node. For this reason, it may be simpler to have each node be aware of its own rack and persist it in ZK at startup.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - A pluggable RackLocator relies on an external service being available to serve rack information.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Out of curiosity, I looked up how a couple of other systems deal with zone/rack awareness. For Cassandra, two interesting modes are property file configuration:
>>>>>>>>>>>>>>>>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
>>>>>>>>>>>>>>>>> and dynamic inference:
>>>>>>>>>>>>>>>>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Voldemort does a static node -> zone assignment based on configuration.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Aditya
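
[For reference, Cassandra's PropertyFileSnitch (the first link above) maps node IPs to datacenter:rack pairs in cassandra-topology.properties, along these lines:]

    # excerpt of a cassandra-topology.properties, for comparison only
    192.168.1.100=DC1:RAC1
    192.168.1.101=DC1:RAC2
    default=DC1:RAC1
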
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <allenxw...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I would like to see if we can do both:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> - Make RackLocator pluggable to facilitate migration for users with an existing broker-rack mapping.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> - Make rack an optional property of the broker. If rack is available from the broker, treat it as the source of truth. Users with an existing broker-rack mapping elsewhere can either use the pluggable way or transfer the mapping to the broker rack property.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> One thing I am not sure about is what happens during a rolling upgrade once rack is a broker property. Will it cause problems for brokers running an older version of Kafka? If so, is there any workaround? I also think it would be better not to have rack in the controller wire protocol, but I am not sure whether that is achievable.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Allen
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tpal...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I tend to like the idea of a pluggable locator. For example, we already have an interface for discovering information about the physical location of servers. I don't relish the idea of having to maintain the data in multiple places.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -Todd
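
[Allen's "do both" idea, sketched: the rack stored as a broker property wins when present, and the pluggable locator is only a migration fallback. This reuses the RackLocator sketch from earlier in the thread; names are illustrative:]

    // Scala sketch: broker property is the source of truth for a broker's rack.
    def resolveRack(brokerId: Int,
                    rackProperty: Option[String],
                    locator: Option[RackLocator]): Option[String] =
      rackProperty.orElse(locator.flatMap(_.getRackInfo().get(brokerId)))
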
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks for starting this KIP, Allen.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I agree with Gwen that having a pluggable RackLocator class seems too complex. The KIP refers to potentially non-ZK storage for the rack info, which I don't think is necessary.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Perhaps we can persist this info in ZK under /brokers/ids/<broker_id>, similar to other broker properties, and add a config in KafkaConfig called "rack":
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,"rack":"abc"}
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Aditya
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <g...@confluent.io> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> First, thanks for putting out a KIP for this. This is super important for production deployments of Kafka.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> A few questions:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 1) Are we sure we want "as many racks as possible"? I'd want to balance safety (more racks) against network utilization (traffic within a rack uses the high-bandwidth TOR switch). One replica on a different rack and the rest on the same rack (if possible) sounds better to me.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2) A rack-locator class seems overly complex compared to adding a rack.number property to the broker properties file. Why do we want that?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Gwen
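
[A sketch of reading the rack back out of the registration JSON shown above, tolerating its absence so znodes written by older brokers still parse. This is a crude, dependency-free illustration; real code would use Kafka's JSON utilities:]

    // Scala sketch: extract an optional "rack" field from the broker znode.
    def rackFromRegistration(json: String): Option[String] = {
      val rackField = "\"rack\"\\s*:\\s*\"([^\"]*)\"".r
      rackField.findFirstMatchIn(json).map(_.group(1))
    }
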
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxw...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hello Kafka Developers,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I just created KIP-36 for rack aware replica assignment:
>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The goal is to utilize the isolation provided by the racks in the data center and distribute replicas across racks to provide fault tolerance.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Comments are welcome.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> Allen
>>>
>>> --
>>> Thanks,
>>> Neha