Jun/Allen - Did we ever actually agree on whether we should evolve the TMR to include rack info or not? I don't feel strongly about it, but if it's the right thing to do we should probably do it in this KIP (it can be a separate patch). It isn't a large change.

Aditya

On Sat, Dec 26, 2015 at 3:01 PM, Allen Wang <allenxw...@gmail.com> wrote:

> Added the rolling upgrade instructions to the KIP, similar to those in the 0.9.0 release notes.
>
> On Wed, Dec 16, 2015 at 11:32 AM, Allen Wang <allenxw...@gmail.com> wrote:
>
> > Hi Jun,
> >
> > The reason that TopicMetadataResponse is not included in the KIP is that it is currently not version aware. So we need to introduce a version to it in order to ensure backward compatibility. That seems to me a big change. Do we want to couple it with this KIP? Do we need to further discuss what information to include in the new version besides rack? For example, should we include the broker security protocol in TopicMetadataResponse?
> >
> > The other option is to make it a separate KIP to make TopicMetadataResponse version aware and decide what to include, and make this KIP focus on the rack aware algorithm, admin tools and related changes to the inter-broker protocol.
> >
> > Thanks,
> > Allen
> >
> > On Mon, Dec 14, 2015 at 8:30 AM, Jun Rao <j...@confluent.io> wrote:
> >
> >> Allen,
> >>
> >> Thanks for the proposal. A few comments.
> >>
> >> 1. Since this KIP changes the inter-broker communication protocol (UpdateMetadataRequest), we will need to document the upgrade path (similar to what's described in http://kafka.apache.org/090/documentation.html#upgrade).
> >>
> >> 2. It might be useful to include the rack info of the broker in TopicMetadataResponse. This can be useful for administrative tasks, as well as read affinity in the future.
> >>
> >> Jun
> >>
> >> On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <allenxw...@gmail.com> wrote:
> >>
> >> > If there are no more comments I would like to call for a vote.
> >> >
> >> > On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <allenxw...@gmail.com> wrote:
> >> >
> >> > > The KIP is updated with more details, including how to handle the situation where rack information is incomplete.
> >> > >
> >> > > In the situation where rack information is incomplete but we want to continue with the assignment, I have suggested ignoring all rack information and falling back to the original algorithm. The reason is explained below.
> >> > >
> >> > > The other options are to assume that each broker without a rack belongs to its own unique rack, or that they all belong to one "default" rack. Whichever we choose, it is highly likely to result in an uneven number of brokers per rack, and it is quite possible that the "made up" racks will have far fewer brokers. As I explained in the KIP, an uneven number of brokers per rack leads to uneven distribution of replicas among brokers (even though the leader distribution is still even). The brokers in a rack that has fewer brokers will get more replicas per broker than brokers in other racks.
> >> > >
> >> > > Given this, and given that the replica assignment produced will be incorrect anyway from a rack aware point of view, ignoring all rack information and falling back to the original algorithm is not a bad choice, since it will at least give a better guarantee of replica distribution.
> >> > >
> >> > > Also, for command line tools it gives the user a choice if for any reason they want to ignore rack information and fall back to the original algorithm.
> >> > >
> >> > > On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <allenxw...@gmail.com> wrote:
> >> > >
> >> > >> I have been busy with some time-pressing issues for the last few days.
> >> > >> I will think about how incomplete rack information affects the balance and update the KIP by early next week.
> >> > >>
> >> > >> Thanks,
> >> > >> Allen
> >> > >>
> >> > >> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <n...@confluent.io> wrote:
> >> > >>
> >> > >>> A few suggestions on improving the KIP:
> >> > >>>
> >> > >>> *If some brokers have rack, and some do not, the algorithm will throw an exception. This is to prevent incorrect assignment caused by user error.*
> >> > >>>
> >> > >>> In the KIP, can you clearly state the user-facing behavior when some brokers have rack information and some don't? Which actions and requests will error out, and how?
> >> > >>>
> >> > >>> *Even distribution of partition leadership among brokers*
> >> > >>>
> >> > >>> There is some information about arranging the sorted broker list interlaced with rack ids. Can you describe the changes to the current algorithm in a little more detail? How does this interlacing work if only a subset of brokers have the rack id configured? Does this still work if an uneven number of brokers is assigned to each rack? It might work; I'm looking for more details on the changes, since they will affect the behavior seen by the user: imbalance on either the leaders or data or both.
> >> > >>>
> >> > >>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <aaurad...@linkedin.com> wrote:
> >> > >>>
> >> > >>> > I think this sounds reasonable. Anyone else have comments?
> >> > >>> >
> >> > >>> > Aditya
> >> > >>> >
> >> > >>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <allenxw...@gmail.com> wrote:
> >> > >>> >
> >> > >>> > > During the discussion in the hangout, it was mentioned that it would be desirable for consumers to know the rack information of the brokers so that they can consume from a broker in the same rack to reduce latency. As I understand it, this will only be beneficial if a consumer can consume from any broker in the ISR, which is not possible now.
> >> > >>> > >
> >> > >>> > > I suggest we skip the change to TMR. Once the consumer is changed to be able to consume from any broker in the ISR, the rack information can be added to TMR.
> >> > >>> > >
> >> > >>> > > Another thing I want to confirm is the command line behavior. I think the desirable default behavior is to fail fast on the command line for an incomplete rack mapping. The error message can include a further instruction that tells the user to add an extra argument (like "--allow-partial-rackinfo") to suppress the error and do an imperfect rack aware assignment. If the default behavior is to allow an incomplete mapping, the error can still be easily missed.
> >> > >>> > >
> >> > >>> > > The affected command line tools are TopicCommand and ReassignPartitionsCommand.
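
As an illustration of the command line behavior proposed above, here is a minimal sketch in Python (not Kafka code). The function name, the exception and the mode strings are made up for this example; only the "--allow-partial-rackinfo" idea comes from the thread. What happens once the error is suppressed was still under discussion, so the sketch assumes the fallback suggested elsewhere in this thread: ignore rack information entirely.

```python
from typing import Dict, Optional


class IncompleteRackMappingError(Exception):
    """Hypothetical error: only some brokers have a rack configured."""


def choose_assignment_mode(
    broker_racks: Dict[int, Optional[str]],
    allow_partial_rackinfo: bool = False,
) -> str:
    """Return "rack-aware" or "rack-unaware" for the given brokers.

    broker_racks maps broker id -> rack name, or None when the broker
    has no rack configured.
    """
    with_rack = [b for b, rack in broker_racks.items() if rack is not None]
    missing = sorted(b for b, rack in broker_racks.items() if rack is None)

    if not with_rack:
        # No broker has a rack: nothing to be aware of.
        return "rack-unaware"
    if not missing:
        # Complete mapping: rack aware assignment is safe.
        return "rack-aware"
    if allow_partial_rackinfo:
        # Assumption: the override ignores all rack information.
        return "rack-unaware"
    # Default: fail fast and tell the user how to override.
    raise IncompleteRackMappingError(
        f"brokers {missing} have no rack; "
        "pass --allow-partial-rackinfo to ignore rack information"
    )
```

With a mixed mapping such as {1: "r1", 2: None} the default call raises, while the same call with allow_partial_rackinfo=True returns "rack-unaware".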
> >> > >>> > >
> >> > >>> > > Thanks,
> >> > >>> > > Allen
> >> > >>> > >
> >> > >>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <aaurad...@linkedin.com> wrote:
> >> > >>> > >
> >> > >>> > > > Hi Allen,
> >> > >>> > > >
> >> > >>> > > > For TopicMetadataResponse to understand version, you can bump up the request version itself. Based on the version of the request, the response can be appropriately serialized. It shouldn't be a huge change. For example, we went through something similar for ProduceRequest recently (https://reviews.apache.org/r/33378/).
> >> > >>> > > > I guess the reason protocol information is not included in the TMR is that the topic itself is independent of any particular protocol (SSL vs. Plaintext). Having said that, I'm not sure we even need rack information in TMR. What use case were you thinking of initially?
> >> > >>> > > >
> >> > >>> > > > For 1 - I'd be fine with adding an option to the command line tools that checks rack assignment, e.g. "--strict-assignment" or something similar.
> >> > >>> > > >
> >> > >>> > > > Aditya
> >> > >>> > > >
> >> > >>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <allenxw...@gmail.com> wrote:
> >> > >>> > > >
> >> > >>> > > > > For 2 and 3, I have updated the KIP. Please take a look. One thing I have changed is removing the proposal to add rack to TopicMetadataResponse.
> >> > >>> > > > > The reason is that, unlike UpdateMetadataRequest, TopicMetadataResponse does not understand version. I don't see a way to include rack without breaking old versions of clients. That's probably why the security protocol is not included in the TopicMetadataResponse either. I think it will be a much bigger change to include rack in TopicMetadataResponse.
> >> > >>> > > > >
> >> > >>> > > > > For 1, my concern is that doing rack aware assignment without a complete broker to rack mapping will result in an assignment that is not rack aware and fails to provide fault tolerance in the event of a rack outage. This kind of problem will be difficult to surface. And the cost of this problem is high: you have to do partition reassignment if you are lucky enough to spot the problem early on, or face the consequence of data loss during a real rack outage.
> >> > >>> > > > >
> >> > >>> > > > > I do see the concern with fail-fast, as it might also cause data loss if the producer is not able to produce messages due to topic creation failure. Is it feasible to treat dynamic topic creation and command line tools differently? We would allow dynamic topic creation with an incomplete broker-rack mapping and fail fast in the command line. Another option is to let the user determine the behavior for the command line.
> >> > >>> > > > > For example, by default fail fast in the command line but allow an incomplete broker-rack mapping if another switch is provided.
> >> > >>> > > > >
> >> > >>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:
> >> > >>> > > > >
> >> > >>> > > > > > Hey Allen,
> >> > >>> > > > > >
> >> > >>> > > > > > 1. If we choose fail fast topic creation, we will have topic creation failures while upgrading the cluster. I really doubt we want this behavior. Ideally, this should be invisible to clients of a cluster. Currently, each broker is effectively its own rack. So we can probably use the rack information whenever possible but not make it a hard requirement. To extend Gwen's example, one badly configured broker should not degrade topic creation for the entire cluster.
> >> > >>> > > > > >
> >> > >>> > > > > > 2. Upgrade scenario - Can you add a section on the upgrade piece to confirm that old clients will not see errors? I believe ZookeeperConsumerConnector reads the Broker objects from ZK. I wanted to confirm that this will not cause any problems.
> >> > >>> > > > > >
> >> > >>> > > > > > 3. Could you elaborate on your proposed changes to the UpdateMetadataRequest in the "Public Interfaces" section?
> >> > >>> > > > > > Personally, I find this format easy to read in terms of wire protocol changes:
> >> > >>> > > > > >
> >> > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> >> > >>> > > > > >
> >> > >>> > > > > > Aditya
> >> > >>> > > > > >
> >> > >>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <allenxw...@gmail.com> wrote:
> >> > >>> > > > > >
> >> > >>> > > > > > > The KIP is updated to include rack as an optional property for broker. Please take a look and let me know if more details are needed.
> >> > >>> > > > > > >
> >> > >>> > > > > > > For the case where some brokers have rack and some do not, the current KIP uses the fail-fast behavior. If there are concerns, we can further discuss this in the email thread or the next hangout.
> >> > >>> > > > > > >
> >> > >>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <allenxw...@gmail.com> wrote:
> >> > >>> > > > > > >
> >> > >>> > > > > > > > That's a good question. I can think of three actions if the rack information is incomplete:
> >> > >>> > > > > > > >
> >> > >>> > > > > > > > 1. Treat the node without rack as if it is on its own unique rack
> >> > >>> > > > > > > > 2. Disregard all rack information and fall back to the current algorithm
> >> > >>> > > > > > > > 3. Fail-fast
> >> > >>> > > > > > > >
> >> > >>> > > > > > > > Now that I think about it, one and three make more sense. The reason for fail-fast is that a user's mistake of not providing the rack may never be found if we tolerate it; the assignment may then not be rack aware as the user expected, and this creates debugging problems when things fail.
> >> > >>> > > > > > > >
> >> > >>> > > > > > > > What do you think? If not fail-fast, is there any way we can make the user error stand out?
> >> > >>> > > > > > > >
> >> > >>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <g...@confluent.io> wrote:
> >> > >>> > > > > > > >
> >> > >>> > > > > > > >> Thanks! Just to clarify, when some brokers have a rack assignment and some don't, do we act like none of them have it? Or like those without an assignment are in their own rack?
> >> > >>> > > > > > > >>
> >> > >>> > > > > > > >> The first scenario is good when first setting up rack-awareness, but the second makes more sense for on-going maintenance (I can totally see someone adding a node and forgetting to set the rack property; we don't want this to change behavior for anything except the new node).
> >> > >>> > > > > > > >>
> >> > >>> > > > > > > >> What do you think?
> >> > >>> > > > > > > >>
> >> > >>> > > > > > > >> Gwen
> >> > >>> > > > > > > >>
> >> > >>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <allenxw...@gmail.com> wrote:
> >> > >>> > > > > > > >>
> >> > >>> > > > > > > >> > For scenario 1:
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> > - Add the rack information to the broker property file or dynamically set it in the wrapper code to bootstrap the Kafka server. You would do that for all brokers and restart the brokers one by one.
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> > In this scenario, the complete broker to rack mapping may not be available until every broker is restarted. During that time we fall back to the default replica assignment algorithm.
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> > For scenario 2:
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> > - Add the rack information to the broker property file or dynamically set it in the wrapper code and start the broker.
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <g...@confluent.io> wrote:
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> > > Can you clarify the workflow for the following scenarios:
> >> > >>> > > > > > > >> > >
> >> > >>> > > > > > > >> > > 1. I currently have 6 brokers and want to add rack information for each
> >> > >>> > > > > > > >> > > 2. I'm adding a new broker and I want to specify which rack it belongs on while adding it.
> >> > >>> > > > > > > >> > >
> >> > >>> > > > > > > >> > > Thanks!
> >> > >>> > > > > > > >> > >
> >> > >>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <allenxw...@gmail.com> wrote:
> >> > >>> > > > > > > >> > >
> >> > >>> > > > > > > >> > > > We discussed the KIP in the hangout today. The recommendation is to make rack a broker property in ZooKeeper. Users with existing rack information stored somewhere would need to retrieve the information at broker start up and dynamically set the rack property, which can be implemented as a wrapper to bootstrap the broker. There will be no interface or pluggable implementation to retrieve the rack information.
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > > > The assumption is that you always need to restart the broker to make a change to the rack.
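
A wrapper of the kind described above could be as small as the following Python sketch. It is an illustration only: the function name is made up, the property key defaults to "rack" purely as an assumption (the name had not been settled in the thread), and where the rack value comes from is left to the caller.

```python
from typing import Optional


def with_rack_property(
    properties_text: str,
    rack: Optional[str],
    key: str = "rack",  # assumed key name, not confirmed by the thread
) -> str:
    """Return broker properties text with `key` set to `rack`.

    Any existing line for `key` is dropped first. When rack is None the
    property is left out, so the broker starts without rack information.
    """
    kept = [
        line
        for line in properties_text.splitlines()
        if line.split("=", 1)[0].strip() != key
    ]
    if rack is not None:
        kept.append(f"{key}={rack}")
    return "\n".join(kept) + "\n"
```

A wrapper script would call this with the rack looked up from wherever it is stored today, write the result to the broker properties file, and then start the broker as usual. Changing the rack later means rerunning the wrapper and restarting that broker.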
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > > > Once the rack becomes a broker property, it will be possible to make rack part of the metadata to help the consumer choose which in-sync replica to consume from as part of the future consumer enhancement.
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > > > I will update the KIP.
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > > > Thanks,
> >> > >>> > > > > > > >> > > > Allen
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <allenxw...@gmail.com> wrote:
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > > > > I attended Tuesday's KIP hangout but this KIP was not discussed due to time constraints.
> >> > >>> > > > > > > >> > > > >
> >> > >>> > > > > > > >> > > > > However, after hearing the discussion of KIP-35, I have the feeling that incompatibility (caused by a new broker property) between brokers with different versions will be solved there. In addition, having rack in the broker property as metadata may also help consumers in the future. So I am open to adding a rack property to broker.
> >> > >>> > > > > > > >> > > > >
> >> > >>> > > > > > > >> > > > > Hopefully we can discuss this in the next KIP hangout.
> >> > >>> > > > > > > >> > > > >
> >> > >>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <allenxw...@gmail.com> wrote:
> >> > >>> > > > > > > >> > > > >
> >> > >>> > > > > > > >> > > > >> Can you send me the information on the next KIP hangout?
> >> > >>> > > > > > > >> > > > >>
> >> > >>> > > > > > > >> > > > >> Currently the broker-rack mapping is not cached. In KafkaApis, RackLocator.getRackInfo() is called each time the mapping is needed for auto topic creation. This ensures the latest mapping is used at any time.
> >> > >>> > > > > > > >> > > > >>
> >> > >>> > > > > > > >> > > > >> The ability to get the complete mapping makes it simple to reuse the same interface in command line tools.
> >> > >>> > > > > > > >> > > > >>
> >> > >>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:
> >> > >>> > > > > > > >> > > > >>
> >> > >>> > > > > > > >> > > > >>> Perhaps we discuss this during the next KIP hangout?
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> I do see that a pluggable rack locator can be useful, but I do see a few concerns:
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> - The RackLocator (as described in the document) implies that it can discover rack information for any node in the cluster. How does it deal with rack location changes? For example, if I moved broker id (1) from rack X to Y, I only have to start that broker with a newer rack config. If RackLocator discovers broker -> rack information at start up time, any change to a broker will require bouncing the entire cluster, since createTopic requests can be sent to any node in the cluster.
> >> > >>> > > > > > > >> > > > >>> For this reason it may be simpler to have each node be aware of its own rack and persist it in ZK during start up time.
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an external service being available to serve rack information.
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a couple of other systems deal with zone/rack awareness.
> >> > >>> > > > > > > >> > > > >>> For Cassandra some interesting modes are:
> >> > >>> > > > > > > >> > > > >>> (Property File configuration)
> >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> >> > >>> > > > > > > >> > > > >>> (Dynamic inference)
> >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> Voldemort does a static node -> zone assignment based on configuration.
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> Aditya
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <allenxw...@gmail.com> wrote:
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> > I would like to see if we can do both:
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to facilitate migration with an existing broker-rack mapping
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> > - Make rack an optional property for broker. If rack is available from the broker, treat it as the source of truth. Users with an existing broker-rack mapping somewhere else can use the pluggable way, or they can transfer the mapping to the broker rack property.
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> > One thing I am not sure about is what happens during a rolling upgrade when we have rack as a broker property.
> >> > >>> > > > > > > >> > > > >>> > For brokers with an older version of Kafka, will it cause problems for them? If so, is there any workaround? I also think it would be better not to have rack in the controller wire protocol, but I am not sure if that is achievable.
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> > Thanks,
> >> > >>> > > > > > > >> > > > >>> > Allen
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tpal...@gmail.com> wrote:
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable locator. For example, we already have an interface for discovering information about the physical location of servers. I don't relish the idea of having to maintain data in multiple places.
> > >
> > > -Todd
> > >
> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar
> > > <aaurad...@linkedin.com.invalid> wrote:
> > >
> > > > Thanks for starting this KIP Allen.
> > > >
> > > > I agree with Gwen that having a RackLocator class that is pluggable
> > > > seems to be too complex. The KIP refers to potentially non-ZK
> > > > storage for the rack info which I don't think is necessary.
> > > >
> > > > Perhaps we can persist this info in zk under
> > > > /brokers/ids/<broker_id> similar to other broker properties and add
> > > > a config in KafkaConfig called "rack".
> > > >
> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, "rack": "abc"}
> > > >
> > > > Aditya
> > > >
> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <g...@confluent.io> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > First, thanks for putting out a KIP for this. This is super
> > > > > important for production deployments of Kafka.
> > > > >
> > > > > Few questions:
> > > > >
> > > > > 1) Are we sure we want "as many racks as possible"? I'd want to
> > > > > balance between safety (more racks) and network utilization
> > > > > (traffic within a rack uses the high-bandwidth TOR switch).
> > > > > One replica on a different rack and the rest on same rack (if
> > > > > possible) sounds better to me.
> > > > >
> > > > > 2) Rack-locator class seems overly complex compared to adding a
> > > > > rack.number property to the broker properties file. Why do we
> > > > > want that?
> > > > >
> > > > > Gwen
> > > > >
> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxw...@gmail.com> wrote:
> > > > >
> > > > > > Hello Kafka Developers,
> > > > > >
> > > > > > I just created KIP-36 for rack aware replica assignment.
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > > > > >
> > > > > > The goal is to utilize the isolation provided by the racks in
> > > > > > data center and distribute replicas to racks to provide fault
> > > > > > tolerance.
> > > > > >
> > > > > > Comments are welcome.
> > > > > >
> > > > > > Thanks,
> > > > > > Allen

> --
> Thanks,
> Neha
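
The broker registration Aditya proposes in the thread, with "rack" as an optional key, can be sketched as follows. This is a minimal Python sketch, not Kafka's implementation: the function name `rack_of` and the host, port, and endpoint values are hypothetical, and treating a missing key as "no rack" is only one assumed way to read registrations from brokers that do not set it, which is the rolling-upgrade case Allen asks about.

```python
import json
from typing import Optional


def rack_of(registration: str) -> Optional[str]:
    """Return the broker's rack, or None if the JSON has no "rack" key."""
    info = json.loads(registration)
    # dict.get returns None for a missing key, so a registration written
    # without "rack" is read as "rack unknown" rather than raising.
    return info.get("rack")


# Hypothetical registrations shaped like the JSON in the thread.
# Host, port, and endpoint values are placeholders, not real brokers.
old_broker = '{"jmx_port": -1, "endpoints": [], "host": "broker-a", "port": 9092}'
new_broker = (
    '{"jmx_port": -1, "endpoints": [], "host": "broker-b", '
    '"port": 9092, "rack": "abc"}'
)

# Map each hypothetical broker name to its rack (None when not set).
racks = {"broker-a": rack_of(old_broker), "broker-b": rack_of(new_broker)}
```

Under this assumption, a caller can tell "rack not set" (None) apart from any real rack name and decide separately how to place replicas on such brokers.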