Hi, Lucas, 200. That's a valid concern. So, we can probably just keep the current name.
201. I am thinking that you would upgrade in the same way as changing inter.broker.listener.name. This requires 2 rounds of rolling restart. In the first round, we add the controller endpoint to the listeners w/o setting controller.listener.name. In the second round, every broker sets controller.listener.name. At that point, the controller listener is ready in every broker. Thanks, Jun On Tue, Oct 2, 2018 at 10:38 AM, Lucas Wang <lucasatu...@gmail.com> wrote: > Thanks for the further comments, Jun. > > 200. Currently in the code base, we have the term of "ControlBatch" related > to > idempotent/transactional producing. Do you think it's a concern for reusing > the term "control"? > > 201. It's not clear to me how it would work by following the same strategy > for "controller.listener.name". > Say the new controller has its "controller.listener.name" set to the value > "CONTROLLER", and broker 1 > has picked up this KIP by announcing > "endpoints": [ > "CONTROLLER://broker1.example.com:9091", > "INTERNAL://broker1.example.com:9092", > "EXTERNAL://host1.example.com:9093" > ], > > while broker2 has not picked up the change, and is announcing > "endpoints": [ > "INTERNAL://broker2.example.com:9092", > "EXTERNAL://host2.example.com:9093" > ], > to support both broker 1 for the new behavior and broker 2 for the old > behavior, it seems the controller must > check their published endpoints. Am I missing something? > > Thanks! > Lucas > > On Mon, Oct 1, 2018 at 6:29 PM Jun Rao <j...@confluent.io> wrote: > > > Hi, Lucas, > > > > Sorry for the delay. The updated wiki looks good to me overall. Just a > > couple more minor comments. > > > > 200. kafka.network:name=ControllerRequestQueueSize,type=RequestChannel: > The > > name ControllerRequestQueueSize gives the impression that it's only for > the > > controller broker. Perhaps we can just rename all metrics and configs > from > > controller to control. This indicates that the threads and the queues are > > for the control requests (as oppose to data requests). > > > > 201. "<new controller, old broker>: In this scenario, the controller will > > have the "controller.listener.name" config set to a value like > > "CONTROLLER", however the broker's exposed endpoints do not have an entry > > corresponding to the new listener name. Hence the controller should > > preserve the existing behavior by determining the endpoint using > > *inter-broker-listener-name *value. The end result should be the same > > behavior as today." Currently, the controller makes connections based on > > its local inter.broker.listener.name config without checking the target > > broker's ZK registration. For consistency, perhaps we can just follow the > > same strategy for controller.listener.name. This existing behavior seems > > simpler to understand and has the benefit of catching inconsistent > configs > > across brokers. > > > > Thanks, > > > > Jun > > > > On Mon, Oct 1, 2018 at 8:43 AM, Lucas Wang <lucasatu...@gmail.com> > wrote: > > > > > Hi Jun, > > > > > > Sorry to bother you again. Can you please take a look at the wiki again > > > when you have time? > > > > > > Thanks a lot! > > > Lucas > > > > > > On Wed, Sep 19, 2018 at 3:57 PM Lucas Wang <lucasatu...@gmail.com> > > wrote: > > > > > > > Hi Jun, > > > > > > > > Thanks a lot for the detailed explanation. > > > > I've restored the wiki to a previous version that does not require > > config > > > > changes, > > > > and keeps the current behavior with the proposed changes turned off > by > > > > default. > > > > I'd appreciate it if you can review it again. > > > > > > > > Thanks! > > > > Lucas > > > > > > > > On Tue, Sep 18, 2018 at 1:48 PM Jun Rao <j...@confluent.io> wrote: > > > > > > > >> Hi, Lucas, > > > >> > > > >> When upgrading to a minor release, I think the expectation is that a > > > user > > > >> wouldn't need to make any config changes, other than the usual > > > >> inter.broker.protocol. If we require other config changes during an > > > >> upgrade, then it's probably better to do that in a major release. > > > >> > > > >> Regarding your proposal, I think removing host/advertised_host in > > favor > > > of > > > >> listeners:advertised_listeners seems useful regardless of this KIP. > > > >> However, that can probably wait until a major release. > > > >> > > > >> As for the controller listener, I am not sure if one has to set it. > To > > > >> make > > > >> a cluster healthy, one sort of have to make sure that the request > > queue > > > is > > > >> never full and no request will be sitting in the request queue for > > long. > > > >> If > > > >> one does that, setting the controller listener may not be necessary. > > On > > > >> the > > > >> flip side, even if one sets the controller listener, but the request > > > queue > > > >> and the request time for the data part are still high, the cluster > may > > > >> still not be healthy. Given that we have already started the 2.1 > > release > > > >> planning, perhaps we can start with not requiring the controller > > > listener. > > > >> If this is indeed something that everyone wants to set, we can make > > it a > > > >> required config in a major release. > > > >> > > > >> Thanks, > > > >> > > > >> Jun > > > >> > > > >> On Tue, Sep 11, 2018 at 3:46 PM, Lucas Wang <lucasatu...@gmail.com> > > > >> wrote: > > > >> > > > >> > @Jun Rao <j...@confluent.io> > > > >> > > > > >> > I made the recent config changes after thinking about the default > > > >> behavior > > > >> > for adopting this KIP. > > > >> > I think there are basically two options: > > > >> > 1. By default, the behavior proposed in this KIP is turned off, > and > > > >> > operators can turn it > > > >> > on by adding the "controller.listener.name" config and entries in > > the > > > >> > "listeners" and "advertised.listeners" list. > > > >> > If no "controller.listener.name" is added, it'll be the *same as* > > > the " > > > >> > inter.broker.listener.name", > > > >> > and the proposed feature is effectively turned off. > > > >> > This has been the assumption in the KIP writeup before. > > > >> > > > > >> > 2. By default, the behavior proposed in this KIP is turned on, and > > > >> > operators are forced to > > > >> > recognize the proposed change if their "listeners" config is set > > (this > > > >> is > > > >> > most likely in production environments), > > > >> > by allocating a new port for controller connections, and adding a > > new > > > >> > endpoint to the "listeners" config. > > > >> > For cases where "listeners" is not set explicitly, > > > >> > there needs to be a default value for it that includes the > > controller > > > >> > listener name, > > > >> > e.g. "PLAINTEXT://:9092,CONTROLLER://:9091" > > > >> > > > > >> > I chose to go with option 2 since as author of this KIP, > > > >> > I naturally think in the long run, it's worth the effort to adopt > > this > > > >> > feature, > > > >> > in order to prevent issues under circumstances listed in the > > > motivation > > > >> > section. > > > >> > > > > >> > 100, following the argument above, I want to enforce the > separation > > > >> > between controller > > > >> > and data plane requests. Hence the "controller.listener.name" > > should > > > >> > never be the same > > > >> > as the "inter.broker.listener.name", which is intended for data > > plane > > > >> > requests. > > > >> > > > > >> > 101, the default value for "listeners" will be > > > >> > "PLAINTEXT://:9092,CONTROLLER://:9091", > > > >> > making values of "host", and "port" not being used under any > config > > > >> > settings. > > > >> > And the default value for "advertised.listeners" will be derived > > from > > > >> > "listeners", > > > >> > making the values of "advertised.host", and "advertised.port" not > > > being > > > >> > used any more. > > > >> > > > > >> > 102, for upgrading, a single broker that has "listeners" and/or > > > >> > "advertised.listeners" set, > > > >> > must add a new endpoint for the CONTROLLER listener name, or end > up > > > >> using > > > >> > the default listeners "PLAINTEXT://:9092,CONTROLLER://:9091". > > > >> > During rolling upgrade, in cases of <old controller> + <new > broker> > > or > > > >> > <new controller> + <old broker> > > > >> > we still need to fall back to the current behavior. However after > > the > > > >> > rolling upgrade is done, > > > >> > it is guaranteed that the controller plane and data plane are > > > separated, > > > >> > given > > > >> > the "controller.listener.name" must be different from " > > > >> > inter.broker.listener.name". > > > >> > > > > >> > @Ismael Juma <ism...@juma.me.uk> > > > >> > Thanks for pointing that out. I did not know that. > > > >> > However my question is if the argument above makes sense, and my > > code > > > >> > change > > > >> > causes the configs "host", "port", "advertised.host", > > > "advertised.port" > > > >> to > > > >> > be not used under any circumstance, > > > >> > then it's no different from removing them. > > > >> > Anyway if there is still a concern about removing them, is there a > > new > > > >> > major new version > > > >> > now or in the future where I can remove them? > > > >> > > > > >> > Thanks! > > > >> > Lucas > > > >> > > > > >> > On Mon, Sep 10, 2018 at 1:30 PM Ismael Juma <ism...@juma.me.uk> > > > wrote: > > > >> > > > > >> >> To be clear, we can only remove configs in major new versions. > > > >> Otherwise, > > > >> >> we can only deprecate them. > > > >> >> > > > >> >> Ismael > > > >> >> > > > >> >> On Mon, Sep 10, 2018 at 10:47 AM Jun Rao <j...@confluent.io> > wrote: > > > >> >> > > > >> >> > Hi, Lucas, > > > >> >> > > > > >> >> > For the network idlePct, your understanding is correct. > > Currently, > > > >> >> > networkIdlePct metric is calculated as the average of (1 - > > > io-ratio) > > > >> in > > > >> >> the > > > >> >> > selector of all network threads. > > > >> >> > > > > >> >> > The metrics part looks good to me in the updated KIP. > > > >> >> > > > > >> >> > I am not still not quite sure about the configs. > > > >> >> > > > > >> >> > 100. "Whenever the "controller.listener.name" is set, upon > > broker > > > >> >> startup, > > > >> >> > we will validate its value and make sure it's different from > the > > > >> >> > *inter-broker-listener-name *value." Does that mean that > > > >> >> > controller.listener.name has to be different from > > > >> >> > inter.broker.listener.name? > > > >> >> > That seems limiting. > > > >> >> > > > > >> >> > 101. The KIP says that advertised.listeners and listeners will > > now > > > >> have > > > >> >> a > > > >> >> > different default value including controller. Could you > document > > > what > > > >> >> the > > > >> >> > default value looks like? > > > >> >> > > > > >> >> > 102. About removing the the following configs. How does that > > affect > > > >> the > > > >> >> > upgrade path? Do we now expect a user to add a new config when > > > >> upgrading > > > >> >> > from an old version to a new one? > > > >> >> > host > > > >> >> > port > > > >> >> > advertised.host > > > >> >> > advertised.port > > > >> >> > > > > >> >> > Thanks, > > > >> >> > > > > >> >> > Jun > > > >> >> > > > > >> >> > > > > >> >> > On Thu, Sep 6, 2018 at 5:14 PM, Lucas Wang < > > lucasatu...@gmail.com> > > > >> >> wrote: > > > >> >> > > > > >> >> > > @Jun Rao <j...@confluent.io> > > > >> >> > > > > > >> >> > > One clarification, currently on the selector level, we have > the > > > >> >> > > io-wait-ratio metric. > > > >> >> > > For the new controller *network* thread, we can use it > directly > > > for > > > >> >> > > IdlePct, instead of using 1- io-ratio, > > > >> >> > > so that the logic is similar to the current average IdlePct > for > > > >> >> network > > > >> >> > > threads. Is that correct? > > > >> >> > > > > > >> >> > > I've revised the KIP by adding two new metrics for measuring > > the > > > >> >> IdlePct > > > >> >> > > for the two additional threads. > > > >> >> > > Please take a look again. Thanks! > > > >> >> > > > > > >> >> > > Lucas > > > >> >> > > > > > >> >> > > > > > >> >> > > > > > >> >> > > > > > >> >> > > > > > >> >> > > On Wed, Sep 5, 2018 at 5:01 PM Jun Rao <j...@confluent.io> > > wrote: > > > >> >> > > > > > >> >> > > > Hi, Lucas, > > > >> >> > > > > > > >> >> > > > Thanks for the updated KIP. > > > >> >> > > > > > > >> >> > > > For monitoring the network thread utilization for the > control > > > >> >> plane, we > > > >> >> > > > already have the metric io-ratio at the selector level > > (idlePct > > > >> is > > > >> >> 1 - > > > >> >> > > > io-ratio). So, we just need to give that selector a > > meaningful > > > >> name. > > > >> >> > > > > > > >> >> > > > For monitoring the io thread utilization for the control > > plane, > > > >> it's > > > >> >> > > > probably useful to have a separate metric for that. The > > > >> controller > > > >> >> > > request > > > >> >> > > > queue size may not reflect the history in a window. > > > >> >> > > > > > > >> >> > > > Jun > > > >> >> > > > > > > >> >> > > > On Wed, Sep 5, 2018 at 3:38 PM, Lucas Wang < > > > >> lucasatu...@gmail.com> > > > >> >> > > wrote: > > > >> >> > > > > > > >> >> > > > > Thanks Jun for your quick response. It looks like I > forgot > > to > > > >> >> click > > > >> >> > the > > > >> >> > > > > "Update" button, :) > > > >> >> > > > > It's updated now. > > > >> >> > > > > > > > >> >> > > > > Regarding the idle ratio metrics for the additional > > threads, > > > I > > > >> >> > > discussed > > > >> >> > > > > with Joel, > > > >> >> > > > > and think they are not as useful, and I added our > reasoning > > > in > > > >> the > > > >> >> > last > > > >> >> > > > > paragraph of the > > > >> >> > > > > "How are controller requests handled over the dedicated > > > >> >> connections?" > > > >> >> > > > > section. > > > >> >> > > > > On the other hand, we don't strongly oppose adding them > if > > > you > > > >> >> think > > > >> >> > > they > > > >> >> > > > > are necessary. > > > >> >> > > > > > > > >> >> > > > > Thanks, > > > >> >> > > > > Lucas > > > >> >> > > > > > > > >> >> > > > > > > > >> >> > > > > On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <j...@confluent.io > > > > > >> wrote: > > > >> >> > > > > > > > >> >> > > > > > Hi, Lucas, > > > >> >> > > > > > > > > >> >> > > > > > Thanks for the reply. Have you actually updated the > KIP? > > > The > > > >> >> wiki > > > >> >> > > says > > > >> >> > > > > that > > > >> >> > > > > > it's last updated on Aug. 22. and some of the changes > > that > > > >> you > > > >> >> > > > mentioned > > > >> >> > > > > > (#1 and #3) are not there. > > > >> >> > > > > > > > > >> >> > > > > > Also, regarding Joel's comment on network/request idle > > > ratio > > > >> >> > metrics, > > > >> >> > > > > could > > > >> >> > > > > > you comment on whether they include the new controller > > > >> >> listener? If > > > >> >> > > > not, > > > >> >> > > > > do > > > >> >> > > > > > we need additional metrics to measure the utilization > of > > > the > > > >> io > > > >> >> > > thread > > > >> >> > > > > for > > > >> >> > > > > > the control plane? > > > >> >> > > > > > > > > >> >> > > > > > Jun > > > >> >> > > > > > > > > >> >> > > > > > On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang < > > > >> >> lucasatu...@gmail.com > > > >> >> > > > > > >> >> > > > > wrote: > > > >> >> > > > > > > > > >> >> > > > > > > Thanks for the comments, Jun. > > > >> >> > > > > > > > > > >> >> > > > > > > 1. I think the answer should be no, since the " > > > >> >> > > > > > inter.broker.listener.name" > > > >> >> > > > > > > are also used > > > >> >> > > > > > > for replication traffic, and merging these two types > of > > > >> >> request > > > >> >> > to > > > >> >> > > > the > > > >> >> > > > > > > single threaded tunnel > > > >> >> > > > > > > would defeat the purpose of this KIP and also hurt > > > >> replication > > > >> >> > > > > > throughput. > > > >> >> > > > > > > So I think that means > > > >> >> > > > > > > we should validate to make sure when the new config > is > > > set, > > > >> >> it's > > > >> >> > > > > > different > > > >> >> > > > > > > from "inter.broker.listener.name" > > > >> >> > > > > > > or "security.inter.broker.protocol", whichever is > set. > > > >> >> > > > > > > > > > >> >> > > > > > > 2. Normally all broker configs in a given cluster are > > > >> changed > > > >> >> at > > > >> >> > > the > > > >> >> > > > > same > > > >> >> > > > > > > time. If there is a typo in the > > > >> >> > > > > > > controller.listener.name and it's not available in > the > > > >> >> endpoints > > > >> >> > > > list, > > > >> >> > > > > > we > > > >> >> > > > > > > could catch it, give an error > > > >> >> > > > > > > and block restart of the first broker in the cluster. > > > With > > > >> >> that, > > > >> >> > we > > > >> >> > > > > could > > > >> >> > > > > > > keep the current behavior > > > >> >> > > > > > > in the KIP write up that falls back to > > > >> >> > "inter.broker.listener.nam" > > > >> >> > > > when > > > >> >> > > > > > the > > > >> >> > > > > > > "controller.listener.name" > > > >> >> > > > > > > is not found during the migration phase of this KIP. > > > >> Thoughts? > > > >> >> > > > > > > > > > >> >> > > > > > > 3. That makes sense, and I've changed it. > > > >> >> > > > > > > > > > >> >> > > > > > > Thanks, > > > >> >> > > > > > > Lucas > > > >> >> > > > > > > > > > >> >> > > > > > > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao < > > > j...@confluent.io> > > > >> >> > wrote: > > > >> >> > > > > > > > > > >> >> > > > > > > > Hi, Lucas, > > > >> >> > > > > > > > > > > >> >> > > > > > > > Sorry for the delay. The new proposal looks good to > > me > > > >> >> > overall. A > > > >> >> > > > few > > > >> >> > > > > > > minor > > > >> >> > > > > > > > comments below. > > > >> >> > > > > > > > > > > >> >> > > > > > > > 1. It's possible that listener.name.for.controller > is > > > >> set, > > > >> >> but > > > >> >> > > set > > > >> >> > > > to > > > >> >> > > > > > the > > > >> >> > > > > > > > same value as inter.broker.listener.name. In that > > > case, > > > >> >> should > > > >> >> > > we > > > >> >> > > > > > have a > > > >> >> > > > > > > > single network thread and the request handling > thread > > > for > > > >> >> that > > > >> >> > > > > > listener? > > > >> >> > > > > > > > > > > >> >> > > > > > > > 2. Currently, the controller always picks the > > listener > > > >> >> > specified > > > >> >> > > by > > > >> >> > > > > > > > inter.broker.listener.name even if the listener > name > > > is > > > >> not > > > >> >> > > > present > > > >> >> > > > > in > > > >> >> > > > > > > the > > > >> >> > > > > > > > receiving broker. This KIP proposes a slightly > > > different > > > >> >> > approach > > > >> >> > > > for > > > >> >> > > > > > > > picking listener.name.for.controller only when the > > > >> receiving > > > >> >> > end > > > >> >> > > > has > > > >> >> > > > > > the > > > >> >> > > > > > > > listener and switches listener.name.for.controller > > > >> >> otherwise. > > > >> >> > > There > > > >> >> > > > > are > > > >> >> > > > > > > > some tradeoffs between the two approaches. To > change > > > the > > > >> >> inter > > > >> >> > > > broker > > > >> >> > > > > > > > listener, the former requires 2 steps: (1) adding > the > > > new > > > >> >> > > listener > > > >> >> > > > to > > > >> >> > > > > > > > listener list in every broker and (2) changing > > > >> >> > > > > > > > listener.name.for.controller. > > > >> >> > > > > > > > The latter can do both changes in 1 step. On the > > hand, > > > if > > > >> >> > > > > > > > listener.name.for.controller > > > >> >> > > > > > > > is mis-configured, the former will report an error > > and > > > >> the > > > >> >> > latter > > > >> >> > > > > will > > > >> >> > > > > > > hide > > > >> >> > > > > > > > it (so the user may not know the misconfiguration). > > It > > > >> seems > > > >> >> > that > > > >> >> > > > we > > > >> >> > > > > > > should > > > >> >> > > > > > > > pick one approach to handle both > > > >> >> listener.name.for.controller > > > >> >> > and > > > >> >> > > > > > > > inter.broker.listener.name consistently. To me, > the > > > >> former > > > >> >> > seems > > > >> >> > > > > > > slightly > > > >> >> > > > > > > > better. > > > >> >> > > > > > > > > > > >> >> > > > > > > > 3. To be consistent with the existing naming, > should > > > >> >> > > > > > > > listener.name.for.controller > > > >> >> > > > > > > > be controller.listener.name? > > > >> >> > > > > > > > > > > >> >> > > > > > > > Thanks, > > > >> >> > > > > > > > > > > >> >> > > > > > > > Jun > > > >> >> > > > > > > > > > > >> >> > > > > > > > > > > >> >> > > > > > > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang < > > > >> >> > > lucasatu...@gmail.com> > > > >> >> > > > > > > wrote: > > > >> >> > > > > > > > > > > >> >> > > > > > > > > Hi Jun and Joel, > > > >> >> > > > > > > > > > > > >> >> > > > > > > > > The KIP writeup has changed by quite a bit since > > your > > > >> +1. > > > >> >> > > > > > > > > Can you please take another review? Thanks a lot > > for > > > >> your > > > >> >> > time! > > > >> >> > > > > > > > > > > > >> >> > > > > > > > > Lucas > > > >> >> > > > > > > > > > > > >> >> > > > > > > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel Koshy < > > > >> >> > > > jjkosh...@gmail.com> > > > >> >> > > > > > > > wrote: > > > >> >> > > > > > > > > > > > >> >> > > > > > > > > > +1 on the KIP. > > > >> >> > > > > > > > > > > > > >> >> > > > > > > > > > (I'm not sure we actually necessary to > introduce > > > the > > > >> >> > > condition > > > >> >> > > > > > > > variables > > > >> >> > > > > > > > > > for the concern that Jun raised, but it's an > > > >> >> implementation > > > >> >> > > > > detail > > > >> >> > > > > > > that > > > >> >> > > > > > > > > we > > > >> >> > > > > > > > > > can defer to a discussion in the PR.) > > > >> >> > > > > > > > > > > > > >> >> > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas Wang < > > > >> >> > > > > > lucasatu...@gmail.com> > > > >> >> > > > > > > > > > wrote: > > > >> >> > > > > > > > > > > > > >> >> > > > > > > > > > > Hi Jun, > > > >> >> > > > > > > > > > > > > > >> >> > > > > > > > > > > I agree by using the conditional variables, > > there > > > >> is > > > >> >> no > > > >> >> > > need > > > >> >> > > > to > > > >> >> > > > > > add > > > >> >> > > > > > > > > such > > > >> >> > > > > > > > > > a > > > >> >> > > > > > > > > > > new config. > > > >> >> > > > > > > > > > > Also thanks for approving this KIP. > > > >> >> > > > > > > > > > > > > > >> >> > > > > > > > > > > Lucas > > > >> >> > > > > > > > > > > > > > >> >> > > > > > > > > > > > > >> >> > > > > > > > > > > > >> >> > > > > > > > > > > >> >> > > > > > > > > > >> >> > > > > > > > > >> >> > > > > > > > >> >> > > > > > > >> >> > > > > > >> >> > > > > >> >> > > > >> > > > > >> > > > > > > > > > >