Thanks Jun. Discussed this offline with Onur and Jun and I believe there's agreement so updated the KIP:
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action? pageId=69407758&selectedPageVersions=8&selectedPageVersions=7 Ismael On Wed, May 10, 2017 at 4:46 PM, Jun Rao <j...@confluent.io> wrote: > Hi, Onur, > > We probably don't want to do the 1-to-1 mapping from the event type to the > controller state since some of the event types are implementation details. > How about the following mapping? > > 0 - idle > 1 - controller change (Startup, ControllerChange, Reelect) > 2 - broker change (BrokerChange) > 3 - topic creation/change (TopicChange, PartitionModifications) > 4 - topic deletion (TopicDeletion, TopicDeletionStopReplicaResult) > 5 - partition reassigning (PartitionReassignment, > PartitionReassignmentIsrChange) > 6 - auto leader balancing (AutoPreferredReplicaLeaderElection) > 7 - manual leader balancing (PreferredReplicaLeaderElection) > 8 - controlled shutdown (ControlledShutdown) > 9 - isr change (IsrChangeNotification) > > For each state, we will add a corresponding timer to track the rate and the > latency, if it's not there already (e.g., broker change and controlled > shutdown). If there are future changes to the controller, we can make a > call whether the new event should be mapped to one of the existing states > or a new state. > > Thanks, > > Jun > > On Tue, May 9, 2017 at 6:17 PM, Onur Karaman <onurkaraman.apa...@gmail.com > > > wrote: > > > @Ismael, Jun > > After bringing up an earlier point twice now, it still doesn't feel like > > it's been commented on/addressed, so I'm going to give it one more shot: > > Assuming that ControllerState should reflect the current event being > > processed, the KIP is missing states. > > > > The controller currently has 14 event types: > > BrokerChange > > TopicChange > > PartitionModifications > > TopicDeletion > > PartitionReassignment > > PartitionReassignmentIsrChange > > IsrChangeNotification > > PreferredReplicaLeaderElection > > AutoPreferredReplicaLeaderElection > > ControlledShutdown > > TopicDeletionStopReplicaResult > > Startup > > ControllerChange > > Reelect > > > > The KIP only shows 10 event types (and it's not a proper subset of the > > above set). > > > > I think this mismatch would cause the ControllerState to incorrectly be > in > > the Idle state when in fact the controller could be doing a lot of work. > > > > 1. Should ControllerState exactly consist of the 14 controller event > types > > + the 1 Idle state? > > 2. If so, what's the process for adding/removing/merging event types > > w.r.t. this metric? > > > > On Tue, May 9, 2017 at 4:45 PM, Ismael Juma <ism...@juma.me.uk> wrote: > > > >> Becket, > >> > >> Are you OK with extending the metrics via a subsequent KIP (assuming > that > >> what we're doing here doesn't prevent that)? The KIP freeze is tomorrow > >> (although I will give it an extra day or two as many in the community > have > >> been attending the Kafka Summit this week), so we should avoid > increasing > >> the scope unless it's important for future improvements. > >> > >> Thanks, > >> Ismael > >> > >> On Wed, May 10, 2017 at 12:09 AM, Jun Rao <j...@confluent.io> wrote: > >> > >> > Hi, Becket, > >> > > >> > q10. The reason why there is not a timer metric for broker change > event > >> is > >> > that the controller currently always has a LeaderElectionRateAndTimeMs > >> > timer metric (in ControllerStats). > >> > > >> > q11. I agree that that it's useful to know the queue time in the > >> > controller event queue and suggested that earlier. Onur thinks that > >> it's a > >> > bit too early to add that since we are about to change how to queue > >> events > >> > from ZK. Similarly, we will probably also make changes to batch > requests > >> > from the controller to the broker. So, perhaps we can add more metrics > >> once > >> > those changes in the controller have been made. For now, knowing the > >> > controller state and the controller channel queue size is probably > good > >> > enough. > >> > > >> > Thanks, > >> > > >> > Jun > >> > > >> > > >> > > >> > On Mon, May 8, 2017 at 10:05 PM, Becket Qin <becket....@gmail.com> > >> wrote: > >> > > >> >> @Ismael, > >> >> > >> >> About the stage and event type. Yes, I think each event handling > should > >> >> have those stages covered. It is similar to what we are doing for the > >> >> requests on the broker side. We have benefited from such systematic > >> metric > >> >> structure a lot so I think it would be worth following the same way > in > >> the > >> >> controller. > >> >> > >> >> As an example, for BrokerChangeEvent (or any event), I am thinking we > >> >> would > >> >> have the following metrics: > >> >> 1. EventRate > >> >> 2. EventQueueTime : The event queue time > >> >> 3. EventHandlingTime: The event handling time (including zk path > >> updates) > >> >> 4. EventControllerChannelQueueTime: The queue time of the > >> corresponding > >> >> LeaderAndIsrRequest and UpdateMetadataRequest in the controller > channel > >> >> queue. > >> >> 5. EventControllerChannelSendTime: The time to send the > corresponding > >> >> requests, could be the total time for the corresponding > >> >> LeaderAndIsrRequest > >> >> and UpdateMetadataRequest. It is typically small, but in some cases > >> could > >> >> be slower than we expect. > >> >> 6. EventAckTime: The time waiting for all the corresponding > responses. > >> >> 7. EventHandlingTotalTime: sum of 2-6 > >> >> > >> >> Note that all the metrics are from the event and cluster wide state > >> >> transition point of view, but not from a single request type point of > >> >> view. > >> >> Among the above metrics, 4,5,6 could potentially be per broker. > >> >> > >> >> Thanks, > >> >> > >> >> Jiangjie (Becket) Qin > >> >> > >> >> On Mon, May 8, 2017 at 7:24 PM, Onur Karaman < > >> >> onurkaraman.apa...@gmail.com> > >> >> wrote: > >> >> > >> >> > I had a similar comment to Becket but accidentally posted it on the > >> vote > >> >> > thread last Friday. From that thread: > >> >> > "I noticed that both the ControllerState metric and the > >> *RateAndTimeMs > >> >> > metrics > >> >> > only cover a subset of the controller event types. Was this > >> >> intentional?" > >> >> > > >> >> > I think it makes most sense to just have the ControllerState metric > >> and > >> >> > *RateAndTimeMs metrics exactly match up with the event types. > >> >> > > >> >> > On Mon, May 8, 2017 at 4:35 PM, Ismael Juma <ism...@juma.me.uk> > >> wrote: > >> >> > > >> >> > > Hi Becket, > >> >> > > > >> >> > > Thanks for the feedback. Comments inline. > >> >> > > > >> >> > > On Tue, May 9, 2017 at 12:19 AM, Becket Qin < > becket....@gmail.com> > >> >> > wrote: > >> >> > >> > >> >> > >> q10. With event loop based controller design, it seems natrural > to > >> >> have > >> >> > >> the > >> >> > >> processing time for each controller event type. In that case, > the > >> >> > current > >> >> > >> metrics seem not covering all the event processing time? e.g. > >> (broker > >> >> > >> joining and broker failure event handling time). > >> >> > >> > >> >> > > > >> >> > > I'll leave it to Jun to explain why some metrics were not > included > >> in > >> >> the > >> >> > > proposal. > >> >> > > > >> >> > > q11. There are actually a couple of stages for controller state > >> >> > transition > >> >> > >> and propagation. More specifically: > >> >> > >> Stage 1: Event queue time > >> >> > >> Stage 2: Event process time (including all the zk path updates) > >> >> > >> Stage 3: State propagation time (including the state propagation > >> >> queuing > >> >> > >> time on the controller and the actual request sent to response > >> >> receive > >> >> > >> time) > >> >> > >> > >> >> > >> I think it worth to have metrics for each of the stage. Stage 3 > >> might > >> >> > be a > >> >> > >> little tricky because we may need to potentially manage the per > >> >> broker > >> >> > >> propagation time. Arguably the state propagation time is a > little > >> >> > >> overlapping with the broker side request handling time, but for > >> some > >> >> > state > >> >> > >> change that involves multiple types of requests, it could still > be > >> >> > useful > >> >> > >> to know what is the time for a state transition to be propagated > >> to > >> >> the > >> >> > >> entire cluster. > >> >> > >> > >> >> > > > >> >> > > Can you please elaborate on how you would like to see this > >> measured. > >> >> Do > >> >> > > you mean that each event would have a separate metric for each of > >> >> these > >> >> > > stages? Maybe a worked out example would make it clear. > >> >> > > > >> >> > > q11. Regarding (13), controller actually do not update the ISR > but > >> >> just > >> >> > >> read it. The error message seems from the brokers. It usually > >> >> happens in > >> >> > >> the following case: > >> >> > >> 1. Current leader broker has the cached ZNode version V > >> >> > >> 2. The controller elected a new leader and updated the ZNode, > now > >> >> ZNode > >> >> > >> version becomes V+1, > >> >> > >> 3. Controller sends the LeaderAndIsrRequest to the replica > >> brokers to > >> >> > >> propagate the new leader information as well as the new > zkVersion. > >> >> > >> 4. Before the old leader process the LeaderAndIsrRequest from > >> >> controller > >> >> > >> in > >> >> > >> step 3, The old leader tries to update ISR and found the cached > >> >> > zkVersion > >> >> > >> V > >> >> > >> is different from the actual zkVersion V+1. > >> >> > >> > >> >> > >> It looks that this is not a controller metric. > >> >> > > > >> >> > > > >> >> > > Yes, it's not a controller metric, that's why it's under the > >> >> "Partition > >> >> > > Metrics" section. In the PR, I actually implemented it in > >> >> ReplicaManager > >> >> > > alongside IsrExpandsPerSec and IsrShrinksPerSec. > >> >> > > > >> >> > > Thanks, > >> >> > > Ismael > >> >> > > > >> >> > > >> >> > >> > > >> > > >> > > > > >