Thanks for the KIP, Dong. The general idea is good. In fact, two of the
three metrics had been listed under future work for KIP-143:

"KAFKA-5028 introduced a queue for Controller events. It would be useful to
have a gauge for the queue size and a histogram for how long an event waits
in the queue before being processed. However, we are in the process of
making additional changes to improve the handling of soft failures and
there's a possibility that the controller queue could be replaced by a
broker queue for all ZK communication. We will see how that develops before
deciding which metrics should be exposed. In the meantime, the
ControllerState and other metrics should provide enough information to
issue an alert if the Controller is not healthy."

It seems like the conclusion is that we won't have a global broker queue,
but it would be good for Jun and Onur to confirm this.

One minor comment:

1. For metrics 1 and 2 in the KIP, do we want the type to be
ControllerEventManager, or should it be ControllerStats like many other
Controller metrics?

Ismael

On Thu, Dec 7, 2017 at 4:21 AM, Dong Lin <lindon...@gmail.com> wrote:

> Hi all,
>
> I have created KIP-237: More Controller Health Metrics
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics
> .
>
> The KIP proposes to add a few more metrics to help monitor Kafka Controller
> health. Feedback and suggestions are welcome!
>
> Thanks,
> Dong
>
