Hey Viktor,

First off, thanks for the KIP! I think that it is almost always a good idea
to have more metrics. Observability never hurts.

Regarding the LogCleaner:
* Do we need to know log-cleaner-thread-count? That should always be equal
to "log.cleaner.threads" if I'm not mistaken.
* log-cleaner-current-live-thread-rate: we already have the
"time-since-last-run-ms" metric, which can tell you if something is
wrong with log cleaning.
As you said, we would like to have these two new metrics in order to
understand when a partial failure has happened, e.g. only 1 of 3 log cleaner
threads is alive. I'm wondering if it may make more sense to either:
a) restart the threads when they die
b) add a metric which shows the dead thread count. You should probably
always have a low-level alert in case any threads have died.
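To make the two options a bit more concrete, here is a rough Java sketch. It is only an illustration under stated assumptions: CleanerThreadPool, deadThreadCount() and reviveDeadThreads() are hypothetical names, this is not how Kafka's LogCleaner is actually structured, and registering the value with the broker's metrics registry is left out. A thread counts as dead once it has been started and reached Thread.State.TERMINATED; deadThreadCount() is the value a gauge for option (b) would report, and reviveDeadThreads() sketches option (a) by replacing each terminated thread with a fresh one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Kafka code: a fixed-size pool of worker threads
// that can report how many have died and optionally restart them.
final class CleanerThreadPool {
    private final List<Thread> threads = new ArrayList<>();
    private final Runnable work; // hypothetical per-thread cleaning loop

    CleanerThreadPool(int size, Runnable work) {
        this.work = work;
        for (int i = 0; i < size; i++) {
            threads.add(newThread(i));
        }
    }

    private Thread newThread(int id) {
        Thread t = new Thread(work, "cleaner-" + id);
        t.setDaemon(true); // do not keep the JVM alive in this sketch
        return t;
    }

    synchronized void start() {
        for (Thread t : threads) {
            t.start();
        }
    }

    // Option (b): the value a "dead thread count" gauge would expose.
    // Threads that were never started are in state NEW, so they are not counted.
    synchronized int deadThreadCount() {
        int dead = 0;
        for (Thread t : threads) {
            if (t.getState() == Thread.State.TERMINATED) {
                dead++;
            }
        }
        return dead;
    }

    // Option (a): replace every terminated thread with a new, started one.
    // Returns how many threads were revived.
    synchronized int reviveDeadThreads() {
        int revived = 0;
        for (int i = 0; i < threads.size(); i++) {
            if (threads.get(i).getState() == Thread.State.TERMINATED) {
                Thread replacement = newThread(i);
                threads.set(i, replacement);
                replacement.start();
                revived++;
            }
        }
        return revived;
    }
}
```

An alert on this gauge being above zero would cover the low-level alert mentioned above, whether or not the threads are also restarted automatically.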

We had discussed a similar topic about thread revival and metrics in
KIP-346. Have you had a chance to look over that discussion? Here is the
mailing list discussion:
http://mail-archives.apache.org/mod_mbox/kafka-dev/201807.mbox/%3ccanzzngyr_22go9swl67hedcm90xhvpyfzy_tezhz1mrizqk...@mail.gmail.com%3E

Best,
Stanislav



On Fri, Feb 22, 2019 at 11:18 AM Viktor Somogyi-Vass <
viktorsomo...@gmail.com> wrote:

> Hi All,
>
> I'd like to start a discussion about exposing count gauge metrics for the
> replica fetcher and log cleaner thread counts. It isn't a long KIP and the
> motivation is very simple: monitoring the thread counts in these cases
> would help with the investigation of various issues and might help in
> preventing more serious issues when a broker is in a bad state. One such
> scenario we have seen with users is that their disk fills up because the log
> cleaner died for some reason (like log corruption) and couldn't recover. In
> this case an early warning would help in the root cause analysis process as
> well as enable detecting and resolving the problem early on.
>
> The KIP is here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-434%3A+Add+Replica+Fetcher+and+Log+Cleaner+Count+Metrics
>
> I'd be happy to receive any feedback on this.
>
> Regards,
> Viktor
>


