Hello Matthias,

Thanks for the feedback. I was on vacation for a while. Pardon for the
late replies. Please see them inline below

On Thu, Dec 1, 2022 at 11:23 PM Matthias J. Sax <mj...@apache.org> wrote:
>
> Seems I am late to the party... Great KIP. Couple of questions from my side:
>
> (1) What is the purpose of `standby-updating-tasks`? It seems to be the
> same as the number of assigned standby task? Not sure how useful it
> would be?
>
In general, yes, it is the number of assigned standby tasks --- there
will be transit times when the assigned standby tasks are not yet
being updated but it would not last long --- but we do not yet have a
direct gauge to expose this before, and users have to infer this from
other indirect metrics.

>
>
> (2) `active-paused-tasks` / `standby-paused-tasks` -- what does "paused"
> exactly mean? There was a discussion about renaming the callback method
> from pause to suspended. So should this be called `suspended`, too? And
> if yes, how is it useful for users?
>
Pausing here refers to "KIP-834: Pause / Resume KafkaStreams
Topologies" 
(https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211882832).
When a topology is paused, all its tasks including standbys will be
paused too.

I'm not aware of a discussion to rename the call name to "suspend" for
KIP-834. Could you point me to the reference?

>
>
> (3) `restore-ratio`: the description says
>
> > The fraction of time the thread spent on restoring active or standby tasks
>
> I find the term "restoring" does only apply to active tasks, but not to
> standbys. Can we reword this?
>
Yeah I have been discussing this with others in the community a bit as
well, but so far I have not been convinced of a better name than it.
Some other alternatives being discussed but not win everyone's love is
"restore-or-update-ratio", "process-ratio" (for the restore thread
that means restoring or updating), and "io-ratio".

The only one so far that I feel is probably better, is
"state-update-ratio". If folks feel this one is better than
"restore-ratio" I'm happy to update.

>
> (4) `restore-call-rate`: not sure what you exactly mean by "restore calls"?
>
This is similar to the "io-calls-rate" in the selector classes, i.e.
the number of "restore" function calls made. It's argurably a very
low-level metrics but I included it since it could be useful in some
debugging scenarios.

>
> (5) `restore-remaining-records-total` -- why is this a task metric?
> Seems we could roll it up into a thread metric that we report at INFO
> level (we could still have per-task DEBUG level metric for it in addition).
>
The rationale behind it is the general principle in metrics design
that "Kafka would provide the lowest necessary metrics levels, and
users can do the roll-ups however they want".

>
> (6) What about "warmup tasks"? Internally, we treat them as standbys,
> but it seems it's hard for users to reason about it in the scale-out
> warm-up case. Would it be helpful (and possible) to report "warmup
> progress" explicitly?
>
At the restore thread level, we cannot differentiate standby tasks
from warmup tasks since the latter is created exactly just like the
former. But I do agree this is an issue for visibility that worth
addressing, I think another KIP would be needed to first consider
distinguishing these two at the class level.

>
> -Matthias
>
>
> On 11/1/22 2:44 AM, Lucas Brutschy wrote:
> > We need this!
> >
> > + 1 non binding
> >
> > Cheers,
> > Lucas
> >
> > On Tue, Nov 1, 2022 at 10:01 AM Bruno Cadonna <cado...@apache.org> wrote:
> >>
> >> Guozhang,
> >>
> >> Thanks for the KIP!
> >>
> >> +1 (binding)
> >>
> >> Best,
> >> Bruno
> >>
> >> On 25.10.22 22:07, Walker Carlson wrote:
> >>> +1 non binding
> >>>
> >>> Thanks for the kip!
> >>>
> >>> On Thu, Oct 20, 2022 at 10:25 PM John Roesler <vvcep...@apache.org> wrote:
> >>>
> >>>> Thanks for the KIP, Guozhang!
> >>>>
> >>>> I'm +1 (binding)
> >>>>
> >>>> -John
> >>>>
> >>>> On Wed, Oct 12, 2022, at 16:36, Nick Telford wrote:
> >>>>> Can't wait!
> >>>>> +1 (non-binding)
> >>>>>
> >>>>> On Wed, 12 Oct 2022, 18:02 Guozhang Wang, <guozhang.wang...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hello all,
> >>>>>>
> >>>>>> I'd like to start a vote for the following KIP, aiming to improve Kafka
> >>>>>> Stream's restoration visibility via new metrics and callback methods:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-869%3A+Improve+Streams+State+Restoration+Visibility
> >>>>>>
> >>>>>>
> >>>>>> Thanks!
> >>>>>> -- Guozhang
> >>>>>>
> >>>>
> >>>

Reply via email to