Thanks for all the very helpful discussions, I'm closing the vote with
a tally here:

+1: 7 (Nick, John, Walker, Bruno, Lucas, Matthias, Guozhang), with 5
binding votes and 2 non-binding votes.
-1: 0


Guozhang

On Wed, Jan 25, 2023 at 5:48 PM Matthias J. Sax <mj...@apache.org> wrote:
>
> Thanks!
>
> +1 (binding)
>
> -Matthias
>
> On 1/24/23 1:17 PM, Guozhang Wang wrote:
> > Hi Matthias:
> >
> > re "paused" -> "suspended": I got your point now, thanks. Just to
> > clarify the two functions are a bit different: "paused" tasks are
> > because of the topology being paused, i.e. from KIP-834; whereas
> > "suspended" tasks are when a restoring tasks are being removed before
> > it completes due to a follow-up rebalance, and this is to distinguish
> > with "onRestoreEnd", as described in KAFKA-10575. A suspended task is
> > no longer owned by the thread and hence there's no need to measure the
> > number of such tasks.
> >
> > re: "restore-ratio": that's a good point. I like it to function in the
> > same way as the "records-rate" metrics. Will update the wiki.
> >
> > re: making "restore-remaining-records-total" at INFO level: sounds
> > good to me too. I will also update the metric name a bit to be more
> > specific.
> >
> >
> >
> > On Thu, Jan 19, 2023 at 2:35 PM Guozhang Wang
> > <guozhang.wang...@gmail.com> wrote:
> >>
> >> Hello Matthias,
> >>
> >> Thanks for the feedback. I was on vacation for a while. Pardon for the
> >> late replies. Please see them inline below
> >>
> >> On Thu, Dec 1, 2022 at 11:23 PM Matthias J. Sax <mj...@apache.org> wrote:
> >>>
> >>> Seems I am late to the party... Great KIP. Couple of questions from my 
> >>> side:
> >>>
> >>> (1) What is the purpose of `standby-updating-tasks`? It seems to be the
> >>> same as the number of assigned standby task? Not sure how useful it
> >>> would be?
> >>>
> >> In general, yes, it is the number of assigned standby tasks --- there
> >> will be transit times when the assigned standby tasks are not yet
> >> being updated but it would not last long --- but we do not yet have a
> >> direct gauge to expose this before, and users have to infer this from
> >> other indirect metrics.
> >>
> >>>
> >>>
> >>> (2) `active-paused-tasks` / `standby-paused-tasks` -- what does "paused"
> >>> exactly mean? There was a discussion about renaming the callback method
> >>> from pause to suspended. So should this be called `suspended`, too? And
> >>> if yes, how is it useful for users?
> >>>
> >> Pausing here refers to "KIP-834: Pause / Resume KafkaStreams
> >> Topologies" 
> >> (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211882832).
> >> When a topology is paused, all its tasks including standbys will be
> >> paused too.
> >>
> >> I'm not aware of a discussion to rename the call name to "suspend" for
> >> KIP-834. Could you point me to the reference?
> >>
> >>>
> >>>
> >>> (3) `restore-ratio`: the description says
> >>>
> >>>> The fraction of time the thread spent on restoring active or standby 
> >>>> tasks
> >>>
> >>> I find the term "restoring" does only apply to active tasks, but not to
> >>> standbys. Can we reword this?
> >>>
> >> Yeah I have been discussing this with others in the community a bit as
> >> well, but so far I have not been convinced of a better name than it.
> >> Some other alternatives being discussed but not win everyone's love is
> >> "restore-or-update-ratio", "process-ratio" (for the restore thread
> >> that means restoring or updating), and "io-ratio".
> >>
> >> The only one so far that I feel is probably better, is
> >> "state-update-ratio". If folks feel this one is better than
> >> "restore-ratio" I'm happy to update.
> >>
> >>>
> >>> (4) `restore-call-rate`: not sure what you exactly mean by "restore 
> >>> calls"?
> >>>
> >> This is similar to the "io-calls-rate" in the selector classes, i.e.
> >> the number of "restore" function calls made. It's argurably a very
> >> low-level metrics but I included it since it could be useful in some
> >> debugging scenarios.
> >>
> >>>
> >>> (5) `restore-remaining-records-total` -- why is this a task metric?
> >>> Seems we could roll it up into a thread metric that we report at INFO
> >>> level (we could still have per-task DEBUG level metric for it in 
> >>> addition).
> >>>
> >> The rationale behind it is the general principle in metrics design
> >> that "Kafka would provide the lowest necessary metrics levels, and
> >> users can do the roll-ups however they want".
> >>
> >>>
> >>> (6) What about "warmup tasks"? Internally, we treat them as standbys,
> >>> but it seems it's hard for users to reason about it in the scale-out
> >>> warm-up case. Would it be helpful (and possible) to report "warmup
> >>> progress" explicitly?
> >>>
> >> At the restore thread level, we cannot differentiate standby tasks
> >> from warmup tasks since the latter is created exactly just like the
> >> former. But I do agree this is an issue for visibility that worth
> >> addressing, I think another KIP would be needed to first consider
> >> distinguishing these two at the class level.
> >>
> >>>
> >>> -Matthias
> >>>
> >>>
> >>> On 11/1/22 2:44 AM, Lucas Brutschy wrote:
> >>>> We need this!
> >>>>
> >>>> + 1 non binding
> >>>>
> >>>> Cheers,
> >>>> Lucas
> >>>>
> >>>> On Tue, Nov 1, 2022 at 10:01 AM Bruno Cadonna <cado...@apache.org> wrote:
> >>>>>
> >>>>> Guozhang,
> >>>>>
> >>>>> Thanks for the KIP!
> >>>>>
> >>>>> +1 (binding)
> >>>>>
> >>>>> Best,
> >>>>> Bruno
> >>>>>
> >>>>> On 25.10.22 22:07, Walker Carlson wrote:
> >>>>>> +1 non binding
> >>>>>>
> >>>>>> Thanks for the kip!
> >>>>>>
> >>>>>> On Thu, Oct 20, 2022 at 10:25 PM John Roesler <vvcep...@apache.org> 
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Thanks for the KIP, Guozhang!
> >>>>>>>
> >>>>>>> I'm +1 (binding)
> >>>>>>>
> >>>>>>> -John
> >>>>>>>
> >>>>>>> On Wed, Oct 12, 2022, at 16:36, Nick Telford wrote:
> >>>>>>>> Can't wait!
> >>>>>>>> +1 (non-binding)
> >>>>>>>>
> >>>>>>>> On Wed, 12 Oct 2022, 18:02 Guozhang Wang, 
> >>>>>>>> <guozhang.wang...@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hello all,
> >>>>>>>>>
> >>>>>>>>> I'd like to start a vote for the following KIP, aiming to improve 
> >>>>>>>>> Kafka
> >>>>>>>>> Stream's restoration visibility via new metrics and callback 
> >>>>>>>>> methods:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-869%3A+Improve+Streams+State+Restoration+Visibility
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks!
> >>>>>>>>> -- Guozhang
> >>>>>>>>>
> >>>>>>>
> >>>>>>

Reply via email to