Hello Bruno,

Thanks for your comments!

1. Regarding the metrics group name: originally I put
"stream-state-metrics" as it's related to state store restorations, but
after a second thought I think I agree with you that this is quite
confusing and not right. About the metrics groups, right now I have two
ideas:

a) Still use the metric group name "stream-thread-metrics", but
differentiate with the processing threads on the thread id. The pros is
that we do not introduce a new group, the cons is that users who want to
separate processing from restoration/updating in the future needs to do
that on the thread id labels.
b) Introduce a new group name, for example "stream-state-updater-metrics"
and still have a thread-id label. We would be introducing a new group which
still have a thread-id, which may sound a bit weird (maybe if we do that we
could change the existing "stream-thread-metrics" into
"stream-processor-metrics").

Right now I'm leaning towards b) and maybe in the future rename
"thread-metrics" to "processor-metrics", LMK what do you think.

2. Regarding the metric names: today we may also pause a standby tasks, and
hence I'm trying to differentiate updating standbys from paused standbys.
Right now I'm thinking we can change "restoring-standby-tasks" to
"updating-standby-tasks", and change all other metrics' "restore" (except
the "restoring-active-tasks") to "state-update", a.k.a
"state-update-ratio", "state-update-records-total",
"updating-standby-tasks" etc.

3. Regarding the function name: yeah I think that's a valid concern, I can
change it to "onRestoreSuspended" since "Aborted" may confuse people that
previously called "onBatchRestored" are undone as part of the abortion.


Guozhang



On Mon, Sep 19, 2022 at 10:47 AM Bruno Cadonna <cado...@apache.org> wrote:

> Hi Guozhang,
>
> Thanks for the KIP! I think this KIP is a really nice addition to better
> understand what is going on in a Kafka Streams application.
>
> 1.
> The metric names "paused-active-tasks" and "paused-standby-tasks" might
> be a bit confusing since at least active tasks can be paused also
> outside of restoration.
>
> 2.
> Why is the type of the metrics "stream-state-metrics"? I would have
> expected "stream-thread-metrics" as the type.
>
> 3.
> Isn't the value of the metric "restoring-standby-tasks" simply the
> number of standby tasks since standby tasks are basically always
> updating (aka restoring)?
>
> 4.
> "idle-ratio", "restore-ratio", and "checkpoint-ratio" seem metrics
> tailored to the upcoming state updater. They do not make much sense with
> a stream thread. Would it be better to introduce a new metrics level
> specifically for the state updater?
>
> 5.
> Personally, I do not like to use the word "restoration" together with
> standbys since restoration somehow implies that there is an offset for
> which the active task is considered restored and active processing can
> start. In other words, restoration is finite. Standby tasks rather
> update continuously their states. They can be up-to-date or lagging. I
> see that you could say "restored" instead of "up-to-date" and "not
> restored" instead of "lagging", but IMO it does not describe well the
> situation. That is a rather minor point. I just wanted to mention it.
>
> 6.
> The name "onRestorePaused()" might be confusing since in Kafka Streams
> users can also pause tasks. What about "onRestoreAborted()" or
> "onRestoreSuspended"?
>
> Best,
> Bruno
>
>
> On 16.09.22 19:33, Guozhang Wang wrote:
> > Hello everyone,
> >
> > I'd like to start a discussion for the following KIP, aiming to improve
> > Kafka Stream's restoration visibility via new metrics and callback
> methods:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-869%3A+Improve+Streams+State+Restoration+Visibility
> >
> >
> > Thanks!
> > -- Guozhang
> >
>


-- 
-- Guozhang

Reply via email to