Re: [DISCUSS] KIP-818: Introduce cache-size-bytes-total Task Level Metric

Guozhang Wang Thu, 03 Feb 2022 15:17:32 -0800

Thanks Sagar for proposing the KIP, and Sophie for sharing your thoughts.
Here're my 2c:

I think I agree with Sophie for making the two metrics (both the added and
the newly proposed) on INFO level since we are always calculating them
anyways. Regarding the level of the cache-size though, I'm thinking a bit
different with you two: today we do not actually keep that caches on the
per-store level, but rather on the per-thread level, i.e. when the cache is
full we would flush not only on the triggering state store but also
potentially on other state stores as well of the task that thread owns.
This mechanism, in hindsight, is a bit weird and we have some discussions
about refactoring that in the future already. Personally I'd like to make
this new metric to be aligned with whatever our future design will be.

In the long run if we would not have a static assignment from tasks to
threads, it may not make sense to keep a dedicated cache pool per thread.
Instead all tasks will be dynamically sharing the globally configured max
cache size (dynamically here means, we would not just divide the total size
by the num.tasks and then assign that to each task), and when a cache put
triggers the flushing because the sum now exceeds the global configured
value, we would potentially flush all the cached records for that task. If
this is the end stage, then I think keeping this metric at the task level
is good.

Guozhang

On Thu, Feb 3, 2022 at 10:15 AM Sophie Blee-Goldman
<[email protected]> wrote:

> Hey Sagar,  thanks for the KIP!
>
> And yes, all metrics are considered part of the public API and thus require
> a KIP to add (or modify, etc...) Although in this particular case, you
> could probably make a good case for just considering it as an update to the
> original KIP which added the analogous metric `input-buffer-bytes-total`.
> For  things like this that weren't considered during the KIP proposal but
> came up during the implementation or review, and are small changes that
> would have made sense to include in that KIP had they been thought of, you
> can just send an update to the existing KIP's discussion and.or voting
> thread that explains what you want to add or modify and maybe a brief
> description why.
>
> It's always ok to make a new KIP when in doubt, but there are some cases
> where an update email is sufficient. If there are any concerns or
> suggestions that significantly expand the scope of the update, you can
> always go create a new KIP and move the discussion there.
>
> I'd say you can feel free to proceed in whichever way you'd prefer for this
> new proposal -- it just needs to appear in some KIP somewhere, and have
> given the community thew opportunity to discuss and provide feedback on.
>
> On that note, I do have two suggestions:
>
> 1)  since we need to measure the size of the cache (and the input buffer(s)
> anyways, we may as well make `cache-size-bytes-total` -- and also the new
> input-buffer-bytes-total -- an INFO level metric. In general the more
> metrics the merrier, the only real reason for disabling some are if they
> have a performance impact or other cost that not everyone will want to pay.
> In this case we're already computing the value of these metrics, so why not
> expose it to the user as an INFO metric
> 2) I think it would be both more natural and easier to implement if this
> was a store-level metric. A single task could in theory contain multiple
> physical state store caches and we would have to roll them up to report the
> size for the task as a whole. It's additional work just to lose some
> information that the user may want to have
>
> Let me know if anything here doesn't make sense or needs clarification. And
> thanks for the quick followup to get this 2nd metric!
>
> -Sophie
>
> On Sat, Jan 29, 2022 at 4:27 AM Sagar <[email protected]> wrote:
>
> > Hi All,
> >
> > I would like to open a discussion thread on the following KIP:
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186878390
> >
> > PS: This is about introducing a new metric and I am assuming that it
> > requires a KIP. If that isn't the case, I can close it.
> >
> > Thanks!
> > Sagar.
> >
>

-- 
-- Guozhang

Re: [DISCUSS] KIP-818: Introduce cache-size-bytes-total Task Level Metric

Reply via email to