Re: [DISCUSS] KIP-818: Introduce cache-size-bytes-total Task Level Metric

Sagar Thu, 10 Feb 2022 17:59:31 -0800

Hi All,

As discussed above, this KIP would be discarded and the new metric proposed
here would be added to KIP-770 as the need to add a new metric was
discovered when working on it.


Thanks!
Sagar.

On Thu, Feb 10, 2022 at 9:54 AM Sagar <[email protected]> wrote:

> Hi Guozhang,
>
> Sure. I will add it to the KIP.
>
> Thanks!
> Sagar.
>
> On Mon, Feb 7, 2022 at 6:22 AM Guozhang Wang <[email protected]> wrote:
>
>> Since the PR is reopened and we are going to re-merged the fixed PRs, what
>> about just adding that as part of the KIP as the addendum?
>>
>> On Fri, Feb 4, 2022 at 2:13 AM Sagar <[email protected]> wrote:
>>
>> > Thanks Sophie/Guozhang.
>> >
>> > Yeah I could have amended the KIP but it slipped my mind when Guozhang
>> > proposed this in the PR. Later on, the PR was merged and KIP was marked
>> as
>> > adopted so I thought I will write a new one. I know the PR had been
>> > reopened now :p . I dont have much preference on a new KIP v/s the
>> original
>> > one so anything is ok with me as well.
>> >
>> > I agree with the INFO part. I will make that change.
>> >
>> > Regarding task level, from my understanding, since every task's
>> > buffer/cache size might be different so if a certain task might be
>> > overshooting the limits then the task level metric might help people to
>> > infer this. Also, thanks for the explanation Guozhang on why this
>> should be
>> > a task level metric. What are your thoughts on this @Sophie?
>> >
>> > Thanks!
>> > Sagar.
>> >
>> >
>> > On Fri, Feb 4, 2022 at 4:47 AM Guozhang Wang <[email protected]>
>> wrote:
>> >
>> > > Thanks Sagar for proposing the KIP, and Sophie for sharing your
>> thoughts.
>> > > Here're my 2c:
>> > >
>> > > I think I agree with Sophie for making the two metrics (both the added
>> > and
>> > > the newly proposed) on INFO level since we are always calculating them
>> > > anyways. Regarding the level of the cache-size though, I'm thinking a
>> bit
>> > > different with you two: today we do not actually keep that caches on
>> the
>> > > per-store level, but rather on the per-thread level, i.e. when the
>> cache
>> > is
>> > > full we would flush not only on the triggering state store but also
>> > > potentially on other state stores as well of the task that thread
>> owns.
>> > > This mechanism, in hindsight, is a bit weird and we have some
>> discussions
>> > > about refactoring that in the future already. Personally I'd like to
>> make
>> > > this new metric to be aligned with whatever our future design will be.
>> > >
>> > > In the long run if we would not have a static assignment from tasks to
>> > > threads, it may not make sense to keep a dedicated cache pool per
>> thread.
>> > > Instead all tasks will be dynamically sharing the globally configured
>> max
>> > > cache size (dynamically here means, we would not just divide the total
>> > size
>> > > by the num.tasks and then assign that to each task), and when a cache
>> put
>> > > triggers the flushing because the sum now exceeds the global
>> configured
>> > > value, we would potentially flush all the cached records for that
>> task.
>> > If
>> > > this is the end stage, then I think keeping this metric at the task
>> level
>> > > is good.
>> > >
>> > >
>> > >
>> > > Guozhang
>> > >
>> > >
>> > >
>> > >
>> > > On Thu, Feb 3, 2022 at 10:15 AM Sophie Blee-Goldman
>> > > <[email protected]> wrote:
>> > >
>> > > > Hey Sagar,  thanks for the KIP!
>> > > >
>> > > > And yes, all metrics are considered part of the public API and thus
>> > > require
>> > > > a KIP to add (or modify, etc...) Although in this particular case,
>> you
>> > > > could probably make a good case for just considering it as an
>> update to
>> > > the
>> > > > original KIP which added the analogous metric
>> > `input-buffer-bytes-total`.
>> > > > For  things like this that weren't considered during the KIP
>> proposal
>> > but
>> > > > came up during the implementation or review, and are small changes
>> that
>> > > > would have made sense to include in that KIP had they been thought
>> of,
>> > > you
>> > > > can just send an update to the existing KIP's discussion and.or
>> voting
>> > > > thread that explains what you want to add or modify and maybe a
>> brief
>> > > > description why.
>> > > >
>> > > > It's always ok to make a new KIP when in doubt, but there are some
>> > cases
>> > > > where an update email is sufficient. If there are any concerns or
>> > > > suggestions that significantly expand the scope of the update, you
>> can
>> > > > always go create a new KIP and move the discussion there.
>> > > >
>> > > > I'd say you can feel free to proceed in whichever way you'd prefer
>> for
>> > > this
>> > > > new proposal -- it just needs to appear in some KIP somewhere, and
>> have
>> > > > given the community thew opportunity to discuss and provide feedback
>> > on.
>> > > >
>> > > > On that note, I do have two suggestions:
>> > > >
>> > > > 1)  since we need to measure the size of the cache (and the input
>> > > buffer(s)
>> > > > anyways, we may as well make `cache-size-bytes-total` -- and also
>> the
>> > new
>> > > > input-buffer-bytes-total -- an INFO level metric. In general the
>> more
>> > > > metrics the merrier, the only real reason for disabling some are if
>> > they
>> > > > have a performance impact or other cost that not everyone will want
>> to
>> > > pay.
>> > > > In this case we're already computing the value of these metrics, so
>> why
>> > > not
>> > > > expose it to the user as an INFO metric
>> > > > 2) I think it would be both more natural and easier to implement if
>> > this
>> > > > was a store-level metric. A single task could in theory contain
>> > multiple
>> > > > physical state store caches and we would have to roll them up to
>> report
>> > > the
>> > > > size for the task as a whole. It's additional work just to lose some
>> > > > information that the user may want to have
>> > > >
>> > > > Let me know if anything here doesn't make sense or needs
>> clarification.
>> > > And
>> > > > thanks for the quick followup to get this 2nd metric!
>> > > >
>> > > > -Sophie
>> > > >
>> > > > On Sat, Jan 29, 2022 at 4:27 AM Sagar <[email protected]>
>> > wrote:
>> > > >
>> > > > > Hi All,
>> > > > >
>> > > > > I would like to open a discussion thread on the following KIP:
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186878390
>> > > > >
>> > > > > PS: This is about introducing a new metric and I am assuming that
>> it
>> > > > > requires a KIP. If that isn't the case, I can close it.
>> > > > >
>> > > > > Thanks!
>> > > > > Sagar.
>> > > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > -- Guozhang
>> > >
>> >
>>
>>
>> --
>> -- Guozhang
>>
>

Re: [DISCUSS] KIP-818: Introduce cache-size-bytes-total Task Level Metric

Reply via email to