Actually, one other point: our existing state store operation metrics are
measured in nanoseconds[1].

Should iterator-duration-(avg|max) also be measured in nanoseconds, for
consistency, or should we keep them milliseconds, as the KIP currently
states?

1:
https://docs.confluent.io/platform/current/streams/monitoring.html#state-store-metrics

On Thu, 16 May 2024 at 12:15, Nick Telford <nick.telf...@gmail.com> wrote:

> Good point! I've updated it to "Improved StateStore Iterator metrics for
> detecting leaks" - let me know if you have a better suggestion.
>
> This should affect the voting imo, as nothing of substance has changed.
>
> Regards,
> Nick
>
> On Thu, 16 May 2024 at 01:39, Sophie Blee-Goldman <sop...@responsive.dev>
> wrote:
>
>> One quick thing -- can you update the title of this KIP to reflect the
>> decision to implement these metrics for all state stores implementations
>> rather than just RocksDB?
>>
>>
>> On Tue, May 14, 2024 at 1:36 PM Nick Telford <nick.telf...@gmail.com>
>> wrote:
>>
>> > Woops! Thanks for the catch Lucas. Given this was just a typo, I don't
>> > think this affects the voting.
>> >
>> > Cheers,
>> > Nick
>> >
>> > On Tue, 14 May 2024 at 18:06, Lucas Brutschy <lbruts...@confluent.io
>> > .invalid>
>> > wrote:
>> >
>> > > Hi Nick,
>> > >
>> > > you are still referring to oldest-open-iterator-age-ms in the
>> > > `Proposed Changes` section.
>> > >
>> > > Cheers,
>> > > Lucas
>> > >
>> > > On Thu, May 2, 2024 at 4:00 PM Lucas Brutschy <lbruts...@confluent.io
>> >
>> > > wrote:
>> > > >
>> > > > Hi Nick!
>> > > >
>> > > > I agree, the age variant is a bit nicer since the semantics are very
>> > > > clear from the name. If you'd rather go for the simple
>> implementation,
>> > > > how about calling it `oldest-iterator-open-since-ms`? I believe this
>> > > > could be understood without docs. Either way, I think we should be
>> > > > able to open the vote for this KIP because nobody raised any major /
>> > > > blocking concerns.
>> > > >
>> > > > Looking forward to getting this voted on soon!
>> > > >
>> > > > Cheers
>> > > > Lucas
>> > > >
>> > > > On Sun, Mar 31, 2024 at 5:23 PM Nick Telford <
>> nick.telf...@gmail.com>
>> > > wrote:
>> > > > >
>> > > > > Hi Matthias,
>> > > > >
>> > > > > > For the oldest iterator metric, I would propose something simple
>> > like
>> > > > > > `iterator-opened-ms` and it would just be the actual timestamp
>> when
>> > > the
>> > > > > > iterator was opened. I don't think we need to compute the actual
>> > age,
>> > > > > > but user can to this computation themselves?
>> > > > >
>> > > > > That works for me; it's easier to implement like that :-D I'm a
>> > little
>> > > > > concerned that the name "iterator-opened-ms" may not be obvious
>> > enough
>> > > > > without reading the docs.
>> > > > >
>> > > > > > If we think reporting the age instead of just the timestamp is
>> > > better, I
>> > > > > > would propose `iterator-max-age-ms`. I should be sufficient to
>> call
>> > > out
>> > > > > > (as it's kinda "obvious" anyway) that the metric applies to open
>> > > > > > iterator only.
>> > > > >
>> > > > > While I think it's preferable to record the timestamp, rather than
>> > the
>> > > age,
>> > > > > this does have the benefit of a more obvious metric name.
>> > > > >
>> > > > > > Nit: the KIP says it's a store-level metric, but I think it
>> would
>> > be
>> > > > > > good to say explicitly that it's recorded with DEBUG level only?
>> > > > >
>> > > > > Yes, I've already updated the KIP with this information in the
>> table.
>> > > > >
>> > > > > Regards,
>> > > > >
>> > > > > Nick
>> > > > >
>> > > > > On Sun, 31 Mar 2024 at 10:53, Matthias J. Sax <mj...@apache.org>
>> > > wrote:
>> > > > >
>> > > > > > The time window thing was just an idea. Happy to drop it.
>> > > > > >
>> > > > > > For the oldest iterator metric, I would propose something simple
>> > like
>> > > > > > `iterator-opened-ms` and it would just be the actual timestamp
>> when
>> > > the
>> > > > > > iterator was opened. I don't think we need to compute the actual
>> > age,
>> > > > > > but user can to this computation themselves?
>> > > > > >
>> > > > > > If we think reporting the age instead of just the timestamp is
>> > > better, I
>> > > > > > would propose `iterator-max-age-ms`. I should be sufficient to
>> call
>> > > out
>> > > > > > (as it's kinda "obvious" anyway) that the metric applies to open
>> > > > > > iterator only.
>> > > > > >
>> > > > > > And yes, I was hoping that the code inside MetereXxxStore might
>> > > already
>> > > > > > be setup in a way that custom stores would inherit the iterator
>> > > metrics
>> > > > > > automatically -- I am just not sure, and left it as an exercise
>> for
>> > > > > > somebody to confirm :)
>> > > > > >
>> > > > > >
>> > > > > > Nit: the KIP says it's a store-level metric, but I think it
>> would
>> > be
>> > > > > > good to say explicitly that it's recorded with DEBUG level only?
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > -Matthias
>> > > > > >
>> > > > > >
>> > > > > > On 3/28/24 2:52 PM, Nick Telford wrote:
>> > > > > > > Quick addendum:
>> > > > > > >
>> > > > > > > My suggested metric "oldest-open-iterator-age-seconds" should
>> be
>> > > > > > > "oldest-open-iterator-age-ms". Milliseconds is obviously a
>> better
>> > > > > > > granularity for such a metric.
>> > > > > > >
>> > > > > > > Still accepting suggestions for a better name.
>> > > > > > >
>> > > > > > > On Thu, 28 Mar 2024 at 13:41, Nick Telford <
>> > nick.telf...@gmail.com
>> > > >
>> > > > > > wrote:
>> > > > > > >
>> > > > > > >> Hi everyone,
>> > > > > > >>
>> > > > > > >> Sorry for leaving this for so long. So much for "3 weeks
>> until
>> > KIP
>> > > > > > freeze"!
>> > > > > > >>
>> > > > > > >> On Sophie's comments:
>> > > > > > >> 1. Would Matthias's suggestion of a separate metric tracking
>> the
>> > > age of
>> > > > > > >> the oldest open iterator (within the tag set) satisfy this?
>> That
>> > > way we
>> > > > > > can
>> > > > > > >> keep iterator-duration-(avg|max) for closed iterators, which
>> can
>> > > be
>> > > > > > useful
>> > > > > > >> for performance debugging for iterators that don't leak. I'm
>> not
>> > > sure
>> > > > > > what
>> > > > > > >> we'd call this metric, maybe:
>> > "oldest-open-iterator-age-seconds"?
>> > > Seems
>> > > > > > >> like a mouthful.
>> > > > > > >>
>> > > > > > >> 2. You're right, it makes more sense to provide
>> > > > > > >> iterator-duration-(avg|max). Honestly, I can't remember why I
>> > had
>> > > > > > "total"
>> > > > > > >> before, or why I was computing a rate-of-change over it.
>> > > > > > >>
>> > > > > > >> 3, 4, 5, 6. Agreed, I'll make all those changes as suggested.
>> > > > > > >>
>> > > > > > >> 7. Combined with Matthias's point about RocksDB, I'm
>> convinced
>> > > that this
>> > > > > > >> is the wrong KIP for these. I'll introduce the additional
>> Rocks
>> > > metrics
>> > > > > > in
>> > > > > > >> another KIP.
>> > > > > > >>
>> > > > > > >> On Matthias's comments:
>> > > > > > >> A. Not sure about the time window. I'm pretty sure all
>> existing
>> > > avg/max
>> > > > > > >> metrics are since the application was started? Any other
>> > > suggestions
>> > > > > > here
>> > > > > > >> would be appreciated.
>> > > > > > >>
>> > > > > > >> B. Agreed. See point 1 above.
>> > > > > > >>
>> > > > > > >> C. Good point. My focus was very much on Rocks memory leaks
>> when
>> > > I wrote
>> > > > > > >> the first draft. I can generalise it. My only concern is
>> that it
>> > > might
>> > > > > > make
>> > > > > > >> it more difficult to detect Rocks iterator leaks caused
>> *within*
>> > > our
>> > > > > > >> high-level iterator, e.g. RocksJNI, RocksDB, RocksDBStore,
>> etc.
>> > > But we
>> > > > > > >> could always provide a RocksDB-specific metric for this, as
>> you
>> > > > > > suggested.
>> > > > > > >>
>> > > > > > >> D. Hmm, we do already have MeteredKeyValueIterator, which
>> > > automatically
>> > > > > > >> wraps the iterator from inner-stores of
>> MeteredKeyValueStore. If
>> > > we
>> > > > > > >> implemented these metrics there, then custom stores would
>> > > automatically
>> > > > > > >> gain the functionality, right? This seems like a pretty
>> logical
>> > > place to
>> > > > > > >> implement these metrics, since MeteredKeyValueStore is all
>> about
>> > > adding
>> > > > > > >> metrics to state stores.
>> > > > > > >>
>> > > > > > >>> I imagine the best way to implement this would be to do so
>> at
>> > the
>> > > > > > >>> high-level iterator rather than implementing it separately
>> for
>> > > each
>> > > > > > >>> specific iterator implementation for every store type.
>> > > > > > >>
>> > > > > > >> Sophie, does MeteredKeyValueIterator fit with your
>> > recommendation?
>> > > > > > >>
>> > > > > > >> Thanks for your thoughts everyone, I'll update the KIP now.
>> > > > > > >>
>> > > > > > >> Nick
>> > > > > > >>
>> > > > > > >> On Thu, 14 Mar 2024 at 03:37, Sophie Blee-Goldman <
>> > > > > > sop...@responsive.dev>
>> > > > > > >> wrote:
>> > > > > > >>
>> > > > > > >>> About your last two points: I completely agree that we
>> should
>> > > try to
>> > > > > > >>> make this independent of RocksDB, and should probably adopt
>> a
>> > > > > > >>> general philosophy of being store-implementation agnostic
>> > unless
>> > > > > > >>> there is good reason to focus on a particular store type:
>> eg if
>> > > it was
>> > > > > > >>> only possible to implement for certain stores, or only made
>> > > sense in
>> > > > > > >>> the context of a certain store type but not necessarily
>> stores
>> > in
>> > > > > > general.
>> > > > > > >>>
>> > > > > > >>> While leaking memory due to unclosed iterators on RocksDB
>> > stores
>> > > is
>> > > > > > >>> certainly the most common issue, I think Matthias
>> sufficiently
>> > > > > > >>> demonstrated that the problem of leaking iterators is not
>> > > actually
>> > > > > > >>> unique to RocksDB, and we should consider including
>> in-memory
>> > > > > > >>> stores at the very least. I also think that at this point,
>> we
>> > > may as
>> > > > > > well
>> > > > > > >>> just implement the metrics for *all* store types, whether
>> > > rocksdb or
>> > > > > > >>> in-memory or custom. Not just because it probably applies to
>> > all
>> > > > > > >>> store types (leaking iterators are rarely a good thing!) but
>> > > because
>> > > > > > >>> I imagine the best way to implement this would be to do so
>> at
>> > the
>> > > > > > >>> high-level iterator rather than implementing it separately
>> for
>> > > each
>> > > > > > >>> specific iterator implementation for every store type.
>> > > > > > >>>
>> > > > > > >>> That said, I haven't thought all that carefully about the
>> > > > > > implementation
>> > > > > > >>> yet -- it just strikes me as easiest to do at the top level
>> of
>> > > the
>> > > > > > store
>> > > > > > >>> hierarchy rather than at the bottom. My gut instinct may
>> very
>> > > well be
>> > > > > > >>> wrong, but that's what it's saying
>> > > > > > >>>
>> > > > > > >>> On Thu, Mar 7, 2024 at 10:43 AM Matthias J. Sax <
>> > > mj...@apache.org>
>> > > > > > wrote:
>> > > > > > >>>
>> > > > > > >>>> Seems I am late to this party. Can we pick this up again
>> > aiming
>> > > for
>> > > > > > 3.8
>> > > > > > >>>> release? I think it would be a great addition. Few
>> comments:
>> > > > > > >>>>
>> > > > > > >>>>
>> > > > > > >>>> - I think it does make sense to report
>> `iterator-duration-avg`
>> > > and
>> > > > > > >>>> `iterator-duration-max` for all *closed* iterators -- it
>> just
>> > > seems to
>> > > > > > >>>> be a useful metric (wondering if this would be _overall_ or
>> > > bounded to
>> > > > > > >>>> some time window?)
>> > > > > > >>>>
>> > > > > > >>>> - About the duration iterators are currently open, I
>> believe
>> > > the only
>> > > > > > >>>> useful way is to report the "oldest iterator", ie, the
>> > smallest
>> > > > > > iterator
>> > > > > > >>>> open-time, of all currently open-iterator? We all agree
>> that
>> > in
>> > > > > > general,
>> > > > > > >>>> leaking iterator would bump the count metric, and if there
>> is
>> > a
>> > > few
>> > > > > > ones
>> > > > > > >>>> which are not closed and open for a long time, it seem
>> > > sufficient to
>> > > > > > >>>> detect the single oldest one for alerting purpose?
>> > > > > > >>>>
>> > > > > > >>>> - What I don't like about the KIP is it focus on RocksDB. I
>> > > don't
>> > > > > > think
>> > > > > > >>>> we should build on the internal RocksDB counters as
>> proposed
>> > (I
>> > > guess,
>> > > > > > >>>> we could still expose them, similar to other RocksDB
>> metrics
>> > > which we
>> > > > > > >>>> expose already). However, for this new metric, we should
>> track
>> > > it
>> > > > > > >>>> ourselves and thus make it independent of RocksDB -- in the
>> > > end, an
>> > > > > > >>>> in-memory store could also leak memory (and kill a JVM
>> with an
>> > > > > > >>>> out-of-memory error) and we should be able to track it.
>> > > > > > >>>>
>> > > > > > >>>> - Not sure if we would like to add support for custom
>> stores,
>> > > to allow
>> > > > > > >>>> them to register their iterators with this metric? Or would
>> > > this not
>> > > > > > be
>> > > > > > >>>> necessary, because custom stores could just register a
>> custom
>> > > metric
>> > > > > > >>>> about it to begin with?
>> > > > > > >>>>
>> > > > > > >>>>
>> > > > > > >>>>
>> > > > > > >>>> -Matthias
>> > > > > > >>>>
>> > > > > > >>>> On 10/25/23 4:41 PM, Sophie Blee-Goldman wrote:
>> > > > > > >>>>>>
>> > > > > > >>>>>>    If we used "iterator-duration-max", for
>> > > > > > >>>>>> example, would it not be confusing that it includes
>> > Iterators
>> > > that
>> > > > > > >>> are
>> > > > > > >>>>>> still open, and therefore the duration is not yet known?
>> > > > > > >>>>>
>> > > > > > >>>>>
>> > > > > > >>>>> 1. Ah, I think I understand your concern better now -- I
>> > > totally
>> > > > > > agree
>> > > > > > >>>> that
>> > > > > > >>>>> a
>> > > > > > >>>>>    "iterator-duration-max" metric would be
>> > > confusing/misleading. I
>> > > > > > was
>> > > > > > >>>>> thinking about it a bit differently, something more akin
>> to
>> > the
>> > > > > > >>>>> "last-rebalance-seconds-ago" consumer metric. As the name
>> > > suggests,
>> > > > > > >>>>> that basically just tracks how long the consumer has gone
>> > > without
>> > > > > > >>>>> rebalancing -- it doesn't purport to represent the actual
>> > > duration
>> > > > > > >>>> between
>> > > > > > >>>>> rebalances, just the current time since the last one.  The
>> > > hard part
>> > > > > > >>> is
>> > > > > > >>>>> really
>> > > > > > >>>>> in choosing a name that reflects this -- maybe you have
>> some
>> > > better
>> > > > > > >>> ideas
>> > > > > > >>>>> but off the top of my head, perhaps something like
>> > > > > > >>>> "iterator-lifetime-max"?
>> > > > > > >>>>>
>> > > > > > >>>>> 2. I'm not quite sure how to interpret the
>> > > "iterator-duration-total"
>> > > > > > >>>> metric
>> > > > > > >>>>> -- what exactly does it mean to add up all the iterator
>> > > durations?
>> > > > > > For
>> > > > > > >>>>> some context, while this is not a hard-and-fast rule, in
>> > > general
>> > > > > > >>> you'll
>> > > > > > >>>>> find that Kafka/Streams metrics tend to come in pairs of
>> > > avg/max or
>> > > > > > >>>>> rate/total. Something that you might measure the avg for
>> > > usually is
>> > > > > > >>>>> also useful to measure the max, whereas a total metric is
>> > > probably
>> > > > > > >>>>> also useful as a rate but not so much as an avg. I
>> actually
>> > > think
>> > > > > > this
>> > > > > > >>>>> is part of why it feels like it makes so much sense to
>> > include
>> > > a
>> > > > > > "max"
>> > > > > > >>>>> version of this metric, as Lucas suggested, even if the
>> name
>> > of
>> > > > > > >>>>> "iterator-duration-max" feels misleading. Ultimately the
>> > > metric names
>> > > > > > >>>>> are up to you, but for this reason, I would personally
>> > > advocate for
>> > > > > > >>>>> just going with an "iterator-duration-avg" and
>> > > > > > "iterator-duration-max"
>> > > > > > >>>>>
>> > > > > > >>>>> I did see your example in which you mention one could
>> monitor
>> > > the
>> > > > > > >>>>> rate of change of the "-total" metric. While this does
>> make
>> > > sense to
>> > > > > > >>>>> me, if the only way to interpret a metric is by computing
>> > > another
>> > > > > > >>>>> function over it, then why not just make that computation
>> the
>> > > metric
>> > > > > > >>>>> and cut out the middle man? And in this case, to me at
>> least,
>> > > it
>> > > > > > feels
>> > > > > > >>>>> much easier to understand a metric like
>> > > "iterator-duration-max" vs
>> > > > > > >>>>> something like "iterator-duration-total-rate"
>> > > > > > >>>>>
>> > > > > > >>>>> 3. By the way, can you add another column to the table
>> with
>> > > the new
>> > > > > > >>>> metrics
>> > > > > > >>>>> that lists the recording level? My suggestion would be to
>> put
>> > > the
>> > > > > > >>>>> "number-open-iterators" at INFO and the other two at
>> DEBUG.
>> > See
>> > > > > > >>>>> the following for my reasoning behind this recommendation
>> > > > > > >>>>>
>> > > > > > >>>>> 4. I would change the "Type" entry for the
>> > > "number-open-iterators"
>> > > > > > >>> from
>> > > > > > >>>>> "Value" to "Gauge". This helps justify the "INFO" level
>> for
>> > > this
>> > > > > > >>> metric,
>> > > > > > >>>>> since unlike the other metrics which are "Measurables",
>> the
>> > > current
>> > > > > > >>>>> timestamp won't need to be retrieved on each recording
>> > > > > > >>>>>
>> > > > > > >>>>> 5. Can you list the tags that would be associated with
>> each
>> > of
>> > > these
>> > > > > > >>>>> metrics (either in the table, or separately above/below if
>> > > they will
>> > > > > > >>> be
>> > > > > > >>>>> the same for all)
>> > > > > > >>>>>
>> > > > > > >>>>> 6. Do you have a strong preference for the name
>> > > > > > >>> "number-open-iterators"
>> > > > > > >>>>> or would you be alright in shortening this to
>> > > "num-open-iterators"?
>> > > > > > >>> The
>> > > > > > >>>>> latter is more in line with the naming scheme used
>> elsewhere
>> > > in Kafka
>> > > > > > >>>>> for similar kinds of metrics, and a shorter name is always
>> > > nice.
>> > > > > > >>>>>
>> > > > > > >>>>> 7. With respect to the rocksdb cache metrics, those sound
>> > > useful but
>> > > > > > >>>>> if it was me, I would probably save them for a separate
>> KIP
>> > > mainly
>> > > > > > >>> just
>> > > > > > >>>>> because the KIP freeze deadline is in a few weeks, and I
>> > > wouldn't
>> > > > > > want
>> > > > > > >>>>> to end up blocking all the new metrics just because there
>> was
>> > > ongoing
>> > > > > > >>>>> debate about a subset of them. That said, you do have 3
>> full
>> > > weeks,
>> > > > > > so
>> > > > > > >>>>> I would hope that you could get both sets of metrics
>> agreed
>> > > upon in
>> > > > > > >>>>> that timeframe!
>> > > > > > >>>>>
>> > > > > > >>>>>
>> > > > > > >>>>> On Tue, Oct 24, 2023 at 6:35 AM Nick Telford <
>> > > nick.telf...@gmail.com
>> > > > > > >
>> > > > > > >>>> wrote:
>> > > > > > >>>>>
>> > > > > > >>>>>> I don't really have a problem with adding such a metric,
>> I'm
>> > > just
>> > > > > > not
>> > > > > > >>>>>> entirely sure how it would work. If we used
>> > > "iterator-duration-max",
>> > > > > > >>> for
>> > > > > > >>>>>> example, would it not be confusing that it includes
>> > Iterators
>> > > that
>> > > > > > >>> are
>> > > > > > >>>>>> still open, and therefore the duration is not yet known?
>> > When
>> > > > > > >>> graphing
>> > > > > > >>>> that
>> > > > > > >>>>>> over time, I suspect it would be difficult to understand.
>> > > > > > >>>>>>
>> > > > > > >>>>>> 3.
>> > > > > > >>>>>> FWIW, this would still be picked up by "open-iterators",
>> > > since that
>> > > > > > >>>> metric
>> > > > > > >>>>>> is only decremented when Iterator#close is called (via
>> the
>> > > > > > >>>>>> ManagedKeyValueIterator#onClose hook).
>> > > > > > >>>>>>
>> > > > > > >>>>>> I'm actually considering expanding the scope of this KIP
>> > > slightly to
>> > > > > > >>>>>> include improved Block Cache metrics, as my own memory
>> leak
>> > > > > > >>>> investigations
>> > > > > > >>>>>> have trended in that direction. Do you think the
>> following
>> > > metrics
>> > > > > > >>>> should
>> > > > > > >>>>>> be included in this KIP, or should I create a new KIP?
>> > > > > > >>>>>>
>> > > > > > >>>>>>      - block-cache-index-usage (number of bytes occupied
>> by
>> > > index
>> > > > > > >>> blocks)
>> > > > > > >>>>>>      - block-cache-filter-usage (number of bytes
>> occupied by
>> > > filter
>> > > > > > >>>> blocks)
>> > > > > > >>>>>>
>> > > > > > >>>>>> Regards,
>> > > > > > >>>>>> Nick
>> > > > > > >>>>>>
>> > > > > > >>>>>> On Tue, 24 Oct 2023 at 07:09, Sophie Blee-Goldman <
>> > > > > > >>>> sop...@responsive.dev>
>> > > > > > >>>>>> wrote:
>> > > > > > >>>>>>
>> > > > > > >>>>>>> I actually think we could implement Lucas' suggestion
>> > pretty
>> > > easily
>> > > > > > >>> and
>> > > > > > >>>>>>> without too much additional effort. We have full control
>> > > over the
>> > > > > > >>>>>> iterator
>> > > > > > >>>>>>> that is returned by the various range queries, so it
>> would
>> > > be easy
>> > > > > > >>> to
>> > > > > > >>>>>>> register a gauge metric for how long it has been since
>> the
>> > > iterator
>> > > > > > >>> was
>> > > > > > >>>>>>> created. Then we just deregister the metric when the
>> > > iterator is
>> > > > > > >>>> closed.
>> > > > > > >>>>>>>
>> > > > > > >>>>>>> With respect to how useful this metric would be, both
>> Nick
>> > > and
>> > > > > > Lucas
>> > > > > > >>>> have
>> > > > > > >>>>>>> made good points: I would agree that in general, leaking
>> > > iterators
>> > > > > > >>>> would
>> > > > > > >>>>>>> mean an ever-increasing iterator count that should be
>> > > possible to
>> > > > > > >>> spot
>> > > > > > >>>>>>> without this. However, a few things to consider:
>> > > > > > >>>>>>>
>> > > > > > >>>>>>> 1. it's really easy to set up an alert based on some
>> > maximum
>> > > > > > >>> threshold
>> > > > > > >>>> of
>> > > > > > >>>>>>> how long an iterator should remain open for. It's
>> > relatively
>> > > more
>> > > > > > >>>> tricky
>> > > > > > >>>>>> to
>> > > > > > >>>>>>> set up alerts based on the current count of open
>> iterators
>> > > and how
>> > > > > > >>> it
>> > > > > > >>>>>>> changes over time.
>> > > > > > >>>>>>> 2. As Lucas mentioned, it only takes a few iterators to
>> > > wreak havoc
>> > > > > > >>> in
>> > > > > > >>>>>>> extreme cases. Sometimes more advanced applications end
>> up
>> > > with
>> > > > > > >>> just a
>> > > > > > >>>>>> few
>> > > > > > >>>>>>> leaking iterators despite closing the majority of them.
>> > I've
>> > > seen
>> > > > > > >>> this
>> > > > > > >>>>>>> happen just once personally, but it was driving everyone
>> > > crazy
>> > > > > > >>> until we
>> > > > > > >>>>>>> figured it out.
>> > > > > > >>>>>>> 3. A metric for how long the iterator has been open
>> would
>> > > help to
>> > > > > > >>>>>> identify
>> > > > > > >>>>>>> hanging iterators due to some logic where the iterator
>> is
>> > > properly
>> > > > > > >>>> closed
>> > > > > > >>>>>>> but for whatever reason just isn't being advanced to the
>> > > end, and
>> > > > > > >>> thus
>> > > > > > >>>>>> not
>> > > > > > >>>>>>> reached the iterator#close line of the user code. This
>> case
>> > > seems
>> > > > > > >>>>>> difficult
>> > > > > > >>>>>>> to spot without the specific metric for iterator
>> lifetime
>> > > > > > >>>>>>> 4. Lastly, I think you could argue that all of the above
>> > are
>> > > fairly
>> > > > > > >>>>>>> advanced use cases, but this seems like a fairly
>> advanced
>> > > feature
>> > > > > > >>>>>> already,
>> > > > > > >>>>>>> so it doesn't seem unreasonable to try and cover all the
>> > > bases.
>> > > > > > >>>>>>>
>> > > > > > >>>>>>> All that said, my philosophy is that the KIP author gets
>> > the
>> > > final
>> > > > > > >>> word
>> > > > > > >>>>>> on
>> > > > > > >>>>>>> what to pull into scope as long as the proposal isn't
>> > harming
>> > > > > > anyone
>> > > > > > >>>>>>> without the extra feature/changes. So it's up to you
>> Nick
>> > > --  just
>> > > > > > >>>> wanted
>> > > > > > >>>>>>> to add some context on how it could work, and why it
>> would
>> > be
>> > > > > > >>> helpful
>> > > > > > >>>>>>>
>> > > > > > >>>>>>> Thanks for the KIP!
>> > > > > > >>>>>>>
>> > > > > > >>>>>>> On Wed, Oct 18, 2023 at 7:04 AM Lucas Brutschy
>> > > > > > >>>>>>> <lbruts...@confluent.io.invalid> wrote:
>> > > > > > >>>>>>>
>> > > > > > >>>>>>>> Hi Nick,
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>> I did not think in detail about how to implement it,
>> just
>> > > about
>> > > > > > >>> what
>> > > > > > >>>>>>>> metrics would be nice to have. You are right, we'd
>> have to
>> > > > > > >>>>>>>> register/deregister the iterators during open/close.
>> This
>> > > would be
>> > > > > > >>>>>>>> more complicated to implement than the other metrics,
>> but
>> > I
>> > > do not
>> > > > > > >>> see
>> > > > > > >>>>>>>> a fundamental problem with it.
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>> As far as I understand, even a low number of leaked
>> > > iterators can
>> > > > > > >>> hurt
>> > > > > > >>>>>>>> RocksDB compaction significantly. So we may even want
>> to
>> > > detect if
>> > > > > > >>> the
>> > > > > > >>>>>>>> iterators are opened by one-time / rare queries against
>> > the
>> > > state
>> > > > > > >>>>>>>> store.
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>> But, as I said, that would be an addition and not a
>> change
>> > > of the
>> > > > > > >>>>>>>> current contents of the KIP, so I'd support the KIP
>> moving
>> > > forward
>> > > > > > >>>>>>>> even without this extension.
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>> Cheers, Lucas
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>> On Tue, Oct 17, 2023 at 3:45 PM Nick Telford <
>> > > > > > >>> nick.telf...@gmail.com>
>> > > > > > >>>>>>>> wrote:
>> > > > > > >>>>>>>>>
>> > > > > > >>>>>>>>> Hi Lucas,
>> > > > > > >>>>>>>>>
>> > > > > > >>>>>>>>> Hmm, I'm not sure how we could reliably identify such
>> > > leaked
>> > > > > > >>>>>> Iterators.
>> > > > > > >>>>>>>> If
>> > > > > > >>>>>>>>> we tried to include open iterators when calculating
>> > > > > > >>>>>> iterator-duration,
>> > > > > > >>>>>>>> we'd
>> > > > > > >>>>>>>>> need some kind of registry of all the open iterator
>> > > creation
>> > > > > > >>>>>>> timestamps,
>> > > > > > >>>>>>>>> wouldn't we?
>> > > > > > >>>>>>>>>
>> > > > > > >>>>>>>>> In general, if you have a leaky Iterator, it should
>> > > manifest as a
>> > > > > > >>>>>>>>> persistently climbing "open-iterators" metric, even
>> on a
>> > > busy
>> > > > > > >>> node,
>> > > > > > >>>>>>>> because
>> > > > > > >>>>>>>>> each time that Iterator is used, it will leak another
>> > one.
>> > > So
>> > > > > > >>> even in
>> > > > > > >>>>>>> the
>> > > > > > >>>>>>>>> presence of many non-leaky Iterators on a busy
>> instance,
>> > > the
>> > > > > > >>> metric
>> > > > > > >>>>>>>> should
>> > > > > > >>>>>>>>> still consistently climb.
>> > > > > > >>>>>>>>>
>> > > > > > >>>>>>>>> Regards,
>> > > > > > >>>>>>>>>
>> > > > > > >>>>>>>>> Nick
>> > > > > > >>>>>>>>>
>> > > > > > >>>>>>>>> On Mon, 16 Oct 2023 at 14:24, Lucas Brutschy <
>> > > > > > >>> lbruts...@confluent.io
>> > > > > > >>>>>>>> .invalid>
>> > > > > > >>>>>>>>> wrote:
>> > > > > > >>>>>>>>>
>> > > > > > >>>>>>>>>> Hi Nick!
>> > > > > > >>>>>>>>>>
>> > > > > > >>>>>>>>>> thanks for the KIP! I think this could be quite
>> useful,
>> > > given
>> > > > > > the
>> > > > > > >>>>>>>>>> problems that we had with leaks due to RocksDB
>> resources
>> > > not
>> > > > > > >>> being
>> > > > > > >>>>>>>>>> closed.
>> > > > > > >>>>>>>>>>
>> > > > > > >>>>>>>>>> I don't have any pressing issues why we can't accept
>> it
>> > > like it
>> > > > > > >>> is,
>> > > > > > >>>>>>>>>> just one minor point for discussion: would it also
>> make
>> > > sense to
>> > > > > > >>>>>> make
>> > > > > > >>>>>>>>>> it possible to identify a few very long-running /
>> leaked
>> > > > > > >>>>>> iterators? I
>> > > > > > >>>>>>>>>> can imagine on a busy node, it would be hard to spot
>> > that
>> > > 1% of
>> > > > > > >>> the
>> > > > > > >>>>>>>>>> iterators never close when looking only at closed
>> > > iterator or
>> > > > > > the
>> > > > > > >>>>>>>>>> number of iterators. But it could still be good to
>> > > identify
>> > > > > > those
>> > > > > > >>>>>>>>>> leaks early. One option would be to add
>> > > `iterator-duration-max`
>> > > > > > >>> and
>> > > > > > >>>>>>>>>> take open iterators into account when computing the
>> > > metric.
>> > > > > > >>>>>>>>>>
>> > > > > > >>>>>>>>>> Cheers,
>> > > > > > >>>>>>>>>> Lucas
>> > > > > > >>>>>>>>>>
>> > > > > > >>>>>>>>>>
>> > > > > > >>>>>>>>>> On Thu, Oct 5, 2023 at 3:50 PM Nick Telford <
>> > > > > > >>>>>> nick.telf...@gmail.com>
>> > > > > > >>>>>>>>>> wrote:
>> > > > > > >>>>>>>>>>>
>> > > > > > >>>>>>>>>>> Hi Colt,
>> > > > > > >>>>>>>>>>>
>> > > > > > >>>>>>>>>>> I kept the details out of the KIP, because KIPs
>> > generally
>> > > > > > >>>>>> document
>> > > > > > >>>>>>>>>>> high-level design, but the way I'm doing it is like
>> > this:
>> > > > > > >>>>>>>>>>>
>> > > > > > >>>>>>>>>>>           final ManagedKeyValueIterator<Bytes,
>> byte[]>
>> > > > > > >>>>>>>>>>> rocksDbPrefixSeekIterator = cf.prefixScan(accessor,
>> > > > > > >>> prefixBytes);
>> > > > > > >>>>>>>>>>> -->     final long startedAt = System.nanoTime();
>> > > > > > >>>>>>>>>>>
>>  openIterators.add(rocksDbPrefixSeekIterator);
>> > > > > > >>>>>>>>>>>           rocksDbPrefixSeekIterator.onClose(() -> {
>> > > > > > >>>>>>>>>>> -->
>> > > > > > >>>>>>>
>> metricsRecorder.recordIteratorDuration(System.nanoTime()
>> > > > > > >>>>>>>> -
>> > > > > > >>>>>>>>>>> startedAt);
>> > > > > > >>>>>>>>>>>
>> > >  openIterators.remove(rocksDbPrefixSeekIterator);
>> > > > > > >>>>>>>>>>>           });
>> > > > > > >>>>>>>>>>>
>> > > > > > >>>>>>>>>>> The lines with the arrow are the new code. This
>> pattern
>> > > is
>> > > > > > >>>>>> repeated
>> > > > > > >>>>>>>>>>> throughout RocksDBStore, wherever a new
>> RocksDbIterator
>> > > is
>> > > > > > >>>>>> created.
>> > > > > > >>>>>>>>>>>
>> > > > > > >>>>>>>>>>> Regards,
>> > > > > > >>>>>>>>>>> Nick
>> > > > > > >>>>>>>>>>>
>> > > > > > >>>>>>>>>>> On Thu, 5 Oct 2023 at 12:32, Colt McNealy <
>> > > c...@littlehorse.io
>> > > > > > >
>> > > > > > >>>>>>>> wrote:
>> > > > > > >>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> Thank you for the KIP, Nick!
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> This would be highly useful for many reasons. Much
>> > more
>> > > sane
>> > > > > > >>>>>> than
>> > > > > > >>>>>>>>>> checking
>> > > > > > >>>>>>>>>>>> for leaked iterators by profiling memory usage
>> while
>> > > running
>> > > > > > >>>>>>> 100's
>> > > > > > >>>>>>>> of
>> > > > > > >>>>>>>>>>>> thousands of range scans via interactive queries (:
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> One small question:
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>> The iterator-duration metrics will be updated
>> > whenever
>> > > an
>> > > > > > >>>>>>>> Iterator's
>> > > > > > >>>>>>>>>>>> close() method is called
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> Does the Iterator have its own "createdAt()" or
>> > > equivalent
>> > > > > > >>>>>> field,
>> > > > > > >>>>>>>> or
>> > > > > > >>>>>>>>>> do we
>> > > > > > >>>>>>>>>>>> need to keep track of the Iterator's start time
>> upon
>> > > creation?
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> Cheers,
>> > > > > > >>>>>>>>>>>> Colt McNealy
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> *Founder, LittleHorse.dev*
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> On Wed, Oct 4, 2023 at 9:07 AM Nick Telford <
>> > > > > > >>>>>>>> nick.telf...@gmail.com>
>> > > > > > >>>>>>>>>>>> wrote:
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>> Hi everyone,
>> > > > > > >>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>> KIP-989 is a small Kafka Streams KIP to add a few
>> new
>> > > metrics
>> > > > > > >>>>>>>> around
>> > > > > > >>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>> creation and use of RocksDB Iterators, to aid
>> users
>> > in
>> > > > > > >>>>>>>> identifying
>> > > > > > >>>>>>>>>>>>> "Iterator leaks" that could cause applications to
>> > leak
>> > > native
>> > > > > > >>>>>>>> memory.
>> > > > > > >>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>> Let me know what you think!
>> > > > > > >>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>
>> > > > > > >>>>>>
>> > > > > > >>>>
>> > > > > > >>>
>> > > > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-989%3A+RocksDB+Iterator+Metrics
>> > > > > > >>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>> P.S. I'm not too sure about the formatting of the
>> > "New
>> > > > > > >>>>>> Metrics"
>> > > > > > >>>>>>>>>> table,
>> > > > > > >>>>>>>>>>>> any
>> > > > > > >>>>>>>>>>>>> advice there would be appreciated.
>> > > > > > >>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>> Regards,
>> > > > > > >>>>>>>>>>>>> Nick
>> > > > > > >>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>
>> > > > > > >>>>>>
>> > > > > > >>>>>
>> > > > > > >>>>
>> > > > > > >>>
>> > > > > > >>
>> > > > > > >
>> > > > > >
>> > >
>> >
>>
>

Reply via email to