Actually, one other point: our existing state store operation metrics are measured in nanoseconds[1].
Should iterator-duration-(avg|max) also be measured in nanoseconds, for consistency, or should we keep them milliseconds, as the KIP currently states? 1: https://docs.confluent.io/platform/current/streams/monitoring.html#state-store-metrics On Thu, 16 May 2024 at 12:15, Nick Telford <nick.telf...@gmail.com> wrote: > Good point! I've updated it to "Improved StateStore Iterator metrics for > detecting leaks" - let me know if you have a better suggestion. > > This should affect the voting imo, as nothing of substance has changed. > > Regards, > Nick > > On Thu, 16 May 2024 at 01:39, Sophie Blee-Goldman <sop...@responsive.dev> > wrote: > >> One quick thing -- can you update the title of this KIP to reflect the >> decision to implement these metrics for all state stores implementations >> rather than just RocksDB? >> >> >> On Tue, May 14, 2024 at 1:36 PM Nick Telford <nick.telf...@gmail.com> >> wrote: >> >> > Woops! Thanks for the catch Lucas. Given this was just a typo, I don't >> > think this affects the voting. >> > >> > Cheers, >> > Nick >> > >> > On Tue, 14 May 2024 at 18:06, Lucas Brutschy <lbruts...@confluent.io >> > .invalid> >> > wrote: >> > >> > > Hi Nick, >> > > >> > > you are still referring to oldest-open-iterator-age-ms in the >> > > `Proposed Changes` section. >> > > >> > > Cheers, >> > > Lucas >> > > >> > > On Thu, May 2, 2024 at 4:00 PM Lucas Brutschy <lbruts...@confluent.io >> > >> > > wrote: >> > > > >> > > > Hi Nick! >> > > > >> > > > I agree, the age variant is a bit nicer since the semantics are very >> > > > clear from the name. If you'd rather go for the simple >> implementation, >> > > > how about calling it `oldest-iterator-open-since-ms`? I believe this >> > > > could be understood without docs. Either way, I think we should be >> > > > able to open the vote for this KIP because nobody raised any major / >> > > > blocking concerns. >> > > > >> > > > Looking forward to getting this voted on soon! >> > > > >> > > > Cheers >> > > > Lucas >> > > > >> > > > On Sun, Mar 31, 2024 at 5:23 PM Nick Telford < >> nick.telf...@gmail.com> >> > > wrote: >> > > > > >> > > > > Hi Matthias, >> > > > > >> > > > > > For the oldest iterator metric, I would propose something simple >> > like >> > > > > > `iterator-opened-ms` and it would just be the actual timestamp >> when >> > > the >> > > > > > iterator was opened. I don't think we need to compute the actual >> > age, >> > > > > > but user can to this computation themselves? >> > > > > >> > > > > That works for me; it's easier to implement like that :-D I'm a >> > little >> > > > > concerned that the name "iterator-opened-ms" may not be obvious >> > enough >> > > > > without reading the docs. >> > > > > >> > > > > > If we think reporting the age instead of just the timestamp is >> > > better, I >> > > > > > would propose `iterator-max-age-ms`. I should be sufficient to >> call >> > > out >> > > > > > (as it's kinda "obvious" anyway) that the metric applies to open >> > > > > > iterator only. >> > > > > >> > > > > While I think it's preferable to record the timestamp, rather than >> > the >> > > age, >> > > > > this does have the benefit of a more obvious metric name. >> > > > > >> > > > > > Nit: the KIP says it's a store-level metric, but I think it >> would >> > be >> > > > > > good to say explicitly that it's recorded with DEBUG level only? >> > > > > >> > > > > Yes, I've already updated the KIP with this information in the >> table. >> > > > > >> > > > > Regards, >> > > > > >> > > > > Nick >> > > > > >> > > > > On Sun, 31 Mar 2024 at 10:53, Matthias J. Sax <mj...@apache.org> >> > > wrote: >> > > > > >> > > > > > The time window thing was just an idea. Happy to drop it. >> > > > > > >> > > > > > For the oldest iterator metric, I would propose something simple >> > like >> > > > > > `iterator-opened-ms` and it would just be the actual timestamp >> when >> > > the >> > > > > > iterator was opened. I don't think we need to compute the actual >> > age, >> > > > > > but user can to this computation themselves? >> > > > > > >> > > > > > If we think reporting the age instead of just the timestamp is >> > > better, I >> > > > > > would propose `iterator-max-age-ms`. I should be sufficient to >> call >> > > out >> > > > > > (as it's kinda "obvious" anyway) that the metric applies to open >> > > > > > iterator only. >> > > > > > >> > > > > > And yes, I was hoping that the code inside MetereXxxStore might >> > > already >> > > > > > be setup in a way that custom stores would inherit the iterator >> > > metrics >> > > > > > automatically -- I am just not sure, and left it as an exercise >> for >> > > > > > somebody to confirm :) >> > > > > > >> > > > > > >> > > > > > Nit: the KIP says it's a store-level metric, but I think it >> would >> > be >> > > > > > good to say explicitly that it's recorded with DEBUG level only? >> > > > > > >> > > > > > >> > > > > > >> > > > > > -Matthias >> > > > > > >> > > > > > >> > > > > > On 3/28/24 2:52 PM, Nick Telford wrote: >> > > > > > > Quick addendum: >> > > > > > > >> > > > > > > My suggested metric "oldest-open-iterator-age-seconds" should >> be >> > > > > > > "oldest-open-iterator-age-ms". Milliseconds is obviously a >> better >> > > > > > > granularity for such a metric. >> > > > > > > >> > > > > > > Still accepting suggestions for a better name. >> > > > > > > >> > > > > > > On Thu, 28 Mar 2024 at 13:41, Nick Telford < >> > nick.telf...@gmail.com >> > > > >> > > > > > wrote: >> > > > > > > >> > > > > > >> Hi everyone, >> > > > > > >> >> > > > > > >> Sorry for leaving this for so long. So much for "3 weeks >> until >> > KIP >> > > > > > freeze"! >> > > > > > >> >> > > > > > >> On Sophie's comments: >> > > > > > >> 1. Would Matthias's suggestion of a separate metric tracking >> the >> > > age of >> > > > > > >> the oldest open iterator (within the tag set) satisfy this? >> That >> > > way we >> > > > > > can >> > > > > > >> keep iterator-duration-(avg|max) for closed iterators, which >> can >> > > be >> > > > > > useful >> > > > > > >> for performance debugging for iterators that don't leak. I'm >> not >> > > sure >> > > > > > what >> > > > > > >> we'd call this metric, maybe: >> > "oldest-open-iterator-age-seconds"? >> > > Seems >> > > > > > >> like a mouthful. >> > > > > > >> >> > > > > > >> 2. You're right, it makes more sense to provide >> > > > > > >> iterator-duration-(avg|max). Honestly, I can't remember why I >> > had >> > > > > > "total" >> > > > > > >> before, or why I was computing a rate-of-change over it. >> > > > > > >> >> > > > > > >> 3, 4, 5, 6. Agreed, I'll make all those changes as suggested. >> > > > > > >> >> > > > > > >> 7. Combined with Matthias's point about RocksDB, I'm >> convinced >> > > that this >> > > > > > >> is the wrong KIP for these. I'll introduce the additional >> Rocks >> > > metrics >> > > > > > in >> > > > > > >> another KIP. >> > > > > > >> >> > > > > > >> On Matthias's comments: >> > > > > > >> A. Not sure about the time window. I'm pretty sure all >> existing >> > > avg/max >> > > > > > >> metrics are since the application was started? Any other >> > > suggestions >> > > > > > here >> > > > > > >> would be appreciated. >> > > > > > >> >> > > > > > >> B. Agreed. See point 1 above. >> > > > > > >> >> > > > > > >> C. Good point. My focus was very much on Rocks memory leaks >> when >> > > I wrote >> > > > > > >> the first draft. I can generalise it. My only concern is >> that it >> > > might >> > > > > > make >> > > > > > >> it more difficult to detect Rocks iterator leaks caused >> *within* >> > > our >> > > > > > >> high-level iterator, e.g. RocksJNI, RocksDB, RocksDBStore, >> etc. >> > > But we >> > > > > > >> could always provide a RocksDB-specific metric for this, as >> you >> > > > > > suggested. >> > > > > > >> >> > > > > > >> D. Hmm, we do already have MeteredKeyValueIterator, which >> > > automatically >> > > > > > >> wraps the iterator from inner-stores of >> MeteredKeyValueStore. If >> > > we >> > > > > > >> implemented these metrics there, then custom stores would >> > > automatically >> > > > > > >> gain the functionality, right? This seems like a pretty >> logical >> > > place to >> > > > > > >> implement these metrics, since MeteredKeyValueStore is all >> about >> > > adding >> > > > > > >> metrics to state stores. >> > > > > > >> >> > > > > > >>> I imagine the best way to implement this would be to do so >> at >> > the >> > > > > > >>> high-level iterator rather than implementing it separately >> for >> > > each >> > > > > > >>> specific iterator implementation for every store type. >> > > > > > >> >> > > > > > >> Sophie, does MeteredKeyValueIterator fit with your >> > recommendation? >> > > > > > >> >> > > > > > >> Thanks for your thoughts everyone, I'll update the KIP now. >> > > > > > >> >> > > > > > >> Nick >> > > > > > >> >> > > > > > >> On Thu, 14 Mar 2024 at 03:37, Sophie Blee-Goldman < >> > > > > > sop...@responsive.dev> >> > > > > > >> wrote: >> > > > > > >> >> > > > > > >>> About your last two points: I completely agree that we >> should >> > > try to >> > > > > > >>> make this independent of RocksDB, and should probably adopt >> a >> > > > > > >>> general philosophy of being store-implementation agnostic >> > unless >> > > > > > >>> there is good reason to focus on a particular store type: >> eg if >> > > it was >> > > > > > >>> only possible to implement for certain stores, or only made >> > > sense in >> > > > > > >>> the context of a certain store type but not necessarily >> stores >> > in >> > > > > > general. >> > > > > > >>> >> > > > > > >>> While leaking memory due to unclosed iterators on RocksDB >> > stores >> > > is >> > > > > > >>> certainly the most common issue, I think Matthias >> sufficiently >> > > > > > >>> demonstrated that the problem of leaking iterators is not >> > > actually >> > > > > > >>> unique to RocksDB, and we should consider including >> in-memory >> > > > > > >>> stores at the very least. I also think that at this point, >> we >> > > may as >> > > > > > well >> > > > > > >>> just implement the metrics for *all* store types, whether >> > > rocksdb or >> > > > > > >>> in-memory or custom. Not just because it probably applies to >> > all >> > > > > > >>> store types (leaking iterators are rarely a good thing!) but >> > > because >> > > > > > >>> I imagine the best way to implement this would be to do so >> at >> > the >> > > > > > >>> high-level iterator rather than implementing it separately >> for >> > > each >> > > > > > >>> specific iterator implementation for every store type. >> > > > > > >>> >> > > > > > >>> That said, I haven't thought all that carefully about the >> > > > > > implementation >> > > > > > >>> yet -- it just strikes me as easiest to do at the top level >> of >> > > the >> > > > > > store >> > > > > > >>> hierarchy rather than at the bottom. My gut instinct may >> very >> > > well be >> > > > > > >>> wrong, but that's what it's saying >> > > > > > >>> >> > > > > > >>> On Thu, Mar 7, 2024 at 10:43 AM Matthias J. Sax < >> > > mj...@apache.org> >> > > > > > wrote: >> > > > > > >>> >> > > > > > >>>> Seems I am late to this party. Can we pick this up again >> > aiming >> > > for >> > > > > > 3.8 >> > > > > > >>>> release? I think it would be a great addition. Few >> comments: >> > > > > > >>>> >> > > > > > >>>> >> > > > > > >>>> - I think it does make sense to report >> `iterator-duration-avg` >> > > and >> > > > > > >>>> `iterator-duration-max` for all *closed* iterators -- it >> just >> > > seems to >> > > > > > >>>> be a useful metric (wondering if this would be _overall_ or >> > > bounded to >> > > > > > >>>> some time window?) >> > > > > > >>>> >> > > > > > >>>> - About the duration iterators are currently open, I >> believe >> > > the only >> > > > > > >>>> useful way is to report the "oldest iterator", ie, the >> > smallest >> > > > > > iterator >> > > > > > >>>> open-time, of all currently open-iterator? We all agree >> that >> > in >> > > > > > general, >> > > > > > >>>> leaking iterator would bump the count metric, and if there >> is >> > a >> > > few >> > > > > > ones >> > > > > > >>>> which are not closed and open for a long time, it seem >> > > sufficient to >> > > > > > >>>> detect the single oldest one for alerting purpose? >> > > > > > >>>> >> > > > > > >>>> - What I don't like about the KIP is it focus on RocksDB. I >> > > don't >> > > > > > think >> > > > > > >>>> we should build on the internal RocksDB counters as >> proposed >> > (I >> > > guess, >> > > > > > >>>> we could still expose them, similar to other RocksDB >> metrics >> > > which we >> > > > > > >>>> expose already). However, for this new metric, we should >> track >> > > it >> > > > > > >>>> ourselves and thus make it independent of RocksDB -- in the >> > > end, an >> > > > > > >>>> in-memory store could also leak memory (and kill a JVM >> with an >> > > > > > >>>> out-of-memory error) and we should be able to track it. >> > > > > > >>>> >> > > > > > >>>> - Not sure if we would like to add support for custom >> stores, >> > > to allow >> > > > > > >>>> them to register their iterators with this metric? Or would >> > > this not >> > > > > > be >> > > > > > >>>> necessary, because custom stores could just register a >> custom >> > > metric >> > > > > > >>>> about it to begin with? >> > > > > > >>>> >> > > > > > >>>> >> > > > > > >>>> >> > > > > > >>>> -Matthias >> > > > > > >>>> >> > > > > > >>>> On 10/25/23 4:41 PM, Sophie Blee-Goldman wrote: >> > > > > > >>>>>> >> > > > > > >>>>>> If we used "iterator-duration-max", for >> > > > > > >>>>>> example, would it not be confusing that it includes >> > Iterators >> > > that >> > > > > > >>> are >> > > > > > >>>>>> still open, and therefore the duration is not yet known? >> > > > > > >>>>> >> > > > > > >>>>> >> > > > > > >>>>> 1. Ah, I think I understand your concern better now -- I >> > > totally >> > > > > > agree >> > > > > > >>>> that >> > > > > > >>>>> a >> > > > > > >>>>> "iterator-duration-max" metric would be >> > > confusing/misleading. I >> > > > > > was >> > > > > > >>>>> thinking about it a bit differently, something more akin >> to >> > the >> > > > > > >>>>> "last-rebalance-seconds-ago" consumer metric. As the name >> > > suggests, >> > > > > > >>>>> that basically just tracks how long the consumer has gone >> > > without >> > > > > > >>>>> rebalancing -- it doesn't purport to represent the actual >> > > duration >> > > > > > >>>> between >> > > > > > >>>>> rebalances, just the current time since the last one. The >> > > hard part >> > > > > > >>> is >> > > > > > >>>>> really >> > > > > > >>>>> in choosing a name that reflects this -- maybe you have >> some >> > > better >> > > > > > >>> ideas >> > > > > > >>>>> but off the top of my head, perhaps something like >> > > > > > >>>> "iterator-lifetime-max"? >> > > > > > >>>>> >> > > > > > >>>>> 2. I'm not quite sure how to interpret the >> > > "iterator-duration-total" >> > > > > > >>>> metric >> > > > > > >>>>> -- what exactly does it mean to add up all the iterator >> > > durations? >> > > > > > For >> > > > > > >>>>> some context, while this is not a hard-and-fast rule, in >> > > general >> > > > > > >>> you'll >> > > > > > >>>>> find that Kafka/Streams metrics tend to come in pairs of >> > > avg/max or >> > > > > > >>>>> rate/total. Something that you might measure the avg for >> > > usually is >> > > > > > >>>>> also useful to measure the max, whereas a total metric is >> > > probably >> > > > > > >>>>> also useful as a rate but not so much as an avg. I >> actually >> > > think >> > > > > > this >> > > > > > >>>>> is part of why it feels like it makes so much sense to >> > include >> > > a >> > > > > > "max" >> > > > > > >>>>> version of this metric, as Lucas suggested, even if the >> name >> > of >> > > > > > >>>>> "iterator-duration-max" feels misleading. Ultimately the >> > > metric names >> > > > > > >>>>> are up to you, but for this reason, I would personally >> > > advocate for >> > > > > > >>>>> just going with an "iterator-duration-avg" and >> > > > > > "iterator-duration-max" >> > > > > > >>>>> >> > > > > > >>>>> I did see your example in which you mention one could >> monitor >> > > the >> > > > > > >>>>> rate of change of the "-total" metric. While this does >> make >> > > sense to >> > > > > > >>>>> me, if the only way to interpret a metric is by computing >> > > another >> > > > > > >>>>> function over it, then why not just make that computation >> the >> > > metric >> > > > > > >>>>> and cut out the middle man? And in this case, to me at >> least, >> > > it >> > > > > > feels >> > > > > > >>>>> much easier to understand a metric like >> > > "iterator-duration-max" vs >> > > > > > >>>>> something like "iterator-duration-total-rate" >> > > > > > >>>>> >> > > > > > >>>>> 3. By the way, can you add another column to the table >> with >> > > the new >> > > > > > >>>> metrics >> > > > > > >>>>> that lists the recording level? My suggestion would be to >> put >> > > the >> > > > > > >>>>> "number-open-iterators" at INFO and the other two at >> DEBUG. >> > See >> > > > > > >>>>> the following for my reasoning behind this recommendation >> > > > > > >>>>> >> > > > > > >>>>> 4. I would change the "Type" entry for the >> > > "number-open-iterators" >> > > > > > >>> from >> > > > > > >>>>> "Value" to "Gauge". This helps justify the "INFO" level >> for >> > > this >> > > > > > >>> metric, >> > > > > > >>>>> since unlike the other metrics which are "Measurables", >> the >> > > current >> > > > > > >>>>> timestamp won't need to be retrieved on each recording >> > > > > > >>>>> >> > > > > > >>>>> 5. Can you list the tags that would be associated with >> each >> > of >> > > these >> > > > > > >>>>> metrics (either in the table, or separately above/below if >> > > they will >> > > > > > >>> be >> > > > > > >>>>> the same for all) >> > > > > > >>>>> >> > > > > > >>>>> 6. Do you have a strong preference for the name >> > > > > > >>> "number-open-iterators" >> > > > > > >>>>> or would you be alright in shortening this to >> > > "num-open-iterators"? >> > > > > > >>> The >> > > > > > >>>>> latter is more in line with the naming scheme used >> elsewhere >> > > in Kafka >> > > > > > >>>>> for similar kinds of metrics, and a shorter name is always >> > > nice. >> > > > > > >>>>> >> > > > > > >>>>> 7. With respect to the rocksdb cache metrics, those sound >> > > useful but >> > > > > > >>>>> if it was me, I would probably save them for a separate >> KIP >> > > mainly >> > > > > > >>> just >> > > > > > >>>>> because the KIP freeze deadline is in a few weeks, and I >> > > wouldn't >> > > > > > want >> > > > > > >>>>> to end up blocking all the new metrics just because there >> was >> > > ongoing >> > > > > > >>>>> debate about a subset of them. That said, you do have 3 >> full >> > > weeks, >> > > > > > so >> > > > > > >>>>> I would hope that you could get both sets of metrics >> agreed >> > > upon in >> > > > > > >>>>> that timeframe! >> > > > > > >>>>> >> > > > > > >>>>> >> > > > > > >>>>> On Tue, Oct 24, 2023 at 6:35 AM Nick Telford < >> > > nick.telf...@gmail.com >> > > > > > > >> > > > > > >>>> wrote: >> > > > > > >>>>> >> > > > > > >>>>>> I don't really have a problem with adding such a metric, >> I'm >> > > just >> > > > > > not >> > > > > > >>>>>> entirely sure how it would work. If we used >> > > "iterator-duration-max", >> > > > > > >>> for >> > > > > > >>>>>> example, would it not be confusing that it includes >> > Iterators >> > > that >> > > > > > >>> are >> > > > > > >>>>>> still open, and therefore the duration is not yet known? >> > When >> > > > > > >>> graphing >> > > > > > >>>> that >> > > > > > >>>>>> over time, I suspect it would be difficult to understand. >> > > > > > >>>>>> >> > > > > > >>>>>> 3. >> > > > > > >>>>>> FWIW, this would still be picked up by "open-iterators", >> > > since that >> > > > > > >>>> metric >> > > > > > >>>>>> is only decremented when Iterator#close is called (via >> the >> > > > > > >>>>>> ManagedKeyValueIterator#onClose hook). >> > > > > > >>>>>> >> > > > > > >>>>>> I'm actually considering expanding the scope of this KIP >> > > slightly to >> > > > > > >>>>>> include improved Block Cache metrics, as my own memory >> leak >> > > > > > >>>> investigations >> > > > > > >>>>>> have trended in that direction. Do you think the >> following >> > > metrics >> > > > > > >>>> should >> > > > > > >>>>>> be included in this KIP, or should I create a new KIP? >> > > > > > >>>>>> >> > > > > > >>>>>> - block-cache-index-usage (number of bytes occupied >> by >> > > index >> > > > > > >>> blocks) >> > > > > > >>>>>> - block-cache-filter-usage (number of bytes >> occupied by >> > > filter >> > > > > > >>>> blocks) >> > > > > > >>>>>> >> > > > > > >>>>>> Regards, >> > > > > > >>>>>> Nick >> > > > > > >>>>>> >> > > > > > >>>>>> On Tue, 24 Oct 2023 at 07:09, Sophie Blee-Goldman < >> > > > > > >>>> sop...@responsive.dev> >> > > > > > >>>>>> wrote: >> > > > > > >>>>>> >> > > > > > >>>>>>> I actually think we could implement Lucas' suggestion >> > pretty >> > > easily >> > > > > > >>> and >> > > > > > >>>>>>> without too much additional effort. We have full control >> > > over the >> > > > > > >>>>>> iterator >> > > > > > >>>>>>> that is returned by the various range queries, so it >> would >> > > be easy >> > > > > > >>> to >> > > > > > >>>>>>> register a gauge metric for how long it has been since >> the >> > > iterator >> > > > > > >>> was >> > > > > > >>>>>>> created. Then we just deregister the metric when the >> > > iterator is >> > > > > > >>>> closed. >> > > > > > >>>>>>> >> > > > > > >>>>>>> With respect to how useful this metric would be, both >> Nick >> > > and >> > > > > > Lucas >> > > > > > >>>> have >> > > > > > >>>>>>> made good points: I would agree that in general, leaking >> > > iterators >> > > > > > >>>> would >> > > > > > >>>>>>> mean an ever-increasing iterator count that should be >> > > possible to >> > > > > > >>> spot >> > > > > > >>>>>>> without this. However, a few things to consider: >> > > > > > >>>>>>> >> > > > > > >>>>>>> 1. it's really easy to set up an alert based on some >> > maximum >> > > > > > >>> threshold >> > > > > > >>>> of >> > > > > > >>>>>>> how long an iterator should remain open for. It's >> > relatively >> > > more >> > > > > > >>>> tricky >> > > > > > >>>>>> to >> > > > > > >>>>>>> set up alerts based on the current count of open >> iterators >> > > and how >> > > > > > >>> it >> > > > > > >>>>>>> changes over time. >> > > > > > >>>>>>> 2. As Lucas mentioned, it only takes a few iterators to >> > > wreak havoc >> > > > > > >>> in >> > > > > > >>>>>>> extreme cases. Sometimes more advanced applications end >> up >> > > with >> > > > > > >>> just a >> > > > > > >>>>>> few >> > > > > > >>>>>>> leaking iterators despite closing the majority of them. >> > I've >> > > seen >> > > > > > >>> this >> > > > > > >>>>>>> happen just once personally, but it was driving everyone >> > > crazy >> > > > > > >>> until we >> > > > > > >>>>>>> figured it out. >> > > > > > >>>>>>> 3. A metric for how long the iterator has been open >> would >> > > help to >> > > > > > >>>>>> identify >> > > > > > >>>>>>> hanging iterators due to some logic where the iterator >> is >> > > properly >> > > > > > >>>> closed >> > > > > > >>>>>>> but for whatever reason just isn't being advanced to the >> > > end, and >> > > > > > >>> thus >> > > > > > >>>>>> not >> > > > > > >>>>>>> reached the iterator#close line of the user code. This >> case >> > > seems >> > > > > > >>>>>> difficult >> > > > > > >>>>>>> to spot without the specific metric for iterator >> lifetime >> > > > > > >>>>>>> 4. Lastly, I think you could argue that all of the above >> > are >> > > fairly >> > > > > > >>>>>>> advanced use cases, but this seems like a fairly >> advanced >> > > feature >> > > > > > >>>>>> already, >> > > > > > >>>>>>> so it doesn't seem unreasonable to try and cover all the >> > > bases. >> > > > > > >>>>>>> >> > > > > > >>>>>>> All that said, my philosophy is that the KIP author gets >> > the >> > > final >> > > > > > >>> word >> > > > > > >>>>>> on >> > > > > > >>>>>>> what to pull into scope as long as the proposal isn't >> > harming >> > > > > > anyone >> > > > > > >>>>>>> without the extra feature/changes. So it's up to you >> Nick >> > > -- just >> > > > > > >>>> wanted >> > > > > > >>>>>>> to add some context on how it could work, and why it >> would >> > be >> > > > > > >>> helpful >> > > > > > >>>>>>> >> > > > > > >>>>>>> Thanks for the KIP! >> > > > > > >>>>>>> >> > > > > > >>>>>>> On Wed, Oct 18, 2023 at 7:04 AM Lucas Brutschy >> > > > > > >>>>>>> <lbruts...@confluent.io.invalid> wrote: >> > > > > > >>>>>>> >> > > > > > >>>>>>>> Hi Nick, >> > > > > > >>>>>>>> >> > > > > > >>>>>>>> I did not think in detail about how to implement it, >> just >> > > about >> > > > > > >>> what >> > > > > > >>>>>>>> metrics would be nice to have. You are right, we'd >> have to >> > > > > > >>>>>>>> register/deregister the iterators during open/close. >> This >> > > would be >> > > > > > >>>>>>>> more complicated to implement than the other metrics, >> but >> > I >> > > do not >> > > > > > >>> see >> > > > > > >>>>>>>> a fundamental problem with it. >> > > > > > >>>>>>>> >> > > > > > >>>>>>>> As far as I understand, even a low number of leaked >> > > iterators can >> > > > > > >>> hurt >> > > > > > >>>>>>>> RocksDB compaction significantly. So we may even want >> to >> > > detect if >> > > > > > >>> the >> > > > > > >>>>>>>> iterators are opened by one-time / rare queries against >> > the >> > > state >> > > > > > >>>>>>>> store. >> > > > > > >>>>>>>> >> > > > > > >>>>>>>> But, as I said, that would be an addition and not a >> change >> > > of the >> > > > > > >>>>>>>> current contents of the KIP, so I'd support the KIP >> moving >> > > forward >> > > > > > >>>>>>>> even without this extension. >> > > > > > >>>>>>>> >> > > > > > >>>>>>>> Cheers, Lucas >> > > > > > >>>>>>>> >> > > > > > >>>>>>>> >> > > > > > >>>>>>>> >> > > > > > >>>>>>>> On Tue, Oct 17, 2023 at 3:45 PM Nick Telford < >> > > > > > >>> nick.telf...@gmail.com> >> > > > > > >>>>>>>> wrote: >> > > > > > >>>>>>>>> >> > > > > > >>>>>>>>> Hi Lucas, >> > > > > > >>>>>>>>> >> > > > > > >>>>>>>>> Hmm, I'm not sure how we could reliably identify such >> > > leaked >> > > > > > >>>>>> Iterators. >> > > > > > >>>>>>>> If >> > > > > > >>>>>>>>> we tried to include open iterators when calculating >> > > > > > >>>>>> iterator-duration, >> > > > > > >>>>>>>> we'd >> > > > > > >>>>>>>>> need some kind of registry of all the open iterator >> > > creation >> > > > > > >>>>>>> timestamps, >> > > > > > >>>>>>>>> wouldn't we? >> > > > > > >>>>>>>>> >> > > > > > >>>>>>>>> In general, if you have a leaky Iterator, it should >> > > manifest as a >> > > > > > >>>>>>>>> persistently climbing "open-iterators" metric, even >> on a >> > > busy >> > > > > > >>> node, >> > > > > > >>>>>>>> because >> > > > > > >>>>>>>>> each time that Iterator is used, it will leak another >> > one. >> > > So >> > > > > > >>> even in >> > > > > > >>>>>>> the >> > > > > > >>>>>>>>> presence of many non-leaky Iterators on a busy >> instance, >> > > the >> > > > > > >>> metric >> > > > > > >>>>>>>> should >> > > > > > >>>>>>>>> still consistently climb. >> > > > > > >>>>>>>>> >> > > > > > >>>>>>>>> Regards, >> > > > > > >>>>>>>>> >> > > > > > >>>>>>>>> Nick >> > > > > > >>>>>>>>> >> > > > > > >>>>>>>>> On Mon, 16 Oct 2023 at 14:24, Lucas Brutschy < >> > > > > > >>> lbruts...@confluent.io >> > > > > > >>>>>>>> .invalid> >> > > > > > >>>>>>>>> wrote: >> > > > > > >>>>>>>>> >> > > > > > >>>>>>>>>> Hi Nick! >> > > > > > >>>>>>>>>> >> > > > > > >>>>>>>>>> thanks for the KIP! I think this could be quite >> useful, >> > > given >> > > > > > the >> > > > > > >>>>>>>>>> problems that we had with leaks due to RocksDB >> resources >> > > not >> > > > > > >>> being >> > > > > > >>>>>>>>>> closed. >> > > > > > >>>>>>>>>> >> > > > > > >>>>>>>>>> I don't have any pressing issues why we can't accept >> it >> > > like it >> > > > > > >>> is, >> > > > > > >>>>>>>>>> just one minor point for discussion: would it also >> make >> > > sense to >> > > > > > >>>>>> make >> > > > > > >>>>>>>>>> it possible to identify a few very long-running / >> leaked >> > > > > > >>>>>> iterators? I >> > > > > > >>>>>>>>>> can imagine on a busy node, it would be hard to spot >> > that >> > > 1% of >> > > > > > >>> the >> > > > > > >>>>>>>>>> iterators never close when looking only at closed >> > > iterator or >> > > > > > the >> > > > > > >>>>>>>>>> number of iterators. But it could still be good to >> > > identify >> > > > > > those >> > > > > > >>>>>>>>>> leaks early. One option would be to add >> > > `iterator-duration-max` >> > > > > > >>> and >> > > > > > >>>>>>>>>> take open iterators into account when computing the >> > > metric. >> > > > > > >>>>>>>>>> >> > > > > > >>>>>>>>>> Cheers, >> > > > > > >>>>>>>>>> Lucas >> > > > > > >>>>>>>>>> >> > > > > > >>>>>>>>>> >> > > > > > >>>>>>>>>> On Thu, Oct 5, 2023 at 3:50 PM Nick Telford < >> > > > > > >>>>>> nick.telf...@gmail.com> >> > > > > > >>>>>>>>>> wrote: >> > > > > > >>>>>>>>>>> >> > > > > > >>>>>>>>>>> Hi Colt, >> > > > > > >>>>>>>>>>> >> > > > > > >>>>>>>>>>> I kept the details out of the KIP, because KIPs >> > generally >> > > > > > >>>>>> document >> > > > > > >>>>>>>>>>> high-level design, but the way I'm doing it is like >> > this: >> > > > > > >>>>>>>>>>> >> > > > > > >>>>>>>>>>> final ManagedKeyValueIterator<Bytes, >> byte[]> >> > > > > > >>>>>>>>>>> rocksDbPrefixSeekIterator = cf.prefixScan(accessor, >> > > > > > >>> prefixBytes); >> > > > > > >>>>>>>>>>> --> final long startedAt = System.nanoTime(); >> > > > > > >>>>>>>>>>> >> openIterators.add(rocksDbPrefixSeekIterator); >> > > > > > >>>>>>>>>>> rocksDbPrefixSeekIterator.onClose(() -> { >> > > > > > >>>>>>>>>>> --> >> > > > > > >>>>>>> >> metricsRecorder.recordIteratorDuration(System.nanoTime() >> > > > > > >>>>>>>> - >> > > > > > >>>>>>>>>>> startedAt); >> > > > > > >>>>>>>>>>> >> > > openIterators.remove(rocksDbPrefixSeekIterator); >> > > > > > >>>>>>>>>>> }); >> > > > > > >>>>>>>>>>> >> > > > > > >>>>>>>>>>> The lines with the arrow are the new code. This >> pattern >> > > is >> > > > > > >>>>>> repeated >> > > > > > >>>>>>>>>>> throughout RocksDBStore, wherever a new >> RocksDbIterator >> > > is >> > > > > > >>>>>> created. >> > > > > > >>>>>>>>>>> >> > > > > > >>>>>>>>>>> Regards, >> > > > > > >>>>>>>>>>> Nick >> > > > > > >>>>>>>>>>> >> > > > > > >>>>>>>>>>> On Thu, 5 Oct 2023 at 12:32, Colt McNealy < >> > > c...@littlehorse.io >> > > > > > > >> > > > > > >>>>>>>> wrote: >> > > > > > >>>>>>>>>>> >> > > > > > >>>>>>>>>>>> Thank you for the KIP, Nick! >> > > > > > >>>>>>>>>>>> >> > > > > > >>>>>>>>>>>> This would be highly useful for many reasons. Much >> > more >> > > sane >> > > > > > >>>>>> than >> > > > > > >>>>>>>>>> checking >> > > > > > >>>>>>>>>>>> for leaked iterators by profiling memory usage >> while >> > > running >> > > > > > >>>>>>> 100's >> > > > > > >>>>>>>> of >> > > > > > >>>>>>>>>>>> thousands of range scans via interactive queries (: >> > > > > > >>>>>>>>>>>> >> > > > > > >>>>>>>>>>>> One small question: >> > > > > > >>>>>>>>>>>> >> > > > > > >>>>>>>>>>>>> The iterator-duration metrics will be updated >> > whenever >> > > an >> > > > > > >>>>>>>> Iterator's >> > > > > > >>>>>>>>>>>> close() method is called >> > > > > > >>>>>>>>>>>> >> > > > > > >>>>>>>>>>>> Does the Iterator have its own "createdAt()" or >> > > equivalent >> > > > > > >>>>>> field, >> > > > > > >>>>>>>> or >> > > > > > >>>>>>>>>> do we >> > > > > > >>>>>>>>>>>> need to keep track of the Iterator's start time >> upon >> > > creation? >> > > > > > >>>>>>>>>>>> >> > > > > > >>>>>>>>>>>> Cheers, >> > > > > > >>>>>>>>>>>> Colt McNealy >> > > > > > >>>>>>>>>>>> >> > > > > > >>>>>>>>>>>> *Founder, LittleHorse.dev* >> > > > > > >>>>>>>>>>>> >> > > > > > >>>>>>>>>>>> >> > > > > > >>>>>>>>>>>> On Wed, Oct 4, 2023 at 9:07 AM Nick Telford < >> > > > > > >>>>>>>> nick.telf...@gmail.com> >> > > > > > >>>>>>>>>>>> wrote: >> > > > > > >>>>>>>>>>>> >> > > > > > >>>>>>>>>>>>> Hi everyone, >> > > > > > >>>>>>>>>>>>> >> > > > > > >>>>>>>>>>>>> KIP-989 is a small Kafka Streams KIP to add a few >> new >> > > metrics >> > > > > > >>>>>>>> around >> > > > > > >>>>>>>>>> the >> > > > > > >>>>>>>>>>>>> creation and use of RocksDB Iterators, to aid >> users >> > in >> > > > > > >>>>>>>> identifying >> > > > > > >>>>>>>>>>>>> "Iterator leaks" that could cause applications to >> > leak >> > > native >> > > > > > >>>>>>>> memory. >> > > > > > >>>>>>>>>>>>> >> > > > > > >>>>>>>>>>>>> Let me know what you think! >> > > > > > >>>>>>>>>>>>> >> > > > > > >>>>>>>>>>>>> >> > > > > > >>>>>>>>>>>>> >> > > > > > >>>>>>>>>>>> >> > > > > > >>>>>>>>>> >> > > > > > >>>>>>>> >> > > > > > >>>>>>> >> > > > > > >>>>>> >> > > > > > >>>> >> > > > > > >>> >> > > > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-989%3A+RocksDB+Iterator+Metrics >> > > > > > >>>>>>>>>>>>> >> > > > > > >>>>>>>>>>>>> P.S. I'm not too sure about the formatting of the >> > "New >> > > > > > >>>>>> Metrics" >> > > > > > >>>>>>>>>> table, >> > > > > > >>>>>>>>>>>> any >> > > > > > >>>>>>>>>>>>> advice there would be appreciated. >> > > > > > >>>>>>>>>>>>> >> > > > > > >>>>>>>>>>>>> Regards, >> > > > > > >>>>>>>>>>>>> Nick >> > > > > > >>>>>>>>>>>>> >> > > > > > >>>>>>>>>>>> >> > > > > > >>>>>>>>>> >> > > > > > >>>>>>>> >> > > > > > >>>>>>> >> > > > > > >>>>>> >> > > > > > >>>>> >> > > > > > >>>> >> > > > > > >>> >> > > > > > >> >> > > > > > > >> > > > > > >> > > >> > >> >