Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

Bruno Cadonna Tue, 04 Jun 2019 14:10:02 -0700

Hi Guozhang,

After some thoughts, I tend to be in favour of the option with metrics
for each physical RocksDB instance for the following reasons:


1) A user already needs to be aware of segmented state stores when
providing a custom RocksDBConfigSetter. In RocksDBConfigSetter one can
specify settings for a store depending on the name of the store. Since
segments (i.e. state store) in a segmented state store have names that
share a prefix but have suffixes that are created at runtime, increase
with time and are theoretically unbounded, a user needs to take
account of the segments to provide the settings for all (i.e. matching
the common prefix) or some (i.e. matching the common prefix and for
example suffixes according to a specific pattern) of the segments of a
specific segmented state store.
2) Currently settings for RocksDB can only be specified by a user per
physical instance and not per logical instance. Deriving good settings
for physical instances from metrics for a logical instance can be hard
if the physical instances are not accessed uniformly. In segmented
state stores segments are not accessed uniformly.
3) Simpler to implement and to get things done.

Any thoughts on this from anybody?

Best,
Bruno

On Thu, May 30, 2019 at 8:33 PM Guozhang Wang <[email protected]> wrote:
>
> Hi Bruno:
>
> Regarding 2) I think either way has some shortcomings: exposing the metrics
> per rocksDB instance for window / session stores exposed some
> implementation internals (that we use segmented stores) to enforce users to
> be aware of them. E.g. what if we want to silently change the internal
> implementation by walking away from the segmented approach? On the other
> hand, coalescing multiple rocksDB instances' metrics into a single one per
> each logical store also has some concerns as I mentioned above. What I'm
> thinking is actually that, if we can customize the aggregation logic to
> still has one set of metrics per each logical store which may be composed
> of multiple rocksDB ones -- e.g. for `bytes-written-rate` we sum them
> across rocksDBs, while for `memtable-hit-rate` we do weighted average?
>
> Regarding logging levels, I think have DEBUG is fine, but also that means
> without manually turning it on users would not get it. Maybe we should
> consider adding some dynamic setting mechanisms in the future to allow
> users turn it on / off during run-time.
>
>
> Guozhang
>
>
>
> On Tue, May 28, 2019 at 6:23 AM Bruno Cadonna <[email protected]> wrote:
>
> > Hi,
> >
> > Thank you for your comments.
> >
> > @Bill:
> >
> > 1. It is like Guozhang wrote:
> > - rocksdb-state-id is for key-value stores
> > - rocksdb-session-state-id is for session stores
> > - rocksdb-window-state-id is for window stores
> > These tags are defined in the corresponding store builders and I think
> > it is a good idea to re-use them.
> >
> > 2. I could not find any exposed ticker or histogram to get the total
> > and average number of compactions, although RocksDB dumps the number
> > of compactions between levels in its log files. There is the
> > NUM_SUBCOMPACTIONS_SCHEDULED histogram that gives you statistics about
> > the number of subcompactions actually scheduled during a compaction,
> > but that is not that what you are looking for. If they will expose the
> > number of compaction in the future, we can still add the metrics you
> > propose. I guess, the metric in this KIP that would indirectly be
> > influenced by the number of L0 files would be write-stall-duration. If
> > there are too many compactions this duration should increase. However,
> > this metric is also influenced by memtable flushes.
> >
> > @John:
> >
> > I think it is a good idea to prefix the flush-related metrics with
> > memtable to avoid ambiguity. I changed the KIP accordingly.
> >
> > @Dongjin:
> >
> > For the tag and compaction-related comments, please see my answers to Bill.
> >
> > I cannot follow your second paragraph. Are you saying that a tuning
> > guide for RocksDB within Streams based on the metrics in this KIP is
> > out of scope? I also think that it doesn't need to be included in this
> > KIP, but it is worth to work on it afterwards.
> >
> > @Guozhang:
> >
> > 1. Thank you for the explanation. I missed that. I modified the KIP
> > accordingly.
> >
> > 2. No, my plan is to record and expose the set of metrics for each
> > RocksDB store separately. Each set of metrics can then be
> > distinguished by its store ID. Do I miss something here?
> >
> > 3. I agree with you and Sophie about user education and that we should
> > work on it after this KIP.
> >
> > 4. I agree also on the user API. However, I would like to open a
> > separate KIP for it because I still need a bit of thinking to get it.
> > I also think it is a good idea to do one step after the other to get
> > at least the built-in RocksDB metrics into the next release.
> > Do you think I chose too many metrics as built-in metrics for this
> > KIP? What do others think?
> >
> > @Sophie:
> >
> > I tend to DEBUG level, but I do not have heart feelings about it. Do
> > you mean to turn it on/off RocksDB metrics in the Streams
> > configuration?
> >
> > Best,
> > Bruno
> >
> > On Tue, May 21, 2019 at 8:02 PM Sophie Blee-Goldman <[email protected]>
> > wrote:
> > >
> > > I definitely agree with Guozhang's "meta" comment: if it's possible to
> > > allow users to pick and choose individual RocksDB metrics that would be
> > > ideal. One further question is whether these will be debug or info level
> > > metrics, or a separate level altogether? If there is a nontrivial
> > overhead
> > > associated with attaching RocksDB metrics it would probably be good to be
> > > able to independently turn on/off Rocks metrics
> > >
> > > On Tue, May 21, 2019 at 9:00 AM Guozhang Wang <[email protected]>
> > wrote:
> > >
> > > > Hello Bruno,
> > > >
> > > > Thanks for the KIP, I have a few minor comments and a meta one which
> > are
> > > > relatively aligned with other folks:
> > > >
> > > > Minor:
> > > >
> > > > 1) Regarding the "rocksdb-state-id = [store ID]", to be consistent with
> > > > other state store metrics (see
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%3A+Augment+metrics+for+Kafka+Streams
> > ),
> > > > this tag should be either "rocksdb-window-state-id",
> > > > "rocksdb-session-state-id" or "rocksdb-state-id". For window / session
> > > > store backed by rocksDB, the tags should not be "rocksdb-state-id".
> > > >
> > > > 2) Also for window / session store, my take is that you plan to have
> > > > multiple rocksDB behind the scene to report to the same set of
> > metrics, is
> > > > that right? My concern is that for such types of state stores, most of
> > the
> > > > access would be on the first segment rocksDB instance, and hence
> > coalescing
> > > > all of instances as a single set of metrics may dilute it.
> > > >
> > > > 3) I agree with @[email protected] <[email protected]> that we
> > should
> > > > better have some documentation educating users what to do when see what
> > > > anomalies in metrics; though I think this is a long endeavoring effort
> > that
> > > > may go beyond this KIP's scope, let's keep that as a separate umbrella
> > JIRA
> > > > to accumulate knowledge over time.
> > > >
> > > >
> > > > Meta:
> > > >
> > > > 4) Instead of trying to enumerate all the ones that might be helpful,
> > I'd
> > > > recommend that we expose a user-friendly API in StreamsMetrics to allow
> > > > users to add more metrics from RocksDB they'd like to have, while only
> > > > keeping a small set of most-meaningful ones that are ubiquitously
> > useful
> > > > out-of-the-box. WDYT?
> > > >
> > > >
> > > > Guozhang
> > > >
> > > >
> > > >
> > > > On Tue, May 21, 2019 at 8:04 AM Dongjin Lee <[email protected]>
> > wrote:
> > > >
> > > >> Hi Bruno,
> > > >>
> > > >> I just read the KIP. I think this feature is great. As far as I know,
> > most
> > > >> Kafka users monitor the host resources, JVM resources, and Kafka
> > metrics
> > > >> only, not RocksDB for configuring the statistics feature is a little
> > bit
> > > >> tiresome. Since RocksDB impacts the performance of Kafka Streams, I
> > > >> believe
> > > >> providing these metrics with other metrics in one place is much
> > better.
> > > >>
> > > >> However, I am a little bit not assured for how much information
> > should be
> > > >> provided to the users with the documentation - how the user can
> > control
> > > >> the
> > > >>  RocksDB may on the boundary of the scope. How do you think?
> > > >>
> > > >> +1. I entirely agree to Bill's comments; I also think
> > `rocksdb-store-id`
> > > >> is
> > > >> better than `rocksdb-state-id` and metrics on total compactions and an
> > > >> average number of compactions is also needed.
> > > >>
> > > >> Regards,
> > > >> Dongjin
> > > >>
> > > >> On Tue, May 21, 2019 at 2:48 AM John Roesler <[email protected]>
> > wrote:
> > > >>
> > > >> > Hi Bruno,
> > > >> >
> > > >> > Looks really good overall. This is going to be an awesome addition.
> > > >> >
> > > >> > My only thought was that we have "bytes-flushed-(rate|total) and
> > > >> > flush-time-(avg|min|max)" metrics, and the description states that
> > > >> > these are specifically for Memtable flush operations. What do you
> > > >> > think about calling it "memtable-bytes-flushed... (etc)"? On one
> > hand,
> > > >> > I could see this being redundant, since that's the only thing that
> > > >> > gets "flushed" inside of Rocks. But on the other, we have an
> > > >> > independent "flush" operation in streams, which might be confusing.
> > > >> > Plus, it might help people who are looking at the full "menu" of
> > > >> > hundreds of metrics. They can't read and remember every description
> > > >> > while trying to understand the full list of metrics, so going for
> > > >> > maximum self-documentation in the name seems nice.
> > > >> >
> > > >> > But that's a minor thought. Modulo the others' comments, this looks
> > good
> > > >> > to me.
> > > >> >
> > > >> > Thanks,
> > > >> > -John
> > > >> >
> > > >> > On Mon, May 20, 2019 at 11:07 AM Bill Bejeck <[email protected]>
> > wrote:
> > > >> > >
> > > >> > > Hi Bruno,
> > > >> > >
> > > >> > > Thanks for the KIP, this will be a useful addition.
> > > >> > >
> > > >> > > Overall the KIP looks good to me, and I have two minor comments.
> > > >> > >
> > > >> > > 1. For the tags should, I'm wondering if rocksdb-state-id should
> > be
> > > >> > > rocksdb-store-id
> > > >> > > instead?
> > > >> > >
> > > >> > > 2. With the compaction metrics, would it be possible to add total
> > > >> > > compactions and an average number of compactions?  I've taken a
> > look
> > > >> at
> > > >> > the
> > > >> > > available RocksDB metrics, and I'm not sure.  But users can
> > control
> > > >> how
> > > >> > > many L0 files it takes to trigger compaction so if it is
> > possible; it
> > > >> may
> > > >> > > be useful.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Bill
> > > >> > >
> > > >> > >
> > > >> > > On Mon, May 20, 2019 at 9:15 AM Bruno Cadonna <[email protected]
> > >
> > > >> > wrote:
> > > >> > >
> > > >> > > > Hi Sophie,
> > > >> > > >
> > > >> > > > Thank you for your comments.
> > > >> > > >
> > > >> > > > It's a good idea to supplement the metrics with configuration
> > option
> > > >> > > > to change the metrics. I also had some thoughts about it.
> > However, I
> > > >> > > > think I need some experimentation to get this right.
> > > >> > > >
> > > >> > > > I added the block cache hit rates for index and filter blocks
> > to the
> > > >> > > > KIP. As far as I understood, they should stay at zero, if users
> > do
> > > >> not
> > > >> > > > configure RocksDB to include index and filter blocks into the
> > block
> > > >> > > > cache. Did you also understand this similarly? I guess also in
> > this
> > > >> > > > case some experimentation would be good to be sure.
> > > >> > > >
> > > >> > > > Best,
> > > >> > > > Bruno
> > > >> > > >
> > > >> > > >
> > > >> > > > On Sat, May 18, 2019 at 2:29 AM Sophie Blee-Goldman <
> > > >> > [email protected]>
> > > >> > > > wrote:
> > > >> > > > >
> > > >> > > > > Actually I wonder if it might be useful to users to be able to
> > > >> break
> > > >> > up
> > > >> > > > the
> > > >> > > > > cache hit stats by type? Some people may choose to store
> > index and
> > > >> > filter
> > > >> > > > > blocks alongside data blocks, and it would probably be very
> > > >> helpful
> > > >> > for
> > > >> > > > > them to know who is making more effective use of the cache in
> > > >> order
> > > >> > to
> > > >> > > > tune
> > > >> > > > > how much of it is allocated to each. I'm not sure how common
> > this
> > > >> > really
> > > >> > > > is
> > > >> > > > > but I think it would be invaluable to those who do. RocksDB
> > > >> > performance
> > > >> > > > can
> > > >> > > > > be quite opaque..
> > > >> > > > >
> > > >> > > > > Cheers,
> > > >> > > > > Sophie
> > > >> > > > >
> > > >> > > > > On Fri, May 17, 2019 at 5:01 PM Sophie Blee-Goldman <
> > > >> > [email protected]
> > > >> > > > >
> > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > > > Hey Bruno!
> > > >> > > > > >
> > > >> > > > > > This all looks pretty good to me, but one suggestion I have
> > is
> > > >> to
> > > >> > > > > > supplement each of the metrics with some info on how the
> > user
> > > >> can
> > > >> > > > control
> > > >> > > > > > them. In other words, which options could/should they set in
> > > >> > > > > > RocksDBConfigSetter should they discover a particular
> > > >> bottleneck?
> > > >> > > > > >
> > > >> > > > > > I don't think this necessarily needs to go into the KIP,
> > but I
> > > >> do
> > > >> > > > think it
> > > >> > > > > > should be included in the docs somewhere (happy to help
> > build up
> > > >> > the
> > > >> > > > list
> > > >> > > > > > of associated options when the time comes)
> > > >> > > > > >
> > > >> > > > > > Thanks!
> > > >> > > > > > Sophie
> > > >> > > > > >
> > > >> > > > > > On Fri, May 17, 2019 at 2:54 PM Bruno Cadonna <
> > > >> [email protected]>
> > > >> > > > wrote:
> > > >> > > > > >
> > > >> > > > > >> Hi all,
> > > >> > > > > >>
> > > >> > > > > >> this KIP describes the extension of the Kafka Streams'
> > metrics
> > > >> to
> > > >> > > > include
> > > >> > > > > >> RocksDB's internal statistics.
> > > >> > > > > >>
> > > >> > > > > >> Please have a look at it and let me know what you think.
> > Since
> > > >> I
> > > >> > am
> > > >> > > > not a
> > > >> > > > > >> RocksDB expert, I am thankful for any additional pair of
> > eyes
> > > >> that
> > > >> > > > > >> evaluates this KIP.
> > > >> > > > > >>
> > > >> > > > > >>
> > > >> > > > > >>
> > > >> > > >
> > > >> >
> > > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-471:+Expose+RocksDB+Metrics+in+Kafka+Streams
> > > >> > > > > >>
> > > >> > > > > >> Best regards,
> > > >> > > > > >> Bruno
> > > >> > > > > >>
> > > >> > > > > >
> > > >> > > >
> > > >> >
> > > >>
> > > >>
> > > >> --
> > > >> *Dongjin Lee*
> > > >>
> > > >> *A hitchhiker in the mathematical world.*
> > > >> *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > >> <https://github.com/dongjinleekr>linkedin:
> > > >> kr.linkedin.com/in/dongjinleekr
> > > >> <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > >> speakerdeck.com/dongjin
> > > >> <https://speakerdeck.com/dongjin>*
> > > >>
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> >
>
>
> --
> -- Guozhang

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

Reply via email to