Hi Jing, > 1. What is the relationship between Metrics and ***Metrics?
Yes, you're right. Metrics is used to store the reporters and metric groups, and is responsible for reporting metrics to the external backend periodically. ***Metrics is a grouping of corresponding metric set, we'll define and init the metrics in it. > 2. There is no group hierarchy, i.e. sub-groups. Will it be supported later or will we stick to the flat group design in the future? In my opinion, the flat metric group is enough for Paimon. I don't see the need for current designed Paimon metrics, like CommitMetrics / ScanMetrics / CompactionMetrics are flat, which is not complicated as Flink metrics system, so we simplified the design of MetricGroup. How do you think? Best, Guojun On Mon, Jun 26, 2023 at 11:24 PM jing ge <[email protected]> wrote: > Hi Guojun, > > Sorry for the late reply and thanks for the proposal. The PIP looks good to > me. I just have two questions to make sure we are on the same page. > > 1. What is the relationship between Metrics and ***Metrics? It seems that > e.g. CommitMetrics is only responsible for built-in metrics write/init > while Metrics is used for read and update access centrally for all > metricGroups and metrics, i.e. getter methods. Did I understand correctly? > 2. Current group design is flat. There is no group hierarchy, i.e. > sub-groups. Will it be supported later or will we stick to the flat group > design in the future? > > Best Regards > Jing > > > On Sun, Jun 25, 2023 at 10:33 AM Guojun Li <[email protected]> > wrote: > > > Hi, devs, > > > > Thank you for your feedback on PIP-3[1], I'd like to start a vote on it > > since the discussion on this thread[2] came to an agreement. > > > > The vote will last for at least 72 hours (Jun 29th, 2:00 GMT, excluding > > weekend days) unless there is an objection or insufficient vote. > > > > Best, > > Guojun > > > > [1] > > > > > https://cwiki.apache.org/confluence/display/PAIMON/PIP-3%3A+Introduce+metrics+for+Paimon > > [2]https://lists.apache.org/thread/h2lgqgrvcptzj3c8q4cjzv5jopgtwx9o > > > > On Wed, Jun 21, 2023 at 5:53 PM Caizhi Weng <[email protected]> > wrote: > > > > > Hi Guojun! > > > > > > I've checked the latest PIP-3 and it now looks fine to me overall. > > Although > > > there are still some minor issues, for example it is unclear to me how > > > MetricGroup is created and is tagged. But this detail can be further > > > discussed during the implementation. From my point of view this PIP is > > now > > > fine enough to go on a vote. > > > > > > Guojun Li <[email protected]> 于2023年6月21日周三 15:23写道: > > > > > > > Hi caizhi, > > > > > > > > I've updated APIs in PIP3: > > > > 1. Instance of Metrics class is a singleton which maintains the > > reporters > > > > and Metric groups. > > > > 2. Instantiation of MetricGroup will add the current group to Metrics > > > > instance. > > > > 3. The metrics list instance, like CommitMetrics instance, will > > maintain > > > > the corresponding metricGroup, and can register metrics by the > gauge(), > > > > counter() ... method. > > > > > > > > Best, > > > > Guojun > > > > > > > > On Mon, Jun 19, 2023 at 11:30 AM Caizhi Weng <[email protected]> > > > wrote: > > > > > > > > > Hi! > > > > > > > > > > 1. Could you add some demonstration code about how to create and > use > > a > > > > > metric? You CommitMetrics code example uses Metrics#gauge method, > but > > > > > Metrics class does not have such method. > > > > > > > > > > 2. You've added Metrics#registerMetrics(String name, Metric metric, > > int > > > > > groupIndex), which seems really weird to me. When registering a > > metric > > > we > > > > > have to remember the index of the group? Why not providing the > group > > > > > directly, or just register the metric in the group? > > > > > > > > > > > > > > > Caizhi Weng <[email protected]> 于2023年6月16日周五 16:01写道: > > > > > > > > > > > Hi Guojun! > > > > > > > > > > > > I see that you've submitted the 3rd edition of PIP-3. Most > problems > > > > have > > > > > > been resolved, however I still have these doubts: > > > > > > > > > > > > 1. Is `Metrics` class a singleton? If yes, it should contain > > multiple > > > > > > `MetricGroup` as its member. Currently each `Metrics` class has > > only > > > > one > > > > > > `MetricGroup`. If it is not a singleton, how many instances will > > > there > > > > > be? > > > > > > When and where are the instances created? > > > > > > 2. In the `MetricGroup` class you support adding tags to one > single > > > > > > metric. In my opinion, tags should be added to groups, not > metrics. > > > > > Metrics > > > > > > created from the same group should have the same set of tags, so > > the > > > > > > creator of metrics do not need to concern about what tags to add. > > > > > > > > > > > > Guojun Li <[email protected]> 于2023年6月14日周三 07:15写道: > > > > > > > > > > > >> Yes, thanks Caizhi, it looks very nice. > > > > > >> > > > > > >> I'm going to make the metrics name more reasonable. Metrics > group > > is > > > > > also > > > > > >> a > > > > > >> good idea to take into account. > > > > > >> > > > > > >> On Thu, Jun 8, 2023 at 4:50 PM Caizhi Weng < > [email protected]> > > > > > wrote: > > > > > >> > > > > > >> > Hi Guojun! > > > > > >> > > > > > > >> > I've read the modified document. I still have the following > > > > concerns: > > > > > >> > > > > > > >> > 1. Some metric names are still misleading. For example, > > > > > >> "totalTablesFiles" > > > > > >> > is the "number of total data files currently maintained on > > > storage", > > > > > >> > however "numTotalManifests" is the "number of scanned > manifests > > > > files > > > > > in > > > > > >> > the last scan planning", not the total number of manifest > files > > on > > > > > >> storage. > > > > > >> > Please keep metric names consistent, even across different > > metric > > > > > types. > > > > > >> > > > > > > >> > 2. There is no grouping in the current design. Let's consider > > all > > > > > >> metrics > > > > > >> > related to compaction. As one parallelism can hold multiple > > > writers, > > > > > it > > > > > >> is > > > > > >> > very likely that compactions for different buckets may happen > in > > > the > > > > > >> same > > > > > >> > parallelism. How are you going to distinguish the same > > compaction > > > > > metric > > > > > >> > for different buckets? Flink has metric groups which separate > > > > > different > > > > > >> > subtasks of the same operator. > > > > > >> > > > > > > >> > Guojun Li <[email protected]> 于2023年6月2日周五 18:25写道: > > > > > >> > > > > > > >> > > Thanks caizhi for the detailed review. > > > > > >> > > > > > > > >> > > I've updated the pip-3 based on your comments, they're > useful > > to > > > > > >> improve > > > > > >> > > this proposal and make it clear. > > > > > >> > > 1. I added the Histogram and HistogramStatistic interfaces. > > > > > >> > > 2. Added histogram metrics to measure the distribution of > > > commit / > > > > > >> scan / > > > > > >> > > compaction duration. > > > > > >> > > 3. Update the descriptions for some metrics, make them more > > > clear. > > > > > >> > > > > > > > >> > > And any other ideas from devs? Looking forward to your > > feedback. > > > > > >> > > > > > > > >> > > Best, > > > > > >> > > Guojun > > > > > >> > > > > > > > >> > > On Mon, May 29, 2023 at 1:51 PM Guojun Li < > > > > [email protected]> > > > > > >> > wrote: > > > > > >> > > > > > > > >> > > > Thanks Caizhi for your comments. I'll think about > > introducing > > > > > >> Histogram > > > > > >> > > > metric type and making the metrics description more clear. > > > > > >> > > > > > > > > >> > > > Best, > > > > > >> > > > Guojun > > > > > >> > > > > > > > > >> > > > On Thu, May 25, 2023 at 7:13 PM Caizhi Weng < > > > > [email protected] > > > > > > > > > > > >> > > wrote: > > > > > >> > > > > > > > > >> > > >> I also forgot to mention that Flink metrics [1] is a good > > > > > >> reference. > > > > > >> > It > > > > > >> > > >> clearly describes what Gauge, Counter and Histogram > should > > do > > > > and > > > > > >> has > > > > > >> > a > > > > > >> > > >> list of metrics with clear definition. For example they > > have > > > a > > > > > >> metric > > > > > >> > > >> named > > > > > >> > > >> "lastCheckpointDuration" (which is a Gauge) and not just > > > > > >> > > >> "checkpointDuration". > > > > > >> > > >> > > > > > >> > > >> [1] > > > > > >> > > >> > > > > > >> > > > > > > > > https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/ > > > > > >> > > >> > > > > > >> > > >> Caizhi Weng <[email protected]> 于2023年5月25日周四 > 19:10写道: > > > > > >> > > >> > > > > > >> > > >> > Hi Guojun! > > > > > >> > > >> > > > > > > >> > > >> > Thanks for raising this discussion. After reading the > > > > document, > > > > > >> I'd > > > > > >> > > like > > > > > >> > > >> > to raise a few opinions. > > > > > >> > > >> > > > > > > >> > > >> > > > > > > >> > > >> > > > > > > >> > > >> > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > -------------------------------------------------------------------------------- > > > > > >> > > >> > > > > > > >> > > >> > 1. About the overall types of metrics. > > > > > >> > > >> > > > > > > >> > > >> > You suggested two types of metrics, namely Gauge and > > > > Counter. A > > > > > >> > gauge > > > > > >> > > >> > indicates the value of a metric at that instance, for > > > example > > > > > >> "the > > > > > >> > > >> duration > > > > > >> > > >> > of the last compaction". A counter indicates an > > accumulated > > > > > value > > > > > >> > of a > > > > > >> > > >> > metric overtime, for example "the total number of > records > > > > > >> written". > > > > > >> > > >> > > > > > > >> > > >> > However there are times when we want to study the > history > > > and > > > > > see > > > > > >> > the > > > > > >> > > >> > trend. For example if we notice that the last > compaction > > is > > > > > >> slower > > > > > >> > > than > > > > > >> > > >> > normal, we might want to know "the average duration of > > the > > > > last > > > > > >> few > > > > > >> > > >> > compactions" to make sure that this slow down is just > by > > > > chance > > > > > >> or > > > > > >> > is > > > > > >> > > a > > > > > >> > > >> > common case. Both gauge and counter cannot meet this > > need. > > > > What > > > > > >> we > > > > > >> > > need > > > > > >> > > >> is > > > > > >> > > >> > another type of metrics called Histogram. This metrics > > type > > > > > >> should > > > > > >> > be > > > > > >> > > >> able > > > > > >> > > >> > to record the last few values of a metric and calculate > > > their > > > > > >> > > statistics > > > > > >> > > >> > such as average, min/max and more. > > > > > >> > > >> > > > > > > >> > > >> > > > > > > >> > > >> > > > > > > >> > > >> > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > -------------------------------------------------------------------------------- > > > > > >> > > >> > > > > > > >> > > >> > 2. About the types of some specific metrics. > > > > > >> > > >> > > > > > > >> > > >> > It seems to me that in your proposal, you cannot > clearly > > > tell > > > > > the > > > > > >> > > >> > difference between Gauges and Counters, and sometimes > it > > is > > > > not > > > > > >> > clear > > > > > >> > > >> that > > > > > >> > > >> > what a specific metric means. Let me give you some > > example. > > > > > >> > > >> > > > > > > >> > > >> > Metric Name: commitDuration > > > > > >> > > >> > > > > > > >> > > >> > Description: Commit > > > > > >> > > >> > > > > > > >> > > >> > Type: Gauge > > > > > >> > > >> > > > > > > >> > > >> > Update at: Timer starts before commit starting, update > > > commit > > > > > >> > duration > > > > > >> > > >> >> after commit finished > > > > > >> > > >> > > > > > > >> > > >> > > > > > > >> > > >> > At first glance, "Commit" is not a valid description. > > From > > > > the > > > > > >> > metric > > > > > >> > > >> name > > > > > >> > > >> > I guess you'd like to record the length of duration for > > > > > >> committing > > > > > >> > > >> > snapshots. As its type is a Gauge, I suppose you want > to > > > > record > > > > > >> the > > > > > >> > > >> *last* > > > > > >> > > >> > commit. Please make things clear what exactly you want > to > > > > > record. > > > > > >> > And > > > > > >> > > >> as I > > > > > >> > > >> > said in the last section, it might be better to > > introduce a > > > > > >> > Histogram > > > > > >> > > >> type > > > > > >> > > >> > so that we can study the average duration of the last > few > > > > > >> commits. > > > > > >> > > >> > > > > > > >> > > >> > Metric Name: numTableFilesAdded > > > > > >> > > >> > > > > > > >> > > >> > Description: Number of added table files in this commit > > > > > >> > > >> > > > > > > >> > > >> > Type: Counter > > > > > >> > > >> > > > > > > >> > > >> > Update at: Collecting changes from committables > > > > > >> > > >> > > > > > > >> > > >> > > > > > > >> > > >> > By "this commit" I suppose you mean *last* commit. If > you > > > the > > > > > >> type > > > > > >> > > >> should > > > > > >> > > >> > be a Gauge, not a Counter. By using Counter you may > want > > to > > > > > >> record > > > > > >> > the > > > > > >> > > >> *total > > > > > >> > > >> > number* of files added during all commits. > > > > > >> > > >> > > > > > > >> > > >> > Metric Name: numFilesCompactedBefore > > > > > >> > > >> > > > > > > >> > > >> > Description: Number of deleted files in compaction > > > > > >> > > >> > > > > > > >> > > >> > Type: Counter > > > > > >> > > >> > > > > > > >> > > >> > Update at: Triggering compaction > > > > > >> > > >> > > > > > > >> > > >> > > > > > > >> > > >> > It is also unclear to me what you're going to record. > Are > > > you > > > > > >> going > > > > > >> > > the > > > > > >> > > >> > record the number of deleted files during the *last* > > > > > compaction? > > > > > >> Or > > > > > >> > > >> you'd > > > > > >> > > >> > like to record the *total number* of deleted files > during > > > all > > > > > >> > > >> compaction? > > > > > >> > > >> > > > > > > >> > > >> > I'm afraid most of your proposed metrics seem unclear > to > > > me. > > > > > >> Please > > > > > >> > > >> > rewrite the proposal to make them clear. > > > > > >> > > >> > > > > > > >> > > >> > Guojun Li <[email protected]> 于2023年5月25日周四 > > 16:59写道: > > > > > >> > > >> > > > > > > >> > > >> >> Hi, Paimon Devs, > > > > > >> > > >> >> I’d like to start a discussion about PIP-3[1]. > > > > > >> > > >> >> In this PIP, I'm talking about how to support > > metrics > > > > for > > > > > >> > > paimon, > > > > > >> > > >> >> what > > > > > >> > > >> >> metrics paimon needs. Look forward to your question > and > > > > > >> > suggestions. > > > > > >> > > >> >> > > > > > >> > > >> >> Best, > > > > > >> > > >> >> Guojun > > > > > >> > > >> >> > > > > > >> > > >> >> [1] > > > > > >> > > >> >> > > > > > >> > > >> >> > > > > > >> > > >> > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/PAIMON/PIP-3%3A+Introduce+metrics+for+Paimon > > > > > >> > > >> >> > > > > > >> > > >> > > > > > > >> > > >> > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > > > > > > >
