Hi Guojun, Thanks for the clarification.
Best Regards Jing On Tue, Jun 27, 2023 at 9:19 PM Guojun Li <[email protected]> wrote: > Hi Jing, > > > 1. What is the relationship between Metrics and ***Metrics? > > Yes, you're right. Metrics is used to store the reporters and metric > groups, and is responsible for reporting metrics to the external backend > periodically. ***Metrics is a grouping of corresponding metric set, we'll > define and init the metrics in it. > > > 2. There is no group hierarchy, i.e. sub-groups. Will it be supported > later or will we stick to the flat group design in the future? > > In my opinion, the flat metric group is enough for Paimon. I don't see the > need for current designed Paimon metrics, like CommitMetrics / ScanMetrics > / CompactionMetrics are flat, which is not complicated as Flink metrics > system, so we simplified the design of MetricGroup. How do you think? > > Best, > Guojun > > On Mon, Jun 26, 2023 at 11:24 PM jing ge <[email protected]> wrote: > > > Hi Guojun, > > > > Sorry for the late reply and thanks for the proposal. The PIP looks good > to > > me. I just have two questions to make sure we are on the same page. > > > > 1. What is the relationship between Metrics and ***Metrics? It seems that > > e.g. CommitMetrics is only responsible for built-in metrics write/init > > while Metrics is used for read and update access centrally for all > > metricGroups and metrics, i.e. getter methods. Did I understand > correctly? > > 2. Current group design is flat. There is no group hierarchy, i.e. > > sub-groups. Will it be supported later or will we stick to the flat group > > design in the future? > > > > Best Regards > > Jing > > > > > > On Sun, Jun 25, 2023 at 10:33 AM Guojun Li <[email protected]> > > wrote: > > > > > Hi, devs, > > > > > > Thank you for your feedback on PIP-3[1], I'd like to start a vote on it > > > since the discussion on this thread[2] came to an agreement. > > > > > > The vote will last for at least 72 hours (Jun 29th, 2:00 GMT, excluding > > > weekend days) unless there is an objection or insufficient vote. > > > > > > Best, > > > Guojun > > > > > > [1] > > > > > > > > > https://cwiki.apache.org/confluence/display/PAIMON/PIP-3%3A+Introduce+metrics+for+Paimon > > > [2]https://lists.apache.org/thread/h2lgqgrvcptzj3c8q4cjzv5jopgtwx9o > > > > > > On Wed, Jun 21, 2023 at 5:53 PM Caizhi Weng <[email protected]> > > wrote: > > > > > > > Hi Guojun! > > > > > > > > I've checked the latest PIP-3 and it now looks fine to me overall. > > > Although > > > > there are still some minor issues, for example it is unclear to me > how > > > > MetricGroup is created and is tagged. But this detail can be further > > > > discussed during the implementation. From my point of view this PIP > is > > > now > > > > fine enough to go on a vote. > > > > > > > > Guojun Li <[email protected]> 于2023年6月21日周三 15:23写道: > > > > > > > > > Hi caizhi, > > > > > > > > > > I've updated APIs in PIP3: > > > > > 1. Instance of Metrics class is a singleton which maintains the > > > reporters > > > > > and Metric groups. > > > > > 2. Instantiation of MetricGroup will add the current group to > Metrics > > > > > instance. > > > > > 3. The metrics list instance, like CommitMetrics instance, will > > > maintain > > > > > the corresponding metricGroup, and can register metrics by the > > gauge(), > > > > > counter() ... method. > > > > > > > > > > Best, > > > > > Guojun > > > > > > > > > > On Mon, Jun 19, 2023 at 11:30 AM Caizhi Weng <[email protected] > > > > > > wrote: > > > > > > > > > > > Hi! > > > > > > > > > > > > 1. Could you add some demonstration code about how to create and > > use > > > a > > > > > > metric? You CommitMetrics code example uses Metrics#gauge method, > > but > > > > > > Metrics class does not have such method. > > > > > > > > > > > > 2. You've added Metrics#registerMetrics(String name, Metric > metric, > > > int > > > > > > groupIndex), which seems really weird to me. When registering a > > > metric > > > > we > > > > > > have to remember the index of the group? Why not providing the > > group > > > > > > directly, or just register the metric in the group? > > > > > > > > > > > > > > > > > > Caizhi Weng <[email protected]> 于2023年6月16日周五 16:01写道: > > > > > > > > > > > > > Hi Guojun! > > > > > > > > > > > > > > I see that you've submitted the 3rd edition of PIP-3. Most > > problems > > > > > have > > > > > > > been resolved, however I still have these doubts: > > > > > > > > > > > > > > 1. Is `Metrics` class a singleton? If yes, it should contain > > > multiple > > > > > > > `MetricGroup` as its member. Currently each `Metrics` class has > > > only > > > > > one > > > > > > > `MetricGroup`. If it is not a singleton, how many instances > will > > > > there > > > > > > be? > > > > > > > When and where are the instances created? > > > > > > > 2. In the `MetricGroup` class you support adding tags to one > > single > > > > > > > metric. In my opinion, tags should be added to groups, not > > metrics. > > > > > > Metrics > > > > > > > created from the same group should have the same set of tags, > so > > > the > > > > > > > creator of metrics do not need to concern about what tags to > add. > > > > > > > > > > > > > > Guojun Li <[email protected]> 于2023年6月14日周三 07:15写道: > > > > > > > > > > > > > >> Yes, thanks Caizhi, it looks very nice. > > > > > > >> > > > > > > >> I'm going to make the metrics name more reasonable. Metrics > > group > > > is > > > > > > also > > > > > > >> a > > > > > > >> good idea to take into account. > > > > > > >> > > > > > > >> On Thu, Jun 8, 2023 at 4:50 PM Caizhi Weng < > > [email protected]> > > > > > > wrote: > > > > > > >> > > > > > > >> > Hi Guojun! > > > > > > >> > > > > > > > >> > I've read the modified document. I still have the following > > > > > concerns: > > > > > > >> > > > > > > > >> > 1. Some metric names are still misleading. For example, > > > > > > >> "totalTablesFiles" > > > > > > >> > is the "number of total data files currently maintained on > > > > storage", > > > > > > >> > however "numTotalManifests" is the "number of scanned > > manifests > > > > > files > > > > > > in > > > > > > >> > the last scan planning", not the total number of manifest > > files > > > on > > > > > > >> storage. > > > > > > >> > Please keep metric names consistent, even across different > > > metric > > > > > > types. > > > > > > >> > > > > > > > >> > 2. There is no grouping in the current design. Let's > consider > > > all > > > > > > >> metrics > > > > > > >> > related to compaction. As one parallelism can hold multiple > > > > writers, > > > > > > it > > > > > > >> is > > > > > > >> > very likely that compactions for different buckets may > happen > > in > > > > the > > > > > > >> same > > > > > > >> > parallelism. How are you going to distinguish the same > > > compaction > > > > > > metric > > > > > > >> > for different buckets? Flink has metric groups which > separate > > > > > > different > > > > > > >> > subtasks of the same operator. > > > > > > >> > > > > > > > >> > Guojun Li <[email protected]> 于2023年6月2日周五 18:25写道: > > > > > > >> > > > > > > > >> > > Thanks caizhi for the detailed review. > > > > > > >> > > > > > > > > >> > > I've updated the pip-3 based on your comments, they're > > useful > > > to > > > > > > >> improve > > > > > > >> > > this proposal and make it clear. > > > > > > >> > > 1. I added the Histogram and HistogramStatistic > interfaces. > > > > > > >> > > 2. Added histogram metrics to measure the distribution of > > > > commit / > > > > > > >> scan / > > > > > > >> > > compaction duration. > > > > > > >> > > 3. Update the descriptions for some metrics, make them > more > > > > clear. > > > > > > >> > > > > > > > > >> > > And any other ideas from devs? Looking forward to your > > > feedback. > > > > > > >> > > > > > > > > >> > > Best, > > > > > > >> > > Guojun > > > > > > >> > > > > > > > > >> > > On Mon, May 29, 2023 at 1:51 PM Guojun Li < > > > > > [email protected]> > > > > > > >> > wrote: > > > > > > >> > > > > > > > > >> > > > Thanks Caizhi for your comments. I'll think about > > > introducing > > > > > > >> Histogram > > > > > > >> > > > metric type and making the metrics description more > clear. > > > > > > >> > > > > > > > > > >> > > > Best, > > > > > > >> > > > Guojun > > > > > > >> > > > > > > > > > >> > > > On Thu, May 25, 2023 at 7:13 PM Caizhi Weng < > > > > > [email protected] > > > > > > > > > > > > > >> > > wrote: > > > > > > >> > > > > > > > > > >> > > >> I also forgot to mention that Flink metrics [1] is a > good > > > > > > >> reference. > > > > > > >> > It > > > > > > >> > > >> clearly describes what Gauge, Counter and Histogram > > should > > > do > > > > > and > > > > > > >> has > > > > > > >> > a > > > > > > >> > > >> list of metrics with clear definition. For example they > > > have > > > > a > > > > > > >> metric > > > > > > >> > > >> named > > > > > > >> > > >> "lastCheckpointDuration" (which is a Gauge) and not > just > > > > > > >> > > >> "checkpointDuration". > > > > > > >> > > >> > > > > > > >> > > >> [1] > > > > > > >> > > >> > > > > > > >> > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/ > > > > > > >> > > >> > > > > > > >> > > >> Caizhi Weng <[email protected]> 于2023年5月25日周四 > > 19:10写道: > > > > > > >> > > >> > > > > > > >> > > >> > Hi Guojun! > > > > > > >> > > >> > > > > > > > >> > > >> > Thanks for raising this discussion. After reading the > > > > > document, > > > > > > >> I'd > > > > > > >> > > like > > > > > > >> > > >> > to raise a few opinions. > > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > -------------------------------------------------------------------------------- > > > > > > >> > > >> > > > > > > > >> > > >> > 1. About the overall types of metrics. > > > > > > >> > > >> > > > > > > > >> > > >> > You suggested two types of metrics, namely Gauge and > > > > > Counter. A > > > > > > >> > gauge > > > > > > >> > > >> > indicates the value of a metric at that instance, for > > > > example > > > > > > >> "the > > > > > > >> > > >> duration > > > > > > >> > > >> > of the last compaction". A counter indicates an > > > accumulated > > > > > > value > > > > > > >> > of a > > > > > > >> > > >> > metric overtime, for example "the total number of > > records > > > > > > >> written". > > > > > > >> > > >> > > > > > > > >> > > >> > However there are times when we want to study the > > history > > > > and > > > > > > see > > > > > > >> > the > > > > > > >> > > >> > trend. For example if we notice that the last > > compaction > > > is > > > > > > >> slower > > > > > > >> > > than > > > > > > >> > > >> > normal, we might want to know "the average duration > of > > > the > > > > > last > > > > > > >> few > > > > > > >> > > >> > compactions" to make sure that this slow down is just > > by > > > > > chance > > > > > > >> or > > > > > > >> > is > > > > > > >> > > a > > > > > > >> > > >> > common case. Both gauge and counter cannot meet this > > > need. > > > > > What > > > > > > >> we > > > > > > >> > > need > > > > > > >> > > >> is > > > > > > >> > > >> > another type of metrics called Histogram. This > metrics > > > type > > > > > > >> should > > > > > > >> > be > > > > > > >> > > >> able > > > > > > >> > > >> > to record the last few values of a metric and > calculate > > > > their > > > > > > >> > > statistics > > > > > > >> > > >> > such as average, min/max and more. > > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > -------------------------------------------------------------------------------- > > > > > > >> > > >> > > > > > > > >> > > >> > 2. About the types of some specific metrics. > > > > > > >> > > >> > > > > > > > >> > > >> > It seems to me that in your proposal, you cannot > > clearly > > > > tell > > > > > > the > > > > > > >> > > >> > difference between Gauges and Counters, and sometimes > > it > > > is > > > > > not > > > > > > >> > clear > > > > > > >> > > >> that > > > > > > >> > > >> > what a specific metric means. Let me give you some > > > example. > > > > > > >> > > >> > > > > > > > >> > > >> > Metric Name: commitDuration > > > > > > >> > > >> > > > > > > > >> > > >> > Description: Commit > > > > > > >> > > >> > > > > > > > >> > > >> > Type: Gauge > > > > > > >> > > >> > > > > > > > >> > > >> > Update at: Timer starts before commit starting, > update > > > > commit > > > > > > >> > duration > > > > > > >> > > >> >> after commit finished > > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > > >> > > >> > At first glance, "Commit" is not a valid description. > > > From > > > > > the > > > > > > >> > metric > > > > > > >> > > >> name > > > > > > >> > > >> > I guess you'd like to record the length of duration > for > > > > > > >> committing > > > > > > >> > > >> > snapshots. As its type is a Gauge, I suppose you want > > to > > > > > record > > > > > > >> the > > > > > > >> > > >> *last* > > > > > > >> > > >> > commit. Please make things clear what exactly you > want > > to > > > > > > record. > > > > > > >> > And > > > > > > >> > > >> as I > > > > > > >> > > >> > said in the last section, it might be better to > > > introduce a > > > > > > >> > Histogram > > > > > > >> > > >> type > > > > > > >> > > >> > so that we can study the average duration of the last > > few > > > > > > >> commits. > > > > > > >> > > >> > > > > > > > >> > > >> > Metric Name: numTableFilesAdded > > > > > > >> > > >> > > > > > > > >> > > >> > Description: Number of added table files in this > commit > > > > > > >> > > >> > > > > > > > >> > > >> > Type: Counter > > > > > > >> > > >> > > > > > > > >> > > >> > Update at: Collecting changes from committables > > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > > >> > > >> > By "this commit" I suppose you mean *last* commit. If > > you > > > > the > > > > > > >> type > > > > > > >> > > >> should > > > > > > >> > > >> > be a Gauge, not a Counter. By using Counter you may > > want > > > to > > > > > > >> record > > > > > > >> > the > > > > > > >> > > >> *total > > > > > > >> > > >> > number* of files added during all commits. > > > > > > >> > > >> > > > > > > > >> > > >> > Metric Name: numFilesCompactedBefore > > > > > > >> > > >> > > > > > > > >> > > >> > Description: Number of deleted files in compaction > > > > > > >> > > >> > > > > > > > >> > > >> > Type: Counter > > > > > > >> > > >> > > > > > > > >> > > >> > Update at: Triggering compaction > > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > > >> > > >> > It is also unclear to me what you're going to record. > > Are > > > > you > > > > > > >> going > > > > > > >> > > the > > > > > > >> > > >> > record the number of deleted files during the *last* > > > > > > compaction? > > > > > > >> Or > > > > > > >> > > >> you'd > > > > > > >> > > >> > like to record the *total number* of deleted files > > during > > > > all > > > > > > >> > > >> compaction? > > > > > > >> > > >> > > > > > > > >> > > >> > I'm afraid most of your proposed metrics seem unclear > > to > > > > me. > > > > > > >> Please > > > > > > >> > > >> > rewrite the proposal to make them clear. > > > > > > >> > > >> > > > > > > > >> > > >> > Guojun Li <[email protected]> 于2023年5月25日周四 > > > 16:59写道: > > > > > > >> > > >> > > > > > > > >> > > >> >> Hi, Paimon Devs, > > > > > > >> > > >> >> I’d like to start a discussion about PIP-3[1]. > > > > > > >> > > >> >> In this PIP, I'm talking about how to support > > > metrics > > > > > for > > > > > > >> > > paimon, > > > > > > >> > > >> >> what > > > > > > >> > > >> >> metrics paimon needs. Look forward to your question > > and > > > > > > >> > suggestions. > > > > > > >> > > >> >> > > > > > > >> > > >> >> Best, > > > > > > >> > > >> >> Guojun > > > > > > >> > > >> >> > > > > > > >> > > >> >> [1] > > > > > > >> > > >> >> > > > > > > >> > > >> >> > > > > > > >> > > >> > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/PAIMON/PIP-3%3A+Introduce+metrics+for+Paimon > > > > > > >> > > >> >> > > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
