Hi! 1. Could you add some demonstration code about how to create and use a metric? You CommitMetrics code example uses Metrics#gauge method, but Metrics class does not have such method.
2. You've added Metrics#registerMetrics(String name, Metric metric, int groupIndex), which seems really weird to me. When registering a metric we have to remember the index of the group? Why not providing the group directly, or just register the metric in the group? Caizhi Weng <[email protected]> 于2023年6月16日周五 16:01写道: > Hi Guojun! > > I see that you've submitted the 3rd edition of PIP-3. Most problems have > been resolved, however I still have these doubts: > > 1. Is `Metrics` class a singleton? If yes, it should contain multiple > `MetricGroup` as its member. Currently each `Metrics` class has only one > `MetricGroup`. If it is not a singleton, how many instances will there be? > When and where are the instances created? > 2. In the `MetricGroup` class you support adding tags to one single > metric. In my opinion, tags should be added to groups, not metrics. Metrics > created from the same group should have the same set of tags, so the > creator of metrics do not need to concern about what tags to add. > > Guojun Li <[email protected]> 于2023年6月14日周三 07:15写道: > >> Yes, thanks Caizhi, it looks very nice. >> >> I'm going to make the metrics name more reasonable. Metrics group is also >> a >> good idea to take into account. >> >> On Thu, Jun 8, 2023 at 4:50 PM Caizhi Weng <[email protected]> wrote: >> >> > Hi Guojun! >> > >> > I've read the modified document. I still have the following concerns: >> > >> > 1. Some metric names are still misleading. For example, >> "totalTablesFiles" >> > is the "number of total data files currently maintained on storage", >> > however "numTotalManifests" is the "number of scanned manifests files in >> > the last scan planning", not the total number of manifest files on >> storage. >> > Please keep metric names consistent, even across different metric types. >> > >> > 2. There is no grouping in the current design. Let's consider all >> metrics >> > related to compaction. As one parallelism can hold multiple writers, it >> is >> > very likely that compactions for different buckets may happen in the >> same >> > parallelism. How are you going to distinguish the same compaction metric >> > for different buckets? Flink has metric groups which separate different >> > subtasks of the same operator. >> > >> > Guojun Li <[email protected]> 于2023年6月2日周五 18:25写道: >> > >> > > Thanks caizhi for the detailed review. >> > > >> > > I've updated the pip-3 based on your comments, they're useful to >> improve >> > > this proposal and make it clear. >> > > 1. I added the Histogram and HistogramStatistic interfaces. >> > > 2. Added histogram metrics to measure the distribution of commit / >> scan / >> > > compaction duration. >> > > 3. Update the descriptions for some metrics, make them more clear. >> > > >> > > And any other ideas from devs? Looking forward to your feedback. >> > > >> > > Best, >> > > Guojun >> > > >> > > On Mon, May 29, 2023 at 1:51 PM Guojun Li <[email protected]> >> > wrote: >> > > >> > > > Thanks Caizhi for your comments. I'll think about introducing >> Histogram >> > > > metric type and making the metrics description more clear. >> > > > >> > > > Best, >> > > > Guojun >> > > > >> > > > On Thu, May 25, 2023 at 7:13 PM Caizhi Weng <[email protected]> >> > > wrote: >> > > > >> > > >> I also forgot to mention that Flink metrics [1] is a good >> reference. >> > It >> > > >> clearly describes what Gauge, Counter and Histogram should do and >> has >> > a >> > > >> list of metrics with clear definition. For example they have a >> metric >> > > >> named >> > > >> "lastCheckpointDuration" (which is a Gauge) and not just >> > > >> "checkpointDuration". >> > > >> >> > > >> [1] >> > > >> >> > https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/ >> > > >> >> > > >> Caizhi Weng <[email protected]> 于2023年5月25日周四 19:10写道: >> > > >> >> > > >> > Hi Guojun! >> > > >> > >> > > >> > Thanks for raising this discussion. After reading the document, >> I'd >> > > like >> > > >> > to raise a few opinions. >> > > >> > >> > > >> > >> > > >> > >> > > >> >> > > >> > >> -------------------------------------------------------------------------------- >> > > >> > >> > > >> > 1. About the overall types of metrics. >> > > >> > >> > > >> > You suggested two types of metrics, namely Gauge and Counter. A >> > gauge >> > > >> > indicates the value of a metric at that instance, for example >> "the >> > > >> duration >> > > >> > of the last compaction". A counter indicates an accumulated value >> > of a >> > > >> > metric overtime, for example "the total number of records >> written". >> > > >> > >> > > >> > However there are times when we want to study the history and see >> > the >> > > >> > trend. For example if we notice that the last compaction is >> slower >> > > than >> > > >> > normal, we might want to know "the average duration of the last >> few >> > > >> > compactions" to make sure that this slow down is just by chance >> or >> > is >> > > a >> > > >> > common case. Both gauge and counter cannot meet this need. What >> we >> > > need >> > > >> is >> > > >> > another type of metrics called Histogram. This metrics type >> should >> > be >> > > >> able >> > > >> > to record the last few values of a metric and calculate their >> > > statistics >> > > >> > such as average, min/max and more. >> > > >> > >> > > >> > >> > > >> > >> > > >> >> > > >> > >> -------------------------------------------------------------------------------- >> > > >> > >> > > >> > 2. About the types of some specific metrics. >> > > >> > >> > > >> > It seems to me that in your proposal, you cannot clearly tell the >> > > >> > difference between Gauges and Counters, and sometimes it is not >> > clear >> > > >> that >> > > >> > what a specific metric means. Let me give you some example. >> > > >> > >> > > >> > Metric Name: commitDuration >> > > >> > >> > > >> > Description: Commit >> > > >> > >> > > >> > Type: Gauge >> > > >> > >> > > >> > Update at: Timer starts before commit starting, update commit >> > duration >> > > >> >> after commit finished >> > > >> > >> > > >> > >> > > >> > At first glance, "Commit" is not a valid description. From the >> > metric >> > > >> name >> > > >> > I guess you'd like to record the length of duration for >> committing >> > > >> > snapshots. As its type is a Gauge, I suppose you want to record >> the >> > > >> *last* >> > > >> > commit. Please make things clear what exactly you want to record. >> > And >> > > >> as I >> > > >> > said in the last section, it might be better to introduce a >> > Histogram >> > > >> type >> > > >> > so that we can study the average duration of the last few >> commits. >> > > >> > >> > > >> > Metric Name: numTableFilesAdded >> > > >> > >> > > >> > Description: Number of added table files in this commit >> > > >> > >> > > >> > Type: Counter >> > > >> > >> > > >> > Update at: Collecting changes from committables >> > > >> > >> > > >> > >> > > >> > By "this commit" I suppose you mean *last* commit. If you the >> type >> > > >> should >> > > >> > be a Gauge, not a Counter. By using Counter you may want to >> record >> > the >> > > >> *total >> > > >> > number* of files added during all commits. >> > > >> > >> > > >> > Metric Name: numFilesCompactedBefore >> > > >> > >> > > >> > Description: Number of deleted files in compaction >> > > >> > >> > > >> > Type: Counter >> > > >> > >> > > >> > Update at: Triggering compaction >> > > >> > >> > > >> > >> > > >> > It is also unclear to me what you're going to record. Are you >> going >> > > the >> > > >> > record the number of deleted files during the *last* compaction? >> Or >> > > >> you'd >> > > >> > like to record the *total number* of deleted files during all >> > > >> compaction? >> > > >> > >> > > >> > I'm afraid most of your proposed metrics seem unclear to me. >> Please >> > > >> > rewrite the proposal to make them clear. >> > > >> > >> > > >> > Guojun Li <[email protected]> 于2023年5月25日周四 16:59写道: >> > > >> > >> > > >> >> Hi, Paimon Devs, >> > > >> >> I’d like to start a discussion about PIP-3[1]. >> > > >> >> In this PIP, I'm talking about how to support metrics for >> > > paimon, >> > > >> >> what >> > > >> >> metrics paimon needs. Look forward to your question and >> > suggestions. >> > > >> >> >> > > >> >> Best, >> > > >> >> Guojun >> > > >> >> >> > > >> >> [1] >> > > >> >> >> > > >> >> >> > > >> >> > > >> > >> https://cwiki.apache.org/confluence/display/PAIMON/PIP-3%3A+Introduce+metrics+for+Paimon >> > > >> >> >> > > >> > >> > > >> >> > > > >> > > >> > >> >
