I also forgot to mention that the Flink metrics documentation [1] is a good reference. It clearly describes what Gauge, Counter and Histogram should do, and it lists metrics with clear definitions. For example, they have a metric named "lastCheckpointDuration" (which is a Gauge) rather than just "checkpointDuration".
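
To make the distinction between the three types concrete, here is a minimal sketch in Java. All class and variable names in it (MetricsSketch, SlidingHistogram, numCheckpoints and so on) are hypothetical and only for illustration. This is neither Flink's nor Paimon's actual API, and the fixed-size window is just one simple way a Histogram could keep "the last few values".

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical names for illustration only; not the Flink or Paimon API.
final class MetricsSketch {

    /** Gauge: reports a single current value at the moment it is read. */
    interface Gauge<T> {
        T getValue();
    }

    /** Counter: a value that accumulates over the lifetime of the job. */
    static final class Counter {
        private long count;

        void inc(long n) {
            count += n;
        }

        long getCount() {
            return count;
        }
    }

    /** Histogram: keeps the last windowSize samples and derives statistics. */
    static final class SlidingHistogram {
        private final int windowSize;
        private final Deque<Long> samples = new ArrayDeque<>();

        SlidingHistogram(int windowSize) {
            if (windowSize <= 0) {
                throw new IllegalArgumentException("windowSize must be positive");
            }
            this.windowSize = windowSize;
        }

        void update(long value) {
            if (samples.size() == windowSize) {
                samples.removeFirst(); // drop the oldest sample
            }
            samples.addLast(value);
        }

        double mean() {
            return samples.stream().mapToLong(Long::longValue).average().orElse(0.0);
        }

        long max() {
            return samples.stream().mapToLong(Long::longValue).max().orElse(0L);
        }
    }

    public static void main(String[] args) {
        long[] lastDuration = {0L};
        // The "last" prefix in the name tells the reader this is one instantaneous value.
        Gauge<Long> lastCheckpointDuration = () -> lastDuration[0];
        Counter numCheckpoints = new Counter();
        SlidingHistogram checkpointDuration = new SlidingHistogram(3);

        // Four checkpoints with made-up durations in milliseconds.
        for (long d : new long[] {100, 120, 400, 110}) {
            lastDuration[0] = d;          // gauge: overwritten each time
            numCheckpoints.inc(1);        // counter: only ever grows
            checkpointDuration.update(d); // histogram: remembers the last 3
        }

        System.out.println("last = " + lastCheckpointDuration.getValue());
        System.out.println("total = " + numCheckpoints.getCount());
        System.out.println("mean of last 3 = " + checkpointDuration.mean());
        System.out.println("max of last 3 = " + checkpointDuration.max());
    }
}
```

The gauge answers "what was the last value", the counter answers "how many in total", and the histogram answers "what do the last few values look like", which is why each metric's name and description should say which of these it is.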

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/

Caizhi Weng <[email protected]> wrote on Thu, May 25, 2023 at 19:10:

> Hi Guojun!
>
> Thanks for raising this discussion. After reading the document, I'd like
> to raise a few opinions.
>
> --------------------------------------------------------------------------------
>
> 1. About the overall types of metrics.
>
> You suggested two types of metrics, namely Gauge and Counter. A gauge
> indicates the value of a metric at that instant, for example "the duration
> of the last compaction". A counter indicates an accumulated value of a
> metric over time, for example "the total number of records written".
>
> However, there are times when we want to study the history and see the
> trend. For example, if we notice that the last compaction is slower than
> normal, we might want to know "the average duration of the last few
> compactions" to check whether this slowdown is just by chance or is a
> common case. Neither a gauge nor a counter can meet this need. What we
> need is another type of metric called Histogram. This metric type should
> be able to record the last few values of a metric and calculate their
> statistics, such as average, min/max and more.
>
> --------------------------------------------------------------------------------
>
> 2. About the types of some specific metrics.
>
> It seems to me that your proposal does not clearly distinguish between
> Gauges and Counters, and sometimes it is not clear what a specific
> metric means. Let me give you some examples.
>
>> Metric Name: commitDuration
>> Description: Commit
>> Type: Gauge
>> Update at: Timer starts before commit starting, update commit duration
>> after commit finished
>
> At first glance, "Commit" is not a valid description. From the metric name
> I guess you'd like to record how long it takes to commit snapshots. As its
> type is a Gauge, I suppose you want to record the *last* commit. Please
> make clear what exactly you want to record. And as I said in the last
> section, it might be better to introduce a Histogram type so that we can
> study the average duration of the last few commits.
>
>> Metric Name: numTableFilesAdded
>> Description: Number of added table files in this commit
>> Type: Counter
>> Update at: Collecting changes from committables
>
> By "this commit" I suppose you mean the *last* commit. If so, the type
> should be a Gauge, not a Counter. Using a Counter suggests you want to
> record the *total number* of files added during all commits.
>
>> Metric Name: numFilesCompactedBefore
>> Description: Number of deleted files in compaction
>> Type: Counter
>> Update at: Triggering compaction
>
> It is also unclear to me what you're going to record. Are you going to
> record the number of deleted files during the *last* compaction? Or would
> you like to record the *total number* of deleted files during all
> compactions?
>
> I'm afraid most of your proposed metrics seem unclear to me. Please
> rewrite the proposal to make them clear.
>
> Guojun Li <[email protected]> wrote on Thu, May 25, 2023 at 16:59:
>
>> Hi, Paimon Devs,
>> I'd like to start a discussion about PIP-3 [1].
>> In this PIP, I'm talking about how to support metrics for Paimon and
>> what metrics Paimon needs. Looking forward to your questions and
>> suggestions.
>>
>> Best,
>> Guojun
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/PAIMON/PIP-3%3A+Introduce+metrics+for+Paimon
