I also forgot to mention that the Flink metrics documentation [1] is a good
reference. It clearly describes what Gauge, Counter and Histogram should do
and has a list of metrics with clear definitions. For example, it has a
metric named "lastCheckpointDuration" (which is a Gauge), not just
"checkpointDuration".

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/

Caizhi Weng <[email protected]> wrote on Thu, May 25, 2023 at 19:10:

> Hi Guojun!
>
> Thanks for raising this discussion. After reading the document, I'd like
> to share a few opinions.
>
>
> --------------------------------------------------------------------------------
>
> 1. About the overall types of metrics.
>
> You suggested two types of metrics, namely Gauge and Counter. A gauge
> indicates the value of a metric at a given instant, for example "the duration
> of the last compaction". A counter indicates the accumulated value of a
> metric over time, for example "the total number of records written".
>
> However, there are times when we want to study the history and see the
> trend. For example, if we notice that the last compaction is slower than
> normal, we might want to know "the average duration of the last few
> compactions" to determine whether this slowdown is just by chance or a
> common case. Neither a gauge nor a counter can meet this need. What we need
> is another metric type called Histogram. This metric type should be able
> to record the last few values of a metric and calculate their statistics,
> such as average, min/max and more.
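A minimal Java sketch of such a Histogram, assuming it keeps a fixed window of the last N samples in memory and computes statistics on demand. The class `SlidingWindowHistogram` and its methods are hypothetical, not part of Paimon's or Flink's API; a real implementation would more likely reuse an existing metrics library.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window histogram (not a Paimon or Flink API). */
class SlidingWindowHistogram {
    private final int capacity;
    private final Deque<Long> window = new ArrayDeque<>();

    SlidingWindowHistogram(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Records a sample, evicting the oldest one once the window is full. */
    synchronized void update(long value) {
        if (window.size() == capacity) {
            window.removeFirst();
        }
        window.addLast(value);
    }

    /** Number of samples currently in the window (at most capacity). */
    synchronized int count() {
        return window.size();
    }

    /** Average of the retained samples, or 0.0 if nothing was recorded yet. */
    synchronized double mean() {
        return window.stream().mapToLong(Long::longValue).average().orElse(0.0);
    }

    /** Smallest retained sample, or 0 if nothing was recorded yet. */
    synchronized long min() {
        return window.stream().mapToLong(Long::longValue).min().orElse(0L);
    }

    /** Largest retained sample, or 0 if nothing was recorded yet. */
    synchronized long max() {
        return window.stream().mapToLong(Long::longValue).max().orElse(0L);
    }
}
```

For example, with a window of 3 and compaction durations 100, 120, 110, 400, 130 (ms), only the last three samples are kept, so min is 110, max is 400 and the mean is about 213.3. Bounding the window keeps memory constant, at the cost of forgetting older history.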
>
>
> --------------------------------------------------------------------------------
>
> 2. About the types of some specific metrics.
>
> It seems to me that your proposal does not clearly distinguish between
> Gauges and Counters, and sometimes it is not clear what a specific metric
> means. Let me give you some examples.
>
> Metric Name: commitDuration
>
> Description:  Commit
>
> Type: Gauge
>
> Update at: Timer starts before commit starting, update commit duration
> after commit finished
>
>
> At first glance, "Commit" is not a valid description. From the metric name
> I guess you'd like to record how long it takes to commit snapshots. As its
> type is a Gauge, I suppose you want to record the duration of the *last*
> commit. Please make it clear exactly what you want to record. And as I
> said in the last section, it might be better to introduce a Histogram type
> so that we can study the average duration of the last few commits.
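The quoted "Update at" rule (start a timer before the commit, update the value after it finishes) can be sketched as below, assuming the Gauge is meant to hold the duration of the *last* successful commit. `CommitMetrics`, `timeCommit` and `getLastCommitDuration` are hypothetical names for illustration; in this sketch a failed commit deliberately leaves the previous value untouched.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical holder for a "lastCommitDuration" Gauge (not a Paimon API). */
class CommitMetrics {
    // -1 means "no commit has finished yet"; AtomicLong keeps reads from a reporter thread safe.
    private final AtomicLong lastCommitDurationMs = new AtomicLong(-1L);

    /** Gauge read: duration in milliseconds of the most recent successful commit. */
    long getLastCommitDuration() {
        return lastCommitDurationMs.get();
    }

    /** Starts a timer before the commit and records the elapsed time after it finishes. */
    void timeCommit(Runnable commit) {
        long start = System.nanoTime(); // monotonic clock, unaffected by wall-clock changes
        commit.run();                   // if this throws, the gauge keeps its previous value
        long elapsedNanos = System.nanoTime() - start;
        lastCommitDurationMs.set(TimeUnit.NANOSECONDS.toMillis(elapsedNanos));
    }
}
```

Feeding the same elapsed value into a Histogram as well would give the "average duration of the last few commits" suggested above, while the Gauge alone only ever shows the latest one.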
>
> Metric Name: numTableFilesAdded
>
> Description: Number of added table files in this commit
>
> Type: Counter
>
> Update at: Collecting changes from committables
>
>
> By "this commit" I suppose you mean the *last* commit. If so, the type
> should be a Gauge, not a Counter. A Counter would instead record the *total
> number* of files added across all commits.
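To make the two readings of numTableFilesAdded concrete, here is a small sketch assuming both variants are wanted: a Gauge for the last commit and a Counter for all commits. The class, field and method names (including the "last..." prefix, borrowed from the Flink naming style mentioned at the top) are hypothetical.

```java
/** Hypothetical commit-side metrics showing the Gauge vs Counter distinction. */
class TableFileMetrics {
    // Gauge semantics: describes only the most recent commit.
    private long lastCommitTableFilesAdded = 0;
    // Counter semantics: accumulated across all commits, never reset.
    private long totalTableFilesAdded = 0;

    /** Called once per commit with the number of table files its committables add. */
    synchronized void onCommit(int filesAdded) {
        lastCommitTableFilesAdded = filesAdded; // overwrite with this commit's value
        totalTableFilesAdded += filesAdded;     // add to the running total
    }

    synchronized long lastCommitTableFilesAdded() {
        return lastCommitTableFilesAdded;
    }

    synchronized long totalTableFilesAdded() {
        return totalTableFilesAdded;
    }
}
```

After two commits adding 3 and 5 files, the Gauge reads 5 and the Counter reads 8, which is exactly the difference a metric's description should spell out.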
>
> Metric Name: numFilesCompactedBefore
>
> Description: Number of deleted files in compaction
>
> Type: Counter
>
> Update at: Triggering compaction
>
>
> It is also unclear to me what you're going to record. Are you going to
> record the number of deleted files during the *last* compaction? Or would
> you like to record the *total number* of deleted files across all
> compactions?
>
> I'm afraid most of your proposed metrics are unclear to me. Please
> rewrite the proposal to make them clear.
>
> Guojun Li <[email protected]> wrote on Thu, May 25, 2023 at 16:59:
>
>> Hi, Paimon Devs,
>>      I’d like to start a discussion about PIP-3 [1].
>>      In this PIP, I discuss how to support metrics for Paimon and what
>> metrics Paimon needs. I look forward to your questions and suggestions.
>>
>> Best,
>> Guojun
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/PAIMON/PIP-3%3A+Introduce+metrics+for+Paimon
>>
>
