Yes, thanks Caizhi, it looks very nice.

I'm going to make the metrics name more reasonable. Metrics group is also a
good idea to take into account.

On Thu, Jun 8, 2023 at 4:50 PM Caizhi Weng <[email protected]> wrote:

> Hi Guojun!
>
> I've read the modified document. I still have the following concerns:
>
> 1. Some metric names are still misleading. For example, "totalTablesFiles"
> is the "number of total data files currently maintained on storage",
> however "numTotalManifests" is the "number of scanned manifests files in
> the last scan planning", not the total number of manifest files on storage.
> Please keep metric names consistent, even across different metric types.
>
> 2. There is no grouping in the current design. Let's consider all metrics
> related to compaction. As one parallelism can hold multiple writers, it is
> very likely that compactions for different buckets may happen in the same
> parallelism. How are you going to distinguish the same compaction metric
> for different buckets? Flink has metric groups which separate different
> subtasks of the same operator.
>
> Guojun Li <[email protected]> 于2023年6月2日周五 18:25写道:
>
> > Thanks caizhi for the detailed review.
> >
> > I've updated the pip-3 based on your comments, they're useful to improve
> > this proposal and make it clear.
> > 1. I added the Histogram and HistogramStatistic interfaces.
> > 2. Added histogram metrics to measure the distribution of commit / scan /
> > compaction duration.
> > 3. Update the descriptions for some metrics, make them more clear.
> >
> > And any other ideas from devs? Looking forward to your feedback.
> >
> > Best,
> > Guojun
> >
> > On Mon, May 29, 2023 at 1:51 PM Guojun Li <[email protected]>
> wrote:
> >
> > > Thanks Caizhi for your comments. I'll think about introducing Histogram
> > > metric type and making the metrics description more clear.
> > >
> > > Best,
> > > Guojun
> > >
> > > On Thu, May 25, 2023 at 7:13 PM Caizhi Weng <[email protected]>
> > wrote:
> > >
> > >> I also forgot to mention that Flink metrics [1] is a good reference.
> It
> > >> clearly describes what Gauge, Counter and Histogram should do and has
> a
> > >> list of metrics with clear definition. For example they have a metric
> > >> named
> > >> "lastCheckpointDuration" (which is a Gauge) and not just
> > >> "checkpointDuration".
> > >>
> > >> [1]
> > >>
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/
> > >>
> > >> Caizhi Weng <[email protected]> 于2023年5月25日周四 19:10写道:
> > >>
> > >> > Hi Guojun!
> > >> >
> > >> > Thanks for raising this discussion. After reading the document, I'd
> > like
> > >> > to raise a few opinions.
> > >> >
> > >> >
> > >> >
> > >>
> >
> --------------------------------------------------------------------------------
> > >> >
> > >> > 1. About the overall types of metrics.
> > >> >
> > >> > You suggested two types of metrics, namely Gauge and Counter. A
> gauge
> > >> > indicates the value of a metric at that instance, for example "the
> > >> duration
> > >> > of the last compaction". A counter indicates an accumulated value
> of a
> > >> > metric overtime, for example "the total number of records written".
> > >> >
> > >> > However there are times when we want to study the history and see
> the
> > >> > trend. For example if we notice that the last compaction is slower
> > than
> > >> > normal, we might want to know "the average duration of the last few
> > >> > compactions" to make sure that this slow down is just by chance or
> is
> > a
> > >> > common case. Both gauge and counter cannot meet this need. What we
> > need
> > >> is
> > >> > another type of metrics called Histogram. This metrics type should
> be
> > >> able
> > >> > to record the last few values of a metric and calculate their
> > statistics
> > >> > such as average, min/max and more.
> > >> >
> > >> >
> > >> >
> > >>
> >
> --------------------------------------------------------------------------------
> > >> >
> > >> > 2. About the types of some specific metrics.
> > >> >
> > >> > It seems to me that in your proposal, you cannot clearly tell the
> > >> > difference between Gauges and Counters, and sometimes it is not
> clear
> > >> that
> > >> > what a specific metric means. Let me give you some example.
> > >> >
> > >> > Metric Name: commitDuration
> > >> >
> > >> > Description:  Commit
> > >> >
> > >> > Type: Gauge
> > >> >
> > >> > Update at: Timer starts before commit starting, update commit
> duration
> > >> >> after commit finished
> > >> >
> > >> >
> > >> > At first glance, "Commit" is not a valid description. From the
> metric
> > >> name
> > >> > I guess you'd like to record the length of duration for committing
> > >> > snapshots. As its type is a Gauge, I suppose you want to record the
> > >> *last*
> > >> > commit. Please make things clear what exactly you want to record.
> And
> > >> as I
> > >> > said in the last section, it might be better to introduce a
> Histogram
> > >> type
> > >> > so that we can study the average duration of the last few commits.
> > >> >
> > >> > Metric Name: numTableFilesAdded
> > >> >
> > >> > Description: Number of added table files in this commit
> > >> >
> > >> > Type: Counter
> > >> >
> > >> > Update at: Collecting changes from committables
> > >> >
> > >> >
> > >> > By "this commit" I suppose you mean *last* commit. If you the type
> > >> should
> > >> > be a Gauge, not a Counter. By using Counter you may want to record
> the
> > >> *total
> > >> > number* of files added during all commits.
> > >> >
> > >> > Metric Name: numFilesCompactedBefore
> > >> >
> > >> > Description: Number of deleted files in compaction
> > >> >
> > >> > Type: Counter
> > >> >
> > >> > Update at: Triggering compaction
> > >> >
> > >> >
> > >> > It is also unclear to me what you're going to record. Are you going
> > the
> > >> > record the number of deleted files during the *last* compaction? Or
> > >> you'd
> > >> > like to record the *total number* of deleted files during all
> > >> compaction?
> > >> >
> > >> > I'm afraid most of your proposed metrics seem unclear to me. Please
> > >> > rewrite the proposal to make them clear.
> > >> >
> > >> > Guojun Li <[email protected]> 于2023年5月25日周四 16:59写道:
> > >> >
> > >> >> Hi, Paimon Devs,
> > >> >>      I’d like to start a discussion about PIP-3[1].
> > >> >>      In this PIP, I'm talking about how to support metrics for
> > paimon,
> > >> >> what
> > >> >> metrics paimon needs. Look forward to your question and
> suggestions.
> > >> >>
> > >> >> Best,
> > >> >> Guojun
> > >> >>
> > >> >> [1]
> > >> >>
> > >> >>
> > >>
> >
> https://cwiki.apache.org/confluence/display/PAIMON/PIP-3%3A+Introduce+metrics+for+Paimon
> > >> >>
> > >> >
> > >>
> > >
> >
>

Reply via email to