Hi Asaf,

This is a great topic for discussion, and your document is extremely
thorough! I agree with the general proposal to improve Pulsar's
metrics.

> *Metrics Cardinality: *

+100. If we want to scale Pulsar (and I do!), we need to make this manageable.
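
To make the roll-up idea from the proposal concrete, here is a minimal
Python sketch, not Pulsar code. The function name roll_up and the label
names are hypothetical. It drops a label (e.g. topic) and sums the
series that become identical afterwards, which assumes the metric is
additive (a counter or a summable gauge).

```python
from collections import defaultdict


def roll_up(samples, drop_labels):
    """Sum samples whose label sets match once drop_labels are removed.

    samples: iterable of (labels_dict, value) pairs.
    drop_labels: set of label names to remove (hypothetical example: {"topic"}).
    """
    totals = defaultdict(float)
    for labels, value in samples:
        # The remaining labels, in a stable order, identify the rolled-up series.
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in drop_labels))
        totals[key] += value
    return dict(totals)


samples = [
    ({"namespace": "a", "topic": "t1"}, 1.0),
    ({"namespace": "a", "topic": "t2"}, 2.0),
    ({"namespace": "b", "topic": "t3"}, 5.0),
]
rolled = roll_up(samples, {"topic"})
```

Three topic-level series collapse into two namespace-level series here;
the same mechanism is what would take 1M topics down to ~1000 groups.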

> *Consolidate into a single library:*

This makes sense to me, and it ensures that a new metric cannot end up
in one API but not another.

I haven't read the whole doc, but I did read the suggested
improvements. Here are some additional improvements that I've thought
about before.

Are there any metrics we can drop? This would definitely require a
community effort to verify, but I think it could prove valuable.

Can we make the number of histogram buckets configurable? I proposed
this here [0].
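
As a rough illustration of what configurable buckets could look like,
here is a minimal Python sketch. It is not how Pulsar implements
histograms; parse_buckets, Histogram, and the comma-separated config
format are all assumptions made for the example. Buckets follow the
Prometheus "le" (less than or equal) convention.

```python
import bisect


def parse_buckets(config_value):
    """Parse a comma-separated list of upper bounds, e.g. "1,5,10" (assumed format)."""
    bounds = sorted({float(x) for x in config_value.split(",") if x.strip()})
    if not bounds:
        raise ValueError("at least one bucket bound is required")
    return bounds


class Histogram:
    def __init__(self, bounds):
        self.bounds = list(bounds)
        # One counter per bound, plus a final slot for the +Inf bucket.
        self.counts = [0] * (len(self.bounds) + 1)

    def observe(self, value):
        # bisect_left finds the first bound >= value, i.e. the "le" bucket.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1

    def cumulative(self):
        """Return (upper_bound, cumulative_count) pairs, ending with +Inf."""
        total = 0
        out = []
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            total += count
            out.append((bound, total))
        return out


hist = Histogram(parse_buckets("10, 1, 5"))
for latency_ms in (1, 3, 100):
    hist.observe(latency_ms)
```

Fewer bounds means fewer time series per histogram, which is why making
this configurable helps with cardinality.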

Would it be possible to produce a script to help users convert
existing Grafana dashboards to work with the new metrics?
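
A minimal sketch of such a script, in Python, assuming the change is a
plain metric rename and that queries live in "expr" fields of the
dashboard JSON (as they do for Prometheus targets). The metric names
in the mapping are made up for the example; label changes or
restructured metrics would need more than this.

```python
import json
import re


def rewrite_expr(expr, renames):
    """Replace whole metric names in a PromQL expression using renames."""
    if not renames:
        return expr
    names = "|".join(re.escape(name) for name in renames)
    # Lookarounds stop a name from matching inside a longer identifier.
    pattern = re.compile(r"(?<![A-Za-z0-9_:])(" + names + r")(?![A-Za-z0-9_:])")
    return pattern.sub(lambda m: renames[m.group(1)], expr)


def convert(node, renames):
    """Walk the dashboard JSON and rewrite every string-valued "expr" field."""
    if isinstance(node, dict):
        return {
            key: rewrite_expr(value, renames)
            if key == "expr" and isinstance(value, str)
            else convert(value, renames)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [convert(item, renames) for item in node]
    return node


# Hypothetical old -> new metric names, for illustration only.
renames = {"pulsar_old_in": "pulsar_new_in"}
dashboard = json.loads(
    '{"panels": [{"targets": ['
    '{"expr": "sum(rate(pulsar_old_in[1m])) by (topic)"},'
    '{"expr": "pulsar_old_in_total"}]}]}'
)
converted = convert(dashboard, renames)
```

The first query is rewritten, while pulsar_old_in_total is left alone
because it is a different metric name.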

Finally, it'd be great to create a metrics section in the contributors
guide when you've completed your work. That will help existing and new
contributors adjust to the new style.

Thanks,
Michael

[0] https://github.com/apache/pulsar/issues/12069

On Mon, Oct 3, 2022 at 3:36 AM Asaf Mesika <asaf.mes...@gmail.com> wrote:
>
> Hi All,
>
> I would like to share with you a document I wrote during the last months
> titled Pulsar Metrics - Current State and Future Directions
> <https://docs.google.com/document/d/1vke4w1nt7EEgOvEerPEUS-Al3aqLTm9cl2wTBkKNXUA/edit?usp=sharing>,
> and most importantly *get your feedback.*
>
> The initial motivation is to rethink/refactor the way metrics are used in
> Pulsar codebase to solve two large pain points:
>
> 1. *Metrics Cardinality: *As Pulsar can support up to 1M topics
> cluster-wide, this translates into ~100M unique time series, which becomes
> both an impossible cost and affects query performance and general usability
> of metrics. This issue starts surfacing even at 50k-100k topics.
>
> Today users work around it by disabling topic-granularity metrics and
> scripting their own ETL for generating metrics they can use (based on admin
> stats API), switching between granular topic-level metrics to a group-by
> view of their choosing.
>
> The document outlines a solution built upon the notion of Groups, in which
> users can define a group of metrics, and specify if they wish to define a
> roll-up on it (i.e. remove labels) and filter (i.e. remove specific
> metrics).
> The solution should be able to bring the granularity from topic level (1M)
> to group level (1000).
>
> 2. *Consolidate into a single library:* Today there are 4 different metrics
> libraries/systems in Pulsar. This creates lots of confusion and unhappy
> developer experience, among other impacts. Also achieving (1) requires
> having (2).
>
> The document outlines the different libraries, their functionality and the
> problems they create. The doc also describes one idea for such a library,
> but it still requires a POC.
>
>
> The main goal of the document is to garner feedback to see if the
> directions stipulated there are agreed upon, and if there is any other
> problem missing or existing functionality missed as it serves as the basis
> for the requirements for the solution that will be chosen.
>
> Thanks!
>
> Asaf Mesika
>
> Document link:
> https://docs.google.com/document/d/1vke4w1nt7EEgOvEerPEUS-Al3aqLTm9cl2wTBkKNXUA/edit?usp=sharing
