asafm commented on code in PR #22178: URL: https://github.com/apache/pulsar/pull/22178#discussion_r1513136592
########## pip/pip-342 OTel client metrics support.md: ########## @@ -0,0 +1,201 @@ +# PIP 342: Support OpenTelemetry metrics in Pulsar client + +## Motivation + +Current support for metric instrumentation in Pulsar client is very limited and poses a lot of +issues for integrating the metrics into any telemetry system. + +We have 2 ways that metrics are exposed today: + +1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for + any application to get the data or use it in any meaningful way. +2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to + the rate of events in the last 1-minute interval. This is problematic because out of the + box the metrics are not collected anywhere. One would have to start its own thread to + periodically check these values and export them to some other system. + +Neither of these mechanism that we have today are sufficient to enable application to easily +export the telemetry data of Pulsar client SDK. + +## Goal + +Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation, +in particular with respect to: + +1. Maximizing compatibility with existing telemetry systems +2. Minimizing the effort required to export these metrics + +## Why OpenTelemetry? + +[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and +tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md), +we are already migrating the Pulsar server side metrics to use OpenTelemetry. + +For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and +export Pulsar metrics. + +### Why exposing OpenTelemetry directly in Pulsar API + +When deciding how to expose the metrics exporter configuration there are multiple options: + + 1. Accept an `OpenTelemetry` object directly in Pulsar API + 2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to + provide an implementation, perhaps providing an OpenTelemetry included option. + +For this proposal, we are following the (1) option. Here are the reasons: + + 1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API + on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new + Pulsar-specific interface + 2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its + API. Applications that are going to use OpenTelemetry, will include the OTel SDK + 3. Providing a custom interface has several drawbacks: + 1. Applications need to update their implementations every time a new metric is added in Pulsar SDK + 2. The surface of this plugin API can become quite big when there are several metrics + 3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own + custom way to expose metrics, we can see the level of integration burden that is pushed to application + developers + 4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There + are several examples of this in OpenTelemetry documentation. + +## Public API changes + +### Enabling OpenTelemetry + +When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object: + +```java +interface ClientBuilder { + // ... + ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry); + + ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality metricsCardinality); +} +``` + +The common usage for an application would be something like: Review Comment: It's worth mentioning that we'll support OTel Java agent (a.k.a automatic instrumentation) users. If they don't specify an OpenTelemetry instance with the builder, we'll default to `GlobalOpenTelemetry.get()`, containing the agent's SDK instance if the agent is used. ########## pip/pip-342 OTel client metrics support.md: ########## @@ -0,0 +1,201 @@ +# PIP 342: Support OpenTelemetry metrics in Pulsar client + +## Motivation + +Current support for metric instrumentation in Pulsar client is very limited and poses a lot of +issues for integrating the metrics into any telemetry system. + +We have 2 ways that metrics are exposed today: + +1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for + any application to get the data or use it in any meaningful way. +2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to + the rate of events in the last 1-minute interval. This is problematic because out of the + box the metrics are not collected anywhere. One would have to start its own thread to + periodically check these values and export them to some other system. + +Neither of these mechanism that we have today are sufficient to enable application to easily +export the telemetry data of Pulsar client SDK. + +## Goal + +Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation, +in particular with respect to: + +1. Maximizing compatibility with existing telemetry systems +2. Minimizing the effort required to export these metrics + +## Why OpenTelemetry? + +[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and +tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md), +we are already migrating the Pulsar server side metrics to use OpenTelemetry. + +For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and +export Pulsar metrics. + +### Why exposing OpenTelemetry directly in Pulsar API + +When deciding how to expose the metrics exporter configuration there are multiple options: + + 1. Accept an `OpenTelemetry` object directly in Pulsar API + 2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to + provide an implementation, perhaps providing an OpenTelemetry included option. + +For this proposal, we are following the (1) option. Here are the reasons: + + 1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API + on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new + Pulsar-specific interface + 2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its + API. Applications that are going to use OpenTelemetry, will include the OTel SDK + 3. Providing a custom interface has several drawbacks: + 1. Applications need to update their implementations every time a new metric is added in Pulsar SDK + 2. The surface of this plugin API can become quite big when there are several metrics + 3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own + custom way to expose metrics, we can see the level of integration burden that is pushed to application + developers + 4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There + are several examples of this in OpenTelemetry documentation. + +## Public API changes + +### Enabling OpenTelemetry + +When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object: + +```java +interface ClientBuilder { + // ... + ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry); + + ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality metricsCardinality); +} +``` + +The common usage for an application would be something like: + +```java +// Creates a OpenTelemetry instance using environment variables to configure it +OpenTelemetry otel=AutoConfiguredOpenTelemetrySdk.builder() + .build().getOpenTelemetrySdk(); + + PulsarClient client=PulsarClient.builder() + .serviceUrl("pulsar://localhost:6650") + .build(); + +// .... +``` + +Cardinality enum will allow to select a default cardinality label to be attached to the +metrics: + +```java +public enum MetricsCardinality { + /** + * Do not add additional labels to metrics + */ + None, + + /** + * Label metrics by tenant + */ + Tenant, + + /** + * Label metrics by tenant and namespace + */ + Namespace, + + /** + * Label metrics by topic + */ + Topic, + + /** + * Label metrics by each partition + */ + Partition, +} +``` + +The labels are addictive. For example, selecting `Topic` level would mean that the metrics will be +labeled like: + +``` +pulsar_client_received_total{namespace="public/default",tenant="public",topic="persistent://public/default/pt"} 149.0 +``` + +While selecting `Namespace` level: + +``` +pulsar_client_received_total{namespace="public/default",tenant="public"} 149.0 +``` + +### Deprecating the old stats methods + +The old way of collecting stats will be disabled by default, deprecated and eventually removed +in Pulsar 4.0 release. + +Methods to deprecate: + +```java +interface ClientBuilder { + // ... + @Deprecated + ClientBuilder statsInterval(long statsInterval, TimeUnit unit); +} + +interface Producer { + @Deprecated + ProducerStats getStats(); +} + +interface Consumer { + @Deprecated + ConsumerStats getStats(); +} +``` + +## Initial set of metrics to include + +Based on the experience of Pulsar Go client SDK metrics ( +see: https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go), +this is the proposed initial set of metrics to export. + +Additional metrics could be added later on, though it's better to start with the set of most important metrics +and then evaluate any missing information. + +| OTel metric name | Type | Unit | Description | Review Comment: I commented on metric names in WIP PR, so this table needs to be amended. ########## pip/pip-342 OTel client metrics support.md: ########## @@ -0,0 +1,201 @@ +# PIP 342: Support OpenTelemetry metrics in Pulsar client + +## Motivation + +Current support for metric instrumentation in Pulsar client is very limited and poses a lot of +issues for integrating the metrics into any telemetry system. + +We have 2 ways that metrics are exposed today: + +1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for + any application to get the data or use it in any meaningful way. +2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to + the rate of events in the last 1-minute interval. This is problematic because out of the + box the metrics are not collected anywhere. One would have to start its own thread to + periodically check these values and export them to some other system. + +Neither of these mechanism that we have today are sufficient to enable application to easily +export the telemetry data of Pulsar client SDK. + +## Goal + +Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation, +in particular with respect to: + +1. Maximizing compatibility with existing telemetry systems +2. Minimizing the effort required to export these metrics + +## Why OpenTelemetry? + +[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and +tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md), +we are already migrating the Pulsar server side metrics to use OpenTelemetry. + +For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and +export Pulsar metrics. + +### Why exposing OpenTelemetry directly in Pulsar API + +When deciding how to expose the metrics exporter configuration there are multiple options: + + 1. Accept an `OpenTelemetry` object directly in Pulsar API + 2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to + provide an implementation, perhaps providing an OpenTelemetry included option. + +For this proposal, we are following the (1) option. Here are the reasons: + + 1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API + on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new + Pulsar-specific interface + 2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its + API. Applications that are going to use OpenTelemetry, will include the OTel SDK + 3. Providing a custom interface has several drawbacks: + 1. Applications need to update their implementations every time a new metric is added in Pulsar SDK + 2. The surface of this plugin API can become quite big when there are several metrics + 3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own + custom way to expose metrics, we can see the level of integration burden that is pushed to application + developers + 4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There + are several examples of this in OpenTelemetry documentation. + +## Public API changes + +### Enabling OpenTelemetry + +When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object: + +```java +interface ClientBuilder { + // ... + ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry); + + ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality metricsCardinality); Review Comment: As discussed, it is preferred to use the experimental view advice to default to only the namespace attribute, and let the user use views to expand the attributes collected to its choosing, while we define the finest granularity. Worth mention it's not dynamic, but I think it's good enough for now. The metric filter developed in OTel designated for server side will be able to serve us here as well. ########## pip/pip-342 OTel client metrics support.md: ########## @@ -0,0 +1,201 @@ +# PIP 342: Support OpenTelemetry metrics in Pulsar client + +## Motivation + +Current support for metric instrumentation in Pulsar client is very limited and poses a lot of +issues for integrating the metrics into any telemetry system. + +We have 2 ways that metrics are exposed today: + +1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for + any application to get the data or use it in any meaningful way. +2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to + the rate of events in the last 1-minute interval. This is problematic because out of the + box the metrics are not collected anywhere. One would have to start its own thread to + periodically check these values and export them to some other system. + +Neither of these mechanism that we have today are sufficient to enable application to easily +export the telemetry data of Pulsar client SDK. + +## Goal + +Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation, +in particular with respect to: + +1. Maximizing compatibility with existing telemetry systems +2. Minimizing the effort required to export these metrics + +## Why OpenTelemetry? + +[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and +tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md), +we are already migrating the Pulsar server side metrics to use OpenTelemetry. + +For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and +export Pulsar metrics. + +### Why exposing OpenTelemetry directly in Pulsar API + +When deciding how to expose the metrics exporter configuration there are multiple options: + + 1. Accept an `OpenTelemetry` object directly in Pulsar API + 2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to + provide an implementation, perhaps providing an OpenTelemetry included option. + +For this proposal, we are following the (1) option. Here are the reasons: + + 1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API + on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new + Pulsar-specific interface + 2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its + API. Applications that are going to use OpenTelemetry, will include the OTel SDK + 3. Providing a custom interface has several drawbacks: + 1. Applications need to update their implementations every time a new metric is added in Pulsar SDK + 2. The surface of this plugin API can become quite big when there are several metrics + 3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own + custom way to expose metrics, we can see the level of integration burden that is pushed to application + developers + 4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There + are several examples of this in OpenTelemetry documentation. + +## Public API changes + +### Enabling OpenTelemetry + +When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object: + +```java +interface ClientBuilder { + // ... + ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry); + + ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality metricsCardinality); +} +``` + +The common usage for an application would be something like: + +```java +// Creates a OpenTelemetry instance using environment variables to configure it +OpenTelemetry otel=AutoConfiguredOpenTelemetrySdk.builder() + .build().getOpenTelemetrySdk(); + + PulsarClient client=PulsarClient.builder() + .serviceUrl("pulsar://localhost:6650") Review Comment: you're missing a line here. ########## pip/pip-342 OTel client metrics support.md: ########## @@ -0,0 +1,201 @@ +# PIP 342: Support OpenTelemetry metrics in Pulsar client + +## Motivation + +Current support for metric instrumentation in Pulsar client is very limited and poses a lot of +issues for integrating the metrics into any telemetry system. + +We have 2 ways that metrics are exposed today: + +1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for + any application to get the data or use it in any meaningful way. +2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to + the rate of events in the last 1-minute interval. This is problematic because out of the + box the metrics are not collected anywhere. One would have to start its own thread to + periodically check these values and export them to some other system. + +Neither of these mechanism that we have today are sufficient to enable application to easily +export the telemetry data of Pulsar client SDK. + +## Goal + +Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation, +in particular with respect to: + +1. Maximizing compatibility with existing telemetry systems +2. Minimizing the effort required to export these metrics + +## Why OpenTelemetry? + +[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and +tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md), +we are already migrating the Pulsar server side metrics to use OpenTelemetry. + +For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and +export Pulsar metrics. + +### Why exposing OpenTelemetry directly in Pulsar API + +When deciding how to expose the metrics exporter configuration there are multiple options: + + 1. Accept an `OpenTelemetry` object directly in Pulsar API + 2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to + provide an implementation, perhaps providing an OpenTelemetry included option. + +For this proposal, we are following the (1) option. Here are the reasons: + + 1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API + on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new + Pulsar-specific interface + 2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its + API. Applications that are going to use OpenTelemetry, will include the OTel SDK + 3. Providing a custom interface has several drawbacks: + 1. Applications need to update their implementations every time a new metric is added in Pulsar SDK + 2. The surface of this plugin API can become quite big when there are several metrics + 3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own + custom way to expose metrics, we can see the level of integration burden that is pushed to application + developers + 4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There + are several examples of this in OpenTelemetry documentation. + +## Public API changes + +### Enabling OpenTelemetry + +When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object: + +```java +interface ClientBuilder { + // ... + ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry); + + ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality metricsCardinality); +} +``` + +The common usage for an application would be something like: + +```java +// Creates a OpenTelemetry instance using environment variables to configure it +OpenTelemetry otel=AutoConfiguredOpenTelemetrySdk.builder() + .build().getOpenTelemetrySdk(); + + PulsarClient client=PulsarClient.builder() + .serviceUrl("pulsar://localhost:6650") + .build(); + +// .... +``` + +Cardinality enum will allow to select a default cardinality label to be attached to the +metrics: + +```java +public enum MetricsCardinality { + /** + * Do not add additional labels to metrics + */ + None, + + /** + * Label metrics by tenant + */ + Tenant, + + /** + * Label metrics by tenant and namespace + */ + Namespace, + + /** + * Label metrics by topic + */ + Topic, + + /** + * Label metrics by each partition + */ + Partition, +} +``` + +The labels are addictive. For example, selecting `Topic` level would mean that the metrics will be +labeled like: + +``` +pulsar_client_received_total{namespace="public/default",tenant="public",topic="persistent://public/default/pt"} 149.0 +``` + +While selecting `Namespace` level: + +``` +pulsar_client_received_total{namespace="public/default",tenant="public"} 149.0 +``` + +### Deprecating the old stats methods Review Comment: I suggest 3 phases: 1. OTel is added as experimental, but old stats are active. You can just disable old stats if you want. 2. OTel is no longer experimental, then old stats are marked deprecated. 3. v4 we remove old stats. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
