Alanxtl opened a new issue, #3337:
URL: https://github.com/apache/dubbo-go/issues/3337

   ## Background
   
   Dubbo-go already has a working metrics pipeline built on the metrics event bus and the Prometheus registry/exporter, covering RPC metrics, registry metrics, metadata metrics, config-center metrics, and application info metrics. The next step is to make these metrics more complete, more stable, and easier to consume in production dashboards and alerts.
   
   ## Goals
   
   Improve metrics coverage and standardization so users can rely on a stable RED-style (Rate, Errors, Duration) observability model across provider, consumer, registry, metadata, and config-center scenarios.
   
   ## Proposed Scope
   
   ### 1. Standardize metric names and labels
   
   - Define a documented label contract for built-in metrics.
   - Clarify stable labels such as `side`, `protocol`, `interface`, `method`, 
`group`, `version`, `error_code`, and `error_type`.
   - Review high-cardinality labels and avoid exposing unstable values by 
default.
   - Ensure provider and consumer metrics use symmetric naming where possible.
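A minimal sketch of how a documented label contract could be enforced with an allowlist: only the stable labels listed above are exported, and anything else (for example a remote IP) is dropped by default. The constant names, `FilterLabels`, and `Key` are hypothetical helpers for illustration, not existing dubbo-go APIs.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Hypothetical label keys for the documented contract.
const (
	LabelSide      = "side"
	LabelProtocol  = "protocol"
	LabelInterface = "interface"
	LabelMethod    = "method"
	LabelGroup     = "group"
	LabelVersion   = "version"
	LabelErrorCode = "error_code"
	LabelErrorType = "error_type"
)

// stableLabels is the allowlist of labels exported by default.
var stableLabels = map[string]bool{
	LabelSide: true, LabelProtocol: true, LabelInterface: true, LabelMethod: true,
	LabelGroup: true, LabelVersion: true, LabelErrorCode: true, LabelErrorType: true,
}

// FilterLabels keeps only contract labels, so high-cardinality values
// (remote IPs, request IDs, ...) never reach the exporter by default.
func FilterLabels(raw map[string]string) map[string]string {
	out := make(map[string]string, len(raw))
	for k, v := range raw {
		if stableLabels[k] {
			out[k] = v
		}
	}
	return out
}

// Key renders labels in sorted key order, so provider-side and
// consumer-side series with the same labels produce identical keys.
func Key(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+labels[k])
	}
	return strings.Join(parts, ",")
}

func main() {
	raw := map[string]string{
		LabelSide:      "consumer",
		LabelInterface: "org.example.Greeter",
		LabelMethod:    "SayHello",
		"remote_ip":    "10.0.0.7", // high-cardinality, dropped by default
	}
	// prints: interface=org.example.Greeter,method=SayHello,side=consumer
	fmt.Println(Key(FilterLabels(raw)))
}
```

An allowlist keeps cardinality bounded even when new attachment keys appear at runtime; an opt-in switch could later extend the allowlist for users who accept the extra series.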
   
   ### 2. Complete RPC error classification
   
   Current RPC metrics already include granular counters for timeout, limit, service unavailable, business failure, and unknown failure. This classification can be improved further:
   
   - Extend error classification beyond Triple/gRPC error codes where possible.
   - Add coverage for Dubbo protocol errors.
   - Distinguish network failures from codec/serialization failures when the runtime exposes enough information.
   - Ensure the same error taxonomy is reusable by tracing and logging.
   
   ### 3. Clarify RED metrics support
   
   Provide a clear out-of-the-box model for:
   
   - Rate: request QPS and total request counters.
   - Errors: total failures and categorized failures.
   - Duration: RT (response time), aggregated RT, and quantile metrics.
   
   This should be reflected in code comments, docs, and sample dashboards.
   
   ### 4. Improve component-level metrics layering
   
   Review registry, metadata, and config-center metrics and classify them into:
   
   - Basic metrics, enabled by default whenever metrics are enabled.
   - Detailed metrics, disabled by default to reduce noise and enabled explicitly when needed.
   
   Candidate areas:
   
   - Registry register/subscribe/notify/directory metrics.
   - Metadata push/subscribe/store metrics.
   - Config-center change metrics.
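One possible way to express the basic/detailed split is a per-metric tier checked against a configured level. This is a sketch under assumptions: `Level`, `MetricDef`, `Enabled`, and every metric name in the catalog are hypothetical and illustrative, not the actual dubbo-go metric names or configuration keys.

```go
package main

import "fmt"

// Level is a hypothetical verbosity tier for component metrics.
type Level int

const (
	LevelBasic Level = iota
	LevelDetailed
)

// MetricDef describes one component metric and the tier it belongs to.
type MetricDef struct {
	Name  string
	Level Level
}

// Hypothetical catalog; names are illustrative only.
var catalog = []MetricDef{
	{"registry_register_total", LevelBasic},
	{"registry_subscribe_total", LevelBasic},
	{"registry_notify_total", LevelDetailed},
	{"registry_directory_size", LevelDetailed},
	{"metadata_push_total", LevelBasic},
	{"metadata_store_rt", LevelDetailed},
	{"configcenter_change_total", LevelBasic},
}

// Enabled returns the metrics to register: nothing when metrics are off,
// basic metrics by default, and basic plus detailed when explicitly requested.
func Enabled(metricsOn bool, level Level) []string {
	if !metricsOn {
		return nil
	}
	var names []string
	for _, m := range catalog {
		if m.Level <= level {
			names = append(names, m.Name)
		}
	}
	return names
}

func main() {
	fmt.Println(len(Enabled(true, LevelBasic)), len(Enabled(true, LevelDetailed)), len(Enabled(false, LevelDetailed))) // 4 7 0
}
```

Keeping the tier on the metric definition, rather than scattering `if` checks through registry and metadata code, makes the layering reviewable in one place and easy to document.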
   
   ### 5. Align samples and dashboard queries
   
   Update or verify the Prometheus/Grafana sample to use the standardized names 
and labels:
   
   - `dubbo-go-samples/metrics/prometheus_grafana`
   - Grafana panels for QPS, success rate, error rate, P99 latency, timeout 
rate, limit rate, and service unavailable rate.
   - Prometheus query examples in docs.
   
   ## Acceptance Criteria
   
   - Built-in metrics have documented names, labels, and cardinality guidance.
   - RPC failure metrics have a consistent taxonomy shared across provider and 
consumer sides.
   - Existing Prometheus/Grafana sample dashboards either continue to work or are updated to the new metric contract.
   - Tests cover the standardized metric names/labels and error classification 
behavior.
   - The backward compatibility impact is documented for any metric that is renamed or deprecated.
   
   ## Related Context
   
   - Existing metrics implementation: `metrics/*`, `filter/metrics`, 
`metrics/prometheus`
   - Existing sample: 
https://github.com/apache/dubbo-go-samples/tree/main/metrics/prometheus_grafana