This is an automated email from the ASF dual-hosted git repository.
technoboy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git
The following commit(s) were added to refs/heads/master by this push:
new e466f453ebb [improve] [pip] PIP-382: Add a label named reason for topic_load_failed_total (#23351)
e466f453ebb is described below
commit e466f453ebbc3fa1999ca6acad708731deb067b6
Author: fengyubiao <[email protected]>
AuthorDate: Fri Aug 29 18:34:23 2025 +0800
[improve] [pip] PIP-382: Add a label named reason for topic_load_failed_total (#23351)
---
pip/pip-382.md | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 48 insertions(+)
diff --git a/pip/pip-382.md b/pip/pip-382.md
new file mode 100644
index 00000000000..adc6e636fe4
--- /dev/null
+++ b/pip/pip-382.md
@@ -0,0 +1,48 @@
+# PIP-382: Add a label named reason for topic_load_failed_total
+
+# Background knowledge
+
+Pulsar has a metric, `topic_load_failed_total`, that counts topic load failures. It is incremented in the following cases:
+- The target bundle is being unloaded.
+- Failed to load the policies.
+- Failed to load the managed ledger.
+- Failed to read from the metadata store.
+- Topic initialization failed, for example, failed to rebuild the deduplication information.
+- Topic loading timed out.
+- Others.
+
+# Motivation & Goals
+
+Adding a `reason` label to the metric `topic_load_failed_total` would tell us which error occurred, so we can diagnose and fix the issue faster.
+
+### Metrics
+
+Add a label named `reason` to `topic_load_failed_total`:
+- label name: `reason`
+- label values:
+  - `bundle_unloading`
+  - `failed_load_policies`
+  - `failed_load_ml`
+  - `failed_access_metadata_store`
+  - `failed_init`
+  - `timeout`
+  - `others`
+
+
+# Monitoring & Alternatives
+
+- If the series with `reason = bundle_unloading` increases for a moment and then stops increasing, everything is fine.
+  - Otherwise, the load balancer may have encountered an error.
+- If the series with `reason = timeout` increases for a moment and then stops increasing, too many topics were being loaded at the same time, which may be acceptable.
+  - Otherwise, the broker may have hit a deadlock, or its resources are not sufficient for the current use case.
+- An increase for any other label value indicates an unexpected failure, and the label value tells us which kind of failure it is.
+
+# General Notes
+
+# Links
+
+<!--
+Updated afterwards
+-->
+* Mailing List discussion thread: https://lists.apache.org/thread/f3xhmm342jor042n5ykkxoc32ffcn85s
+* Mailing List voting thread: https://lists.apache.org/thread/ng6z0dssjh1hgp91f590wkcl2ymhvn48
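To illustrate the `reason` label values listed under Metrics in the PIP above, here is a minimal Java sketch that models them as an enum and keeps one counter per value. `TopicLoadFailureReason`, `TopicLoadFailedCounter`, and `classify` are hypothetical names, not Pulsar's actual implementation. The `LongAdder` map is an assumed in-memory stand-in for a real metrics library. Because only a timeout can be recognized from a JDK exception type alone, `classify` is just a fallback. In practice the reason would more plausibly be chosen at the step where loading failed.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical enum: one constant per `reason` label value proposed in PIP-382.
enum TopicLoadFailureReason {
    BUNDLE_UNLOADING("bundle_unloading"),
    FAILED_LOAD_POLICIES("failed_load_policies"),
    FAILED_LOAD_ML("failed_load_ml"),
    FAILED_ACCESS_METADATA_STORE("failed_access_metadata_store"),
    FAILED_INIT("failed_init"),
    TIMEOUT("timeout"),
    OTHERS("others");

    private final String labelValue;

    TopicLoadFailureReason(String labelValue) {
        this.labelValue = labelValue;
    }

    // The string exported as the value of the `reason` label.
    String labelValue() {
        return labelValue;
    }
}

// Hypothetical in-memory stand-in for topic_load_failed_total{reason="..."}.
final class TopicLoadFailedCounter {
    // Filled once in the constructor and never modified afterwards, so
    // concurrent callers only touch the thread-safe LongAdder values.
    private final Map<TopicLoadFailureReason, LongAdder> counts =
            new EnumMap<>(TopicLoadFailureReason.class);

    TopicLoadFailedCounter() {
        for (TopicLoadFailureReason reason : TopicLoadFailureReason.values()) {
            counts.put(reason, new LongAdder());
        }
    }

    void inc(TopicLoadFailureReason reason) {
        counts.get(reason).increment();
    }

    long get(TopicLoadFailureReason reason) {
        return counts.get(reason).sum();
    }

    // Fallback when only the exception is available: unwrap CompletionException
    // layers and recognize a timeout; every other cause is reported as "others".
    static TopicLoadFailureReason classify(Throwable error) {
        Throwable cause = error;
        while (cause instanceof CompletionException && cause.getCause() != null) {
            cause = cause.getCause();
        }
        return cause instanceof TimeoutException
                ? TopicLoadFailureReason.TIMEOUT
                : TopicLoadFailureReason.OTHERS;
    }

    public static void main(String[] args) {
        TopicLoadFailedCounter counter = new TopicLoadFailedCounter();
        counter.inc(TopicLoadFailureReason.BUNDLE_UNLOADING);
        counter.inc(classify(new CompletionException(new TimeoutException("load timed out"))));
        // Print one line per label value, in a Prometheus-like text shape.
        for (TopicLoadFailureReason reason : TopicLoadFailureReason.values()) {
            System.out.println("topic_load_failed_total{reason=\""
                    + reason.labelValue() + "\"} " + counter.get(reason));
        }
    }
}
```

An enum keeps the set of label values closed, which bounds the metric's cardinality to the seven values in the PIP.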
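The rules in the PIP's monitoring section ("increases for a moment and then stops") can be expressed as a small check over scraped counter samples. The following sketch is a hypothetical helper, not part of Pulsar. `FailureTrend`, `TrendVerdict`, and the `quietWindow` parameter (how many of the most recent intervals must show no increase before an earlier increase counts as transient) are assumptions. A counter that drops is treated as a reset, as happens when a broker restarts.

```java
import java.util.List;

// Possible outcomes of checking one topic_load_failed_total series.
enum TrendVerdict { NO_INCREASE, TRANSIENT, STILL_INCREASING }

// Hypothetical helper encoding the monitoring rules from PIP-382; not part of Pulsar.
final class FailureTrend {

    /**
     * @param samples     counter values scraped at a fixed interval, oldest first
     * @param quietWindow how many of the most recent intervals must show no increase
     *                    for an earlier increase to count as transient (assumed knob)
     */
    static TrendVerdict evaluate(List<Long> samples, int quietWindow) {
        if (quietWindow < 1) {
            throw new IllegalArgumentException("quietWindow must be >= 1");
        }
        // Index of the first delta that falls inside the trailing quiet window.
        int windowStart = Math.max(1, samples.size() - quietWindow);
        boolean increasedEarlier = false;
        boolean increasedRecently = false;
        for (int i = 1; i < samples.size(); i++) {
            long prev = samples.get(i - 1);
            long cur = samples.get(i);
            // A counter only goes down when it is reset (e.g. a broker restart);
            // in that case count the new value as the increase since the reset.
            long delta = cur >= prev ? cur - prev : cur;
            if (delta > 0) {
                if (i >= windowStart) {
                    increasedRecently = true;
                } else {
                    increasedEarlier = true;
                }
            }
        }
        if (increasedRecently) {
            return TrendVerdict.STILL_INCREASING;
        }
        return increasedEarlier ? TrendVerdict.TRANSIENT : TrendVerdict.NO_INCREASE;
    }

    // bundle_unloading and timeout may rise briefly and settle; for any other
    // reason, every increase points to an unexpected problem.
    static boolean needsInvestigation(String reason, TrendVerdict verdict) {
        if (reason.equals("bundle_unloading") || reason.equals("timeout")) {
            return verdict == TrendVerdict.STILL_INCREASING;
        }
        return verdict != TrendVerdict.NO_INCREASE;
    }

    public static void main(String[] args) {
        List<Long> unloading = List.of(0L, 0L, 3L, 5L, 5L, 5L, 5L);
        TrendVerdict verdict = evaluate(unloading, 3);
        System.out.println(verdict + ", investigate: "
                + needsInvestigation("bundle_unloading", verdict));
    }
}
```

In a real deployment the same idea would more likely be written as an alerting rule over the scraped metric. The Java form here only makes the decision logic explicit.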