Matt Venz created KAFKA-20370:
---------------------------------
Summary: BrokerRegistrationState per-broker metric not registered after snapshot load
Key: KAFKA-20370
URL: https://issues.apache.org/jira/browse/KAFKA-20370
Project: Kafka
Issue Type: Bug
Components: controller
Affects Versions: 4.1.1
Reporter: Matt Venz
h2. Description
The per-broker
{{kafka.controller:type=KafkaController,name=BrokerRegistrationState,broker=X}}
metric introduced in KIP-1131 is never exposed via JMX for brokers that are
loaded from a metadata snapshot.
h2. Root cause
{{ControllerMetadataMetricsPublisher.publishSnapshot()}} sets the aggregate
counts ({{FencedBrokerCount}}, {{ActiveBrokerCount}},
{{ControlledShutdownBrokerCount}}) but never calls
{{ControllerMetadataMetrics.addBrokerRegistrationStateMetric()}} for each
broker in the snapshot.
The per-broker JMX gauge is only registered in
{{ControllerMetricsChanges.handleBrokerChange()}} when {{prev == null}} (brand
new broker registration via a log delta). For brokers already present in the
snapshot, {{prev}} is always non-null in subsequent deltas, so the gauge is
never created. {{setBrokerRegistrationState()}} updates the internal
{{ConcurrentHashMap}} but there is no corresponding JMX gauge to report the
value.
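The gap described above can be reproduced in a toy model. This is only a sketch: {{GaugeRegistrationModel}}, its fields, and the string state values are hypothetical stand-ins, not the real Kafka classes. It assumes, as described above, that the gauge is created only when a delta sees no previous registration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of the registration gap; hypothetical names, not Kafka code. */
public class GaugeRegistrationModel {
    // Brokers known from metadata (stands in for the cluster image).
    final Map<Integer, String> image = new HashMap<>();
    // Internal state map, updated on every delta (stands in for the ConcurrentHashMap).
    final Map<Integer, String> states = new ConcurrentHashMap<>();
    // Broker ids that have a JMX gauge (stands in for the metrics registry).
    final Set<Integer> gauges = new HashSet<>();

    /** Snapshot path as reported: brokers become known, but no gauge is registered. */
    void loadSnapshot(Map<Integer, String> brokers) {
        image.putAll(brokers);
    }

    /** Delta path as reported: the gauge is registered only when prev == null. */
    void applyDelta(int brokerId, String newState) {
        String prev = image.put(brokerId, newState);
        if (prev == null) {
            gauges.add(brokerId); // brand new broker registration
        }
        states.put(brokerId, newState); // value is stored even when no gauge exists
    }

    public static void main(String[] args) {
        GaugeRegistrationModel model = new GaugeRegistrationModel();
        model.loadSnapshot(Map.of(1, "ACTIVE", 2, "ACTIVE")); // placeholder state names
        model.applyDelta(1, "FENCED"); // broker 1 was in the snapshot, so prev != null
        model.applyDelta(3, "ACTIVE"); // broker 3 is new, so prev == null
        System.out.println("states=" + model.states + " gauges=" + model.gauges);
    }
}
```

In this model, broker 1 ends up with a stored state but no gauge, while broker 3, first seen in a delta, gets both.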
Since controllers always load from a snapshot on startup (or leadership
change), the metric is effectively unavailable for all pre-existing brokers in
any real-world deployment.
h2. Steps to reproduce
# Start a KRaft cluster with 3+ brokers and 3 controllers
# On the active controller, query JMX:
{code}
kafka.controller:type=KafkaController,name=BrokerRegistrationState,*
{code}
# No metrics are returned, even though {{ControlledShutdownBrokerCount}} (set
in {{publishSnapshot}}) is present
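The JMX query in step 2 can also be run programmatically. The following is a minimal sketch using the standard {{javax.management}} API; the class name {{BrokerRegistrationStateQuery}} and the {{host:port}} argument are assumptions for illustration, and it presumes remote JMX is enabled on the controller without authentication.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical helper that lists the per-broker BrokerRegistrationState MBeans. */
public class BrokerRegistrationStateQuery {
    // Same pattern as in the reproduction step; the trailing * matches any broker=X key.
    static final String PATTERN =
        "kafka.controller:type=KafkaController,name=BrokerRegistrationState,*";

    /** Returns the names of all MBeans matching the per-broker pattern. */
    static Set<ObjectName> perBrokerGauges(MBeanServerConnection conn) throws Exception {
        return conn.queryNames(new ObjectName(PATTERN), null);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No target given: query this JVM's own platform MBean server.
            System.out.println(perBrokerGauges(ManagementFactory.getPlatformMBeanServer()));
            return;
        }
        // args[0] is assumed to be the controller's JMX endpoint as host:port.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + args[0] + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            Set<ObjectName> names = perBrokerGauges(connector.getMBeanServerConnection());
            System.out.println(names.isEmpty()
                ? "No BrokerRegistrationState MBeans found"
                : names.toString());
        }
    }
}
```

Run against an affected active controller, this would be expected to report that no matching MBeans were found, matching step 3.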
h2. Expected behavior
After loading a snapshot, the active controller should expose
{{BrokerRegistrationState,broker=X}} for every registered broker.
h2. Suggested fix
In {{ControllerMetadataMetricsPublisher.publishSnapshot()}}, add per-broker
metric registration and state initialization:
{code:java}
private void publishSnapshot(MetadataImage newImage) {
    // ... existing topic count logic ...
    int fencedBrokers = 0;
    int activeBrokers = 0;
    int controlledShutdownBrokers = 0;
    for (BrokerRegistration broker : newImage.cluster().brokers().values()) {
        metrics.addBrokerRegistrationStateMetric(broker.id());    // <-- missing
        metrics.setBrokerRegistrationState(broker.id(), broker);  // <-- missing
        if (broker.fenced()) {
            fencedBrokers++;
        } else {
            activeBrokers++;
        }
        if (broker.inControlledShutdown()) {
            controlledShutdownBrokers++;
        }
    }
    // ... rest unchanged ...
}
{code}
h2. Related
* [KIP-1131: Improved controller-side monitoring of broker states|https://cwiki.apache.org/confluence/display/KAFKA/KIP-1131%3A+Improved+controller-side+monitoring+of+broker+states]
* [KAFKA-18666: Controller-side monitoring for broker shutdown and startup|https://issues.apache.org/jira/browse/KAFKA-18666]
* [PR #19586|https://github.com/apache/kafka/pull/19586]