Matt Venz created KAFKA-20370:
---------------------------------

             Summary: BrokerRegistrationState per-broker metric not registered after snapshot load
                 Key: KAFKA-20370
                 URL: https://issues.apache.org/jira/browse/KAFKA-20370
             Project: Kafka
          Issue Type: Bug
          Components: controller
    Affects Versions: 4.1.1
            Reporter: Matt Venz


h2. Description

The per-broker {{kafka.controller:type=KafkaController,name=BrokerRegistrationState,broker=X}} metric introduced in KIP-1131 is never exposed via JMX for brokers that are loaded from a metadata snapshot.

h2. Root cause

{{ControllerMetadataMetricsPublisher.publishSnapshot()}} sets the aggregate counts ({{FencedBrokerCount}}, {{ActiveBrokerCount}}, {{ControlledShutdownBrokerCount}}) but never calls {{ControllerMetadataMetrics.addBrokerRegistrationStateMetric()}} for each broker in the snapshot.

The per-broker JMX gauge is only registered in {{ControllerMetricsChanges.handleBrokerChange()}} when {{prev == null}}, i.e. for a brand-new broker registration arriving via a log delta. For brokers already present in the snapshot, {{prev}} is non-null in every subsequent delta, so the gauge is never created. {{setBrokerRegistrationState()}} still updates the internal {{ConcurrentHashMap}}, but there is no corresponding JMX gauge to report the value.
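
The two paths can be illustrated with a small, self-contained Java model. It is a sketch, not Kafka code: the class {{RegistrationPathsModel}}, its {{State}} enum, and the set standing in for the JMX registry are all hypothetical, the states are illustrative rather than the values defined by KIP-1131, and unregistration is not modeled.

{code:java}
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the controller-side metric paths described above; not Kafka code.
public class RegistrationPathsModel {
    // Illustrative broker states; NOT the encoding defined by KIP-1131.
    enum State { FENCED, ACTIVE, CONTROLLED_SHUTDOWN }

    // Stands in for the internal ConcurrentHashMap updated by setBrokerRegistrationState().
    final Map<Integer, State> states = new ConcurrentHashMap<>();
    // Stands in for the JMX registry: broker ids that have a per-broker gauge.
    final Set<Integer> gauges = ConcurrentHashMap.newKeySet();
    int fencedCount, activeCount, controlledShutdownCount;

    // Models the current publishSnapshot(): only the aggregate counts are set.
    void publishSnapshot(Map<Integer, State> brokers) {
        fencedCount = 0;
        activeCount = 0;
        controlledShutdownCount = 0;
        for (State s : brokers.values()) {
            if (s == State.FENCED) {
                fencedCount++;
            } else {
                activeCount++;
            }
            if (s == State.CONTROLLED_SHUTDOWN) {
                controlledShutdownCount++;
            }
        }
    }

    // Models handleBrokerChange() for a registration or update (next must be non-null).
    void handleBrokerChange(int brokerId, State prev, State next) {
        if (prev == null) {
            gauges.add(brokerId); // the only place a gauge is created
        }
        states.put(brokerId, next); // the state is recorded either way
    }

    // What a JMX client would see: empty if no gauge exists, even when a state is stored.
    Optional<State> readGauge(int brokerId) {
        return gauges.contains(brokerId) ? Optional.ofNullable(states.get(brokerId)) : Optional.empty();
    }

    public static void main(String[] args) {
        RegistrationPathsModel m = new RegistrationPathsModel();
        // Controller starts from a snapshot containing brokers 1 and 2.
        m.publishSnapshot(Map.of(1, State.ACTIVE, 2, State.FENCED));
        // Broker 1 enters controlled shutdown: prev is non-null, so no gauge is created.
        m.handleBrokerChange(1, State.ACTIVE, State.CONTROLLED_SHUTDOWN);
        // Broker 3 registers for the first time through a log delta: prev == null.
        m.handleBrokerChange(3, null, State.FENCED);
        for (int id = 1; id <= 3; id++) {
            String shown = m.readGauge(id).map(State::name).orElse("<no gauge>");
            System.out.println("broker " + id + ": " + shown);
        }
    }
}
{code}

Running {{main}} prints {{<no gauge>}} for brokers 1 and 2 and {{FENCED}} for broker 3: the stored state for broker 1 is {{CONTROLLED_SHUTDOWN}}, but nothing reports it.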

Since controllers always load from a snapshot on startup (or leadership change), the metric is effectively unavailable for all pre-existing brokers in any real-world deployment.

h2. Steps to reproduce

# Start a KRaft cluster with 3+ brokers and 3 controllers
# On the active controller, query JMX:
{code}
kafka.controller:type=KafkaController,name=BrokerRegistrationState,*
{code}
# No metrics are returned, even though {{ControlledShutdownBrokerCount}} (set in {{publishSnapshot}}) is present
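
Step 2 can be scripted with the standard {{javax.management}} API. This is a sketch: the class name is hypothetical, and the {{service:jmx:rmi:///jndi/rmi://HOST:PORT/jmxrmi}} URL assumes the controller JVM exposes the default RMI connector (for example via {{JMX_PORT}}); adjust it to the deployment. With no arguments it queries its own JVM, which has no Kafka MBeans.

{code:java}
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BrokerRegistrationStateQuery {
    static final String PATTERN =
        "kafka.controller:type=KafkaController,name=BrokerRegistrationState,*";

    // Returns the sorted "broker" key values of every MBean matching PATTERN.
    static Set<String> brokersWithGauge(MBeanServerConnection conn) throws Exception {
        Set<String> brokers = new TreeSet<>();
        for (ObjectName name : conn.queryNames(new ObjectName(PATTERN), null)) {
            String broker = name.getKeyProperty("broker");
            if (broker != null) {
                brokers.add(broker);
            }
        }
        return brokers;
    }

    static void print(Set<String> brokers) {
        System.out.println(brokers.isEmpty()
            ? "no BrokerRegistrationState MBeans found"
            : "BrokerRegistrationState MBeans for brokers " + brokers);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 2) {
            // Remote controller JVM: args are HOST and PORT of its JMX RMI connector (assumed).
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + args[0] + ":" + args[1] + "/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                print(brokersWithGauge(connector.getMBeanServerConnection()));
            }
        } else {
            // No arguments: query this JVM's own platform MBean server.
            print(brokersWithGauge(ManagementFactory.getPlatformMBeanServer()));
        }
    }
}
{code}

Run as {{java BrokerRegistrationStateQuery <controller-host> <jmx-port>}} against the active controller; on an affected controller it reports that no BrokerRegistrationState MBeans were found.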

h2. Expected behavior

After loading a snapshot, the active controller should expose {{BrokerRegistrationState,broker=X}} for every registered broker.

h2. Suggested fix

In {{ControllerMetadataMetricsPublisher.publishSnapshot()}}, add per-broker metric registration and state initialization:

{code:java}
private void publishSnapshot(MetadataImage newImage) {
    // ... existing topic count logic ...

    int fencedBrokers = 0;
    int activeBrokers = 0;
    int controlledShutdownBrokers = 0;
    for (BrokerRegistration broker : newImage.cluster().brokers().values()) {
        metrics.addBrokerRegistrationStateMetric(broker.id());       // <-- missing
        metrics.setBrokerRegistrationState(broker.id(), broker);     // <-- missing
        if (broker.fenced()) {
            fencedBrokers++;
        } else {
            activeBrokers++;
        }
        if (broker.inControlledShutdown()) {
            controlledShutdownBrokers++;
        }
    }
    // ... rest unchanged ...
}
{code}
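
The effect of the suggested change can be checked against a small, self-contained Java model. Everything in it is hypothetical: {{SnapshotFixModel}} and its {{State}} enum are illustrative, the two methods only mimic {{addBrokerRegistrationStateMetric()}} and {{setBrokerRegistrationState()}} (the real setter takes a {{BrokerRegistration}}), and registration is assumed to be idempotent.

{code:java}
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of publishSnapshot() with the suggested fix applied; not Kafka code.
public class SnapshotFixModel {
    // Illustrative broker states; NOT the encoding defined by KIP-1131.
    enum State { FENCED, ACTIVE, CONTROLLED_SHUTDOWN }

    final Map<Integer, State> states = new ConcurrentHashMap<>(); // per-broker state
    final Set<Integer> gauges = ConcurrentHashMap.newKeySet();    // registered gauges

    // Mimics addBrokerRegistrationStateMetric(). Assumed idempotent: adding an id that
    // already has a gauge changes nothing, so a repeated snapshot load is harmless.
    void addBrokerRegistrationStateMetric(int brokerId) {
        gauges.add(brokerId);
    }

    // Mimics setBrokerRegistrationState(), simplified to take a State.
    void setBrokerRegistrationState(int brokerId, State state) {
        states.put(brokerId, state);
    }

    // Fixed snapshot path: every broker in the image gets a gauge and an initial state.
    void publishSnapshot(Map<Integer, State> brokers) {
        brokers.forEach((id, state) -> {
            addBrokerRegistrationStateMetric(id);
            setBrokerRegistrationState(id, state);
        });
        // ... aggregate counts unchanged (omitted) ...
    }

    // Delta path unchanged: gauge only for brand-new registrations, state always updated.
    void handleBrokerChange(int brokerId, State prev, State next) {
        if (prev == null) {
            addBrokerRegistrationStateMetric(brokerId);
        }
        setBrokerRegistrationState(brokerId, next);
    }

    Optional<State> readGauge(int brokerId) {
        return gauges.contains(brokerId) ? Optional.ofNullable(states.get(brokerId)) : Optional.empty();
    }

    public static void main(String[] args) {
        SnapshotFixModel m = new SnapshotFixModel();
        m.publishSnapshot(Map.of(1, State.ACTIVE, 2, State.FENCED));
        // prev is non-null, but the gauge already exists from the snapshot load.
        m.handleBrokerChange(1, State.ACTIVE, State.CONTROLLED_SHUTDOWN);
        // Loading the same brokers again must not create extra gauges.
        m.publishSnapshot(Map.of(1, State.CONTROLLED_SHUTDOWN, 2, State.FENCED));
        for (int id = 1; id <= 2; id++) {
            System.out.println("broker " + id + ": " + m.readGauge(id).map(State::name).orElse("<no gauge>"));
        }
        System.out.println("gauges registered: " + m.gauges.size());
    }
}
{code}

Here {{main}} prints {{CONTROLLED_SHUTDOWN}} for broker 1, {{FENCED}} for broker 2, and two registered gauges. The idempotency assumption matters because a snapshot can be loaded again after startup (the report mentions leadership changes); if the real registration method were not idempotent, the loop would need a guard.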

h2. Related

* [KIP-1131: Improved controller-side monitoring of broker states|https://cwiki.apache.org/confluence/display/KAFKA/KIP-1131%3A+Improved+controller-side+monitoring+of+broker+states]
* [KAFKA-18666: Controller-side monitoring for broker shutdown and startup|https://issues.apache.org/jira/browse/KAFKA-18666]
* [PR #19586|https://github.com/apache/kafka/pull/19586]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
