[
https://issues.apache.org/jira/browse/KAFKA-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18084111#comment-18084111
]
Roland Sommer edited comment on KAFKA-20295 at 5/28/26 1:24 PM:
----------------------------------------------------------------
I just checked whether this has been fixed in 4.3.0, but Kafka still insists
that the non-existent controller(s) are part of the cluster:
{code:java}
~$ /opt/kafka/bin/kafka-features.sh --bootstrap-server localhost:9092 upgrade --release-version 4.3
Could not upgrade eligible.leader.replicas.version to 1. The update failed for all features since the following feature had an error: Invalid update version 30 for feature metadata.version. Controller 351 only supports versions 7-27
Could not upgrade group.version to 1. The update failed for all features since the following feature had an error: Invalid update version 30 for feature metadata.version. Controller 351 only supports versions 7-27
Could not upgrade kraft.version to 1. The update failed for all features since the following feature had an error: Invalid update version 30 for feature metadata.version. Controller 351 only supports versions 7-27
Could not upgrade metadata.version to 30. The update failed for all features since the following feature had an error: Invalid update version 30 for feature metadata.version. Controller 351 only supports versions 7-27
Could not upgrade share.version to 1. The update failed for all features since the following feature had an error: Invalid update version 30 for feature metadata.version. Controller 351 only supports versions 7-27
Could not upgrade streams.version to 1. The update failed for all features since the following feature had an error: Invalid update version 30 for feature metadata.version. Controller 351 only supports versions 7-27
Could not upgrade transaction.version to 2. The update failed for all features since the following feature had an error: Invalid update version 30 for feature metadata.version. Controller 351 only supports versions 7-27
7 out of 7 operation(s) failed. {code}
The mentioned controller 351 is not part of the cluster:
{code:java}
~$ /opt/kafka/bin/kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --replication
NodeId  DirectoryId             LogEndOffset  Lag  LastFetchTimestamp  LastCaughtUpTimestamp  Status
158     2gsvOvnT7urpZcA_-LUy5w  210340245     0    1779974359863       1779974359863          Leader
611     27Ii-xdAZ7ReQBLsvvJb0A  210340245     0    1779974359368       1779974359368          Follower
206     Q7X9o3XbKxk_3tz4T8torg  210340245     0    1779974359369       1779974359369          Follower
226     7n6aedUEuytkqhBnbe7ESw  210340245     0    1779974359368       1779974359368          Observer
181     tZ17VQ8cYpf7R-LyAQWf2w  210340245     0    1779974359368       1779974359368          Observer
299     P4qXt3K0G5Qg_7w_UdvaNA  210340245     0    1779974359368       1779974359368          Observer
290     bA0pqZFsUa45lRTB6bS4bg  210340245     0    1779974359368       1779974359368          Observer
293     Av_12222lURKVYVt-aNKOQ  210340245     0    1779974359368       1779974359368          Observer
485     glENIgkIng1MYDF8HxxoDQ  210340245     0    1779974359368       1779974359368          Observer {code}
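The mismatch between the two outputs above can be checked mechanically. What follows is only a minimal sketch in Python, not part of Kafka's tooling. It assumes the error text and the describe table keep the layout shown in this comment (raw timestamps, not {{--human-readable}}), and the names {{blocking_controllers}}, {{quorum_node_ids}} and {{unknown_blockers}} are made up for illustration.

```python
import re

# Error text as printed by kafka-features.sh in this report (layout assumed stable).
BLOCKER = re.compile(
    r"Invalid update version (\d+) for feature [\w.]+\.\s+"
    r"Controller (\d+) only supports versions (\d+)-(\d+)"
)
# One row of `describe --replication`: NodeId, DirectoryId, four numeric columns, Status.
ROW = re.compile(r"\b(\d+) \S+ \d+ \d+ \d+ \d+ (Leader|Follower|Observer)\b")


def _unwrap(text: str) -> str:
    """Collapse line wrapping (e.g. added by a mail client) into single spaces."""
    return " ".join(text.split())


def blocking_controllers(features_output: str) -> dict[int, tuple[int, int, int]]:
    """Map controller id -> (requested version, min supported, max supported)."""
    return {
        int(node): (int(wanted), int(low), int(high))
        for wanted, node, low, high in BLOCKER.findall(_unwrap(features_output))
    }


def quorum_node_ids(describe_output: str) -> set[int]:
    """Collect every NodeId listed in the describe table, whatever its status."""
    return {int(node) for node, _status in ROW.findall(_unwrap(describe_output))}


def unknown_blockers(features_output: str, describe_output: str) -> list[int]:
    """Controller ids named in upgrade errors but absent from the quorum table."""
    blockers = set(blocking_controllers(features_output))
    return sorted(blockers - quorum_node_ids(describe_output))


# Shortened excerpts of the outputs quoted in this comment.
FEATURES_OUTPUT = """\
Could not upgrade metadata.version to 30. The update failed for all features
since the following feature had an error: Invalid update version 30 for feature
metadata.version. Controller 351 only supports versions 7-27
"""
DESCRIBE_OUTPUT = """\
NodeId DirectoryId LogEndOffset Lag LastFetchTimestamp
LastCaughtUpTimestamp Status
158 2gsvOvnT7urpZcA_-LUy5w 210340245 0 1779974359863
1779974359863 Leader
611 27Ii-xdAZ7ReQBLsvvJb0A 210340245 0 1779974359368
1779974359368 Follower
226 7n6aedUEuytkqhBnbe7ESw 210340245 0 1779974359368
1779974359368 Observer
"""

print(unknown_blockers(FEATURES_OUTPUT, DESCRIBE_OUTPUT))  # [351]
```

For the excerpts above this reports controller 351, which is named in the upgrade error but does not appear in the quorum table. The sketch only reads captured output. It does not talk to the cluster or change any metadata.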
> Removed controllers still in metadata, blocking finalizing upgrade to 4.2.0
> ---------------------------------------------------------------------------
>
> Key: KAFKA-20295
> URL: https://issues.apache.org/jira/browse/KAFKA-20295
> Project: Kafka
> Issue Type: Bug
> Components: controller
> Environment: Kafka 4.2.0 (Scala 2.13) running on Debian Trixie 13.3
> Reporter: Roland Sommer
> Priority: Major
>
> While upgrading our Kafka clusters to new operating systems, I switched to
> dynamic voter configuration and removed controller instances with
> {{/opt/kafka/bin/kafka-metadata-quorum.sh}} and the {{remove-controller}}
> subcommand. Inspecting the cluster with {{describe}} shows only the nodes
> that are actually running.
> Now, during the upgrade to 4.2.0, the final metadata upgrade step fails
> with
> {code:java}
> Could not upgrade eligible.leader.replicas.version to 1. The update failed
> for all features since the following feature had an error: Invalid update
> version 29 for feature metadata.version. Controller 351 only supports
> versions 7-27{code}
> with 351 being the ID of an already removed controller. Inspecting a snapshot
> with {{/opt/kafka/bin/kafka-metadata-shell.sh}} indeed still shows the IDs
> of the already removed controllers:
> {code:java}
> >> ls image/cluster/controllers/
> 158 206 351 584 611 686 {code}
> while other tools only show the expected nodes:
> {code:java}
> ~$ /opt/kafka/bin/kafka-metadata-quorum.sh --bootstrap-controller localhost:9093 describe --replication --human-readable
> NodeId  DirectoryId             LogEndOffset  Lag  LastFetchTimestamp  LastCaughtUpTimestamp  Status
> 158     2gsvOvnT7urpZcA_-LUy5w  196823524     0    7 ms ago            8 ms ago               Leader
> 611     27Ii-xdAZ7ReQBLsvvJb0A  196823524     0    348 ms ago          348 ms ago             Follower
> 206     Q7X9o3XbKxk_3tz4T8torg  196823524     0    348 ms ago          348 ms ago             Follower
> 226     7n6aedUEuytkqhBnbe7ESw  196823524     0    348 ms ago          348 ms ago             Observer
> 181     tZ17VQ8cYpf7R-LyAQWf2w  196823524     0    349 ms ago          349 ms ago             Observer
> 299     P4qXt3K0G5Qg_7w_UdvaNA  196823524     0    348 ms ago          348 ms ago             Observer
> 290     bA0pqZFsUa45lRTB6bS4bg  196823524     0    348 ms ago          348 ms ago             Observer
> 293     Av_12222lURKVYVt-aNKOQ  196823524     0    348 ms ago          348 ms ago             Observer
> 485     glENIgkIng1MYDF8HxxoDQ  196823524     0    349 ms ago          350 ms ago             Observer {code}
> Grepping through {{bin/kafka-dump-log.sh --cluster-metadata-decoder}} only
> shows the expected three {{REGISTER_CONTROLLER_RECORD}} entries.
> Is there any clear path for removing those stale nodes?
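The discrepancy described in the issue, controller registrations in the snapshot versus the voters reported by the quorum tool, boils down to a set difference. The following is a minimal sketch in Python under two assumptions: rows with status Leader or Follower are taken to be the current voters, and the function name {{stale_registrations}} is hypothetical. It only lists candidate IDs and removes nothing.

```python
def stale_registrations(controllers_listing: str, node_status: dict[int, str]) -> list[int]:
    """Return registered controller ids that are not current quorum voters."""
    registered = {int(token) for token in controllers_listing.split()}
    # Assumption: Leader and Follower rows are the voters; Observers are not.
    voters = {
        node for node, status in node_status.items()
        if status in ("Leader", "Follower")
    }
    return sorted(registered - voters)


# Output of `ls image/cluster/controllers/` from the issue description.
LISTING = "158 206 351 584 611 686"

# NodeId -> Status, copied from the describe table in the issue description.
STATUS = {
    158: "Leader",
    611: "Follower",
    206: "Follower",
    226: "Observer",
    181: "Observer",
    299: "Observer",
    290: "Observer",
    293: "Observer",
    485: "Observer",
}

print(stale_registrations(LISTING, STATUS))  # [351, 584, 686]
```

With the data from the issue, the result contains 351, the controller named in the upgrade error, plus 584 and 686, which are likewise registered in the snapshot but are not voters.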