[
https://issues.apache.org/jira/browse/KAFKA-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Roland Sommer updated KAFKA-20295:
----------------------------------
Description:
While upgrading our Kafka clusters to new operating systems, I switched to
dynamic voter configuration and removed controller instances with
{{/opt/kafka/bin/kafka-metadata-quorum.sh}} and the {{remove-controller}}
subcommand. Inspecting the cluster with {{describe}} shows only the nodes that
are actually running.
Now, during the update to 4.2.0, the final metadata upgrade step fails with:
{code:java}
Could not upgrade eligible.leader.replicas.version to 1. The update failed for
all features since the following feature had an error: Invalid update version
29 for feature metadata.version. Controller 351 only supports versions
7-27{code}
with 351 being the ID of an already removed controller. Inspecting a snapshot
with {{/opt/kafka/bin/kafka-metadata-shell.sh}} shows that the IDs of the
removed controllers (351, 584 and 686) are still registered alongside the
three active ones:
{code:java}
>> ls image/cluster/controllers/
158 206 351 584 611 686 {code}
while other tools only show the expected nodes:
{code:java}
~$ /opt/kafka/bin/kafka-metadata-quorum.sh --bootstrap-controller localhost:9093 describe --replication --human-readable
NodeId  DirectoryId             LogEndOffset  Lag  LastFetchTimestamp  LastCaughtUpTimestamp  Status
158     2gsvOvnT7urpZcA_-LUy5w  196823524     0    7 ms ago            8 ms ago               Leader
611     27Ii-xdAZ7ReQBLsvvJb0A  196823524     0    348 ms ago          348 ms ago             Follower
206     Q7X9o3XbKxk_3tz4T8torg  196823524     0    348 ms ago          348 ms ago             Follower
226     7n6aedUEuytkqhBnbe7ESw  196823524     0    348 ms ago          348 ms ago             Observer
181     tZ17VQ8cYpf7R-LyAQWf2w  196823524     0    349 ms ago          349 ms ago             Observer
299     P4qXt3K0G5Qg_7w_UdvaNA  196823524     0    348 ms ago          348 ms ago             Observer
290     bA0pqZFsUa45lRTB6bS4bg  196823524     0    348 ms ago          348 ms ago             Observer
293     Av_12222lURKVYVt-aNKOQ  196823524     0    348 ms ago          348 ms ago             Observer
485     glENIgkIng1MYDF8HxxoDQ  196823524     0    349 ms ago          350 ms ago             Observer {code}
Grepping through the output of {{bin/kafka-dump-log.sh --cluster-metadata-decoder}}
shows only the expected three {{REGISTER_CONTROLLER_RECORD}} entries.
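For illustration, the mismatch between the two views can be computed from the outputs shown above. This is only a minimal sketch, assuming both outputs have been pasted in as plain text; the function names are hypothetical helpers and not part of any Kafka tool.

```python
# Hypothetical helpers for comparing the two outputs quoted in this report.
# Nothing here calls Kafka; it only parses text that was pasted in by hand.


def parse_registered_controllers(ls_output: str) -> set[int]:
    """Parse the whitespace-separated IDs printed by `ls image/cluster/controllers/`."""
    return {int(tok) for tok in ls_output.split() if tok.isdigit()}


def parse_quorum_voters(describe_output: str) -> set[int]:
    """Collect node IDs whose Status column is Leader or Follower."""
    voters = set()
    for line in describe_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric NodeId and end with the Status value.
        if len(fields) >= 2 and fields[0].isdigit() and fields[-1] in ("Leader", "Follower"):
            voters.add(int(fields[0]))
    return voters


LS_OUTPUT = "158 206 351 584 611 686"

DESCRIBE_OUTPUT = """\
NodeId DirectoryId LogEndOffset Lag LastFetchTimestamp LastCaughtUpTimestamp Status
158 2gsvOvnT7urpZcA_-LUy5w 196823524 0 7 ms ago 8 ms ago Leader
611 27Ii-xdAZ7ReQBLsvvJb0A 196823524 0 348 ms ago 348 ms ago Follower
206 Q7X9o3XbKxk_3tz4T8torg 196823524 0 348 ms ago 348 ms ago Follower
226 7n6aedUEuytkqhBnbe7ESw 196823524 0 348 ms ago 348 ms ago Observer
181 tZ17VQ8cYpf7R-LyAQWf2w 196823524 0 349 ms ago 349 ms ago Observer
299 P4qXt3K0G5Qg_7w_UdvaNA 196823524 0 348 ms ago 348 ms ago Observer
290 bA0pqZFsUa45lRTB6bS4bg 196823524 0 348 ms ago 348 ms ago Observer
293 Av_12222lURKVYVt-aNKOQ 196823524 0 348 ms ago 348 ms ago Observer
485 glENIgkIng1MYDF8HxxoDQ 196823524 0 349 ms ago 350 ms ago Observer
"""

# Controllers registered in the metadata image but absent from the voter set.
stale = sorted(parse_registered_controllers(LS_OUTPUT) - parse_quorum_voters(DESCRIBE_OUTPUT))
print(stale)  # [351, 584, 686]
```

With the outputs from this report, the difference is exactly the three removed controller IDs, including 351 from the error message.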
Is there any clear path for removing those stale nodes?
> Removed controllers still in metadata, blocking finalizing upgrade to 4.2.0
> ---------------------------------------------------------------------------
>
> Key: KAFKA-20295
> URL: https://issues.apache.org/jira/browse/KAFKA-20295
> Project: Kafka
> Issue Type: Bug
> Components: controller
> Environment: Kafka 4.2.0 (Scala 2.13) running on Debian Trixie 13.3
> Reporter: Roland Sommer
> Priority: Minor