[ 
https://issues.apache.org/jira/browse/KAFKA-13405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435163#comment-17435163
 ] 

Matthias J. Sax edited comment on KAFKA-13405 at 10/28/21, 5:39 AM:
--------------------------------------------------------------------

Thanks for reporting this issue.
{quote} * None of the original broker IPs are valid/accessible after the 
upgrade (as expected){quote}
Why would this be expected? If you do a rolling upgrade, the broker IP
addresses should stay the same.
{quote} * Restart our applications after upgrading our Kafka cluster{quote}
As all IP addresses changed, you cannot simply restart the application without 
updating the `bootstrap.servers` config, right? Otherwise, the "main consumers" 
won't be able to connect to the broker cluster either.

The difference between the main consumer and the restore consumer is that the 
main consumer is always active and thus can update its metadata step by step 
during a broker upgrade. The restore consumer, however, is only used during 
restore (or if you have standby tasks enabled); while idle, it does not 
communicate with the cluster and thus does not update its metadata. Also note 
that if your application crashed and restarted, the main consumer (using the 
old `bootstrap.servers` config) would not be able to reconnect to the cluster 
either.
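
To illustrate the difference, here is a toy model in plain Java. It is only a 
sketch of the idea, not the real Kafka client: `ToyClient`, `poll`, and the 
addresses are made up for illustration, and the actual metadata handling in the 
consumer is more involved. A client that keeps talking to the cluster learns 
the new broker list after each step of the upgrade, while an idle client keeps 
whatever it learned last:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: illustrates stale metadata, does not use any Kafka classes.
final class StaleMetadataDemo {

    static final class ToyClient {
        private List<String> knownBrokers;

        ToyClient(List<String> bootstrap) {
            this.knownBrokers = new ArrayList<>(bootstrap);
        }

        // If any known broker is still alive, the client reaches it and
        // learns the current broker list; otherwise it cannot connect.
        boolean poll(List<String> liveBrokers) {
            for (String broker : knownBrokers) {
                if (liveBrokers.contains(broker)) {
                    knownBrokers = new ArrayList<>(liveBrokers);
                    return true;
                }
            }
            return false; // every address the client knows is gone
        }
    }

    // Replaces the brokers one at a time; only the "main" client polls in between.
    static boolean[] run() {
        List<String> cluster =
            new ArrayList<>(List.of("10.0.0.1", "10.0.0.2", "10.0.0.3")); // made-up IPs
        ToyClient main = new ToyClient(cluster);
        ToyClient restore = new ToyClient(cluster);

        String[] replacements = {"10.0.1.1", "10.0.1.2", "10.0.1.3"};
        for (int i = 0; i < replacements.length; i++) {
            cluster.set(i, replacements[i]); // old broker i goes away
            main.poll(cluster);              // the active client refreshes at each step
        }
        // After the upgrade, a rebalance wakes up the idle restore client.
        return new boolean[] {main.poll(cluster), restore.poll(cluster)};
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("main connects: " + result[0]
            + ", restore connects: " + result[1]);
    }
}
```

Running it prints `main connects: true, restore connects: false`: the idle 
client only knows the three original addresses, none of which exist anymore.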

Overall, I am not sure if we can resolve this issue, and to some extent it 
seems like a user error: because you changed all IPs, you also need to update 
the `bootstrap.servers` config for all clients (which requires a bounce of all 
clients). And as a matter of fact, it seems that updating the config and 
restarting Kafka Streams resolved the issue.

In general, a cluster upgrade should not even change IP addresses though.
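
As a rough sketch of what updating the config could look like, the snippet 
below builds the client properties with the broker list passed in from outside, 
so it can be changed (followed by a restart) whenever the broker addresses 
change. It uses only `java.util.Properties`; `StreamsConfigSketch`, the 
application id, and the host names are placeholders, not values from this 
ticket:

```java
import java.util.Properties;

// Sketch only: class name, application id, and host names are placeholders.
final class StreamsConfigSketch {

    // Builds the client properties from an externally supplied broker list.
    static Properties streamsProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", "my-streams-app");    // placeholder id
        props.put("bootstrap.servers", bootstrapServers);  // note the plural key
        return props;
    }

    public static void main(String[] args) {
        // Host names instead of raw IPs keep working if the brokers get new
        // IPs, provided DNS is updated to point at the new brokers.
        Properties props = streamsProps(
            "broker-1.example.internal:9092,broker-2.example.internal:9092");
        System.out.println(props.getProperty("bootstrap.servers"));
    }
}
```

In a real application these properties would be passed to 
`new KafkaStreams(topology, props)`.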


> Kafka Streams restore-consumer fails to refresh broker IPs after upgrading 
> Kafka cluster
> ----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-13405
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13405
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.8.0, 2.7.1
>            Reporter: Daniel O'Halloran
>            Priority: Critical
>         Attachments: KafkaConfig.txt, KafkaLogs.txt
>
>
> *+Description+*
> After upgrading our Kafka clusters from 2.7 to 2.8 the Streams 
> restore-consumers never update their broker IPs.
> The applications continue to process events as normal, until there is a 
> rebalance.
> Once a rebalance occurs, the restore consumers attempt to connect to the old 
> broker IPs indefinitely and the Streams tasks never go back into a RUNNING 
> state.
> We were able to replicate this behaviour with kafka-streams client libraries 
> 2.5.1, 2.7.1 and 2.8.0.
>  
> *+Steps to reproduce+*
>  # Upgrade brokers from Kafka 2.7 to Kafka 2.8
>  # Ensure old brokers are completely shut down
>  # Trigger a rebalance of a streams application
>  
> *+Expected result+*
>  * Application rebalances as normal
>  
> *+Actual Result+*
>  * Application cannot restore its data
>  * restore consumer tries to connect to old brokers indefinitely
>  
> *+Observations+*
>  * The cluster metadata was updated on all stream consumer threads during the 
> broker upgrade (multiple times) as the new brokers were brought online 
> (corresponding to leader election occurring on the subscribed partitions), 
> however no cluster metadata was logged on the restore-consumer thread.
>  * None of the original broker IPs are valid/accessible after the upgrade (as 
> expected)
>  * No partition rebalance occurs during the Kafka upgrade process.
>  * When the first rebalance was triggered after the upgrade, the 
> restore-consumer looped, failing to connect to each of the 3 original IPs, 
> and never tried any of the new broker IPs.
>  
> *+Workaround+*
>  * Restart our applications after upgrading our Kafka cluster



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
