[ 
https://issues.apache.org/jira/browse/KAFKA-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939861#comment-16939861
 ] 

Sophie Blee-Goldman commented on KAFKA-8649:
--------------------------------------------

[~ferbncode] [~guozhang] [~mjsax] I think I happened across the bug responsible 
for this: consider during the rolling bounce, some members are still on the old 
bytecode (2.0) and subscription version (v3) while others have been upgraded to 
2.1 and v4. If the leader is on the higher version, everyone gets an assignment 
encoded using the min version (v3) but containing the leader's version as v4. 
The members still on 2.0 will see that their used version is less than the 
leader's, and blindly bump it to v4 in `upgradeSubscriptionVersionIfNeeded` – 
then when they try and encode their subscription at the start of the next 
rebalance, this exception is thrown because they don't yet know what v4 is.

Two ideas to fix this:
 # Don't upgrade beyond what you support, and in `onAssignment` do not set the 
version probing code if you were not a "future consumer" aka sent a 
subscription version higher than what the leader supports (this part is 
necessary to avoid getting stuck in a rebalancing loop)
 # Keep track of which consumers sent which versions, and send back an 
assignment using min(consumerVersion, leaderVersion)

> Error while rolling update from Kafka Streams 2.0.0 -> Kafka Streams 2.1.0
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-8649
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8649
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.0.0
>            Reporter: Suyash Garg
>            Priority: Major
>
> While doing a rolling update of a cluster of nodes running Kafka Streams 
> application, the stream threads in the nodes running the old version of the 
> library (2.0.0), fail with the following error: 
> {code:java}
> [ERROR] [application-existing-StreamThread-336] 
> [o.a.k.s.p.internals.StreamThread] - stream-thread 
> [application-existing-StreamThread-336] Encountered the following error 
> during processing:
> java.lang.IllegalArgumentException: version must be between 1 and 3; was: 4
> #011at 
> org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfo.<init>(SubscriptionInfo.java:67)
> #011at 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.subscription(StreamsPartitionAssignor.java:312)
> #011at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.metadata(ConsumerCoordinator.java:176)
> #011at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.sendJoinGroupRequest(AbstractCoordinator.java:515)
> #011at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.initiateJoinGroup(AbstractCoordinator.java:466)
> #011at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:412)
> #011at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:352)
> #011at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:337)
> #011at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:333)
> #011at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1218)
> #011at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1175)
> #011at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1154)
> #011at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:861)
> #011at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:814)
> #011at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
> #011at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to