[ 
https://issues.apache.org/jira/browse/KAFKA-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Arthur reassigned KAFKA-8477:
-----------------------------------

    Assignee: David Arthur

> Cannot consume and request metadata for __consumer_offsets topic in Kafka 2.2
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-8477
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8477
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 2.2.0
>            Reporter: Mike Mintz
>            Assignee: David Arthur
>            Priority: Major
>         Attachments: kafka-2.2.0-consumer-offset-metadata-bug-master.zip, 
> logs.txt
>
>
> We have an application that consumes from the __consumer_offsets topic to 
> report lag metrics. When we upgraded its kafka-clients dependency from 2.0.1 
> to 2.2.0, it crashed with:
> {noformat}
> Exception in thread "main" org.apache.kafka.common.errors.TimeoutException: 
> Failed to get offsets by times in 30001ms
> {noformat}
> I created a minimal reproduction at 
> [https://github.com/mikemintz/kafka-2.2.0-consumer-offset-metadata-bug] and 
> I'm uploading a zip of this code for posterity.
> In particular, the behavior happens when I call KafkaConsumer.assign(), then 
> poll(), then endOffsets(). This behavior only happens for the 
> __consumer_offsets topic. It also only happens on the Kafka cluster that we 
> run in production, which runs Kafka 2.2.0. The error does not occur on a 
> freshly created Kafka cluster, and I can't get it to reproduce with 
> EmbeddedKafka.
> It works fine with both Kafka 2.0.1 and Kafka 2.1.1, so something broke 
> between 2.1.1. and 2.2.0. Based on the 2.2.0 changelog and the client log 
> messages (attached), it looks like it may have been introduced in KAFKA-7738 
> (cc [~mumrah]). It gets in a loop, repeating the following block of log 
> messages:
> {noformat}
> 2019-06-03 23:24:15 DEBUG NetworkClient:1073 - [Consumer 
> clientId=test.mikemintz.lag-tracker-reproduce, 
> groupId=test.mikemintz.lag-tracker-reproduce] Sending metadata request 
> (type=MetadataRequest, topics=__consumer_offsets) to node REDACTED:9094 (id: 
> 2134 rack: us-west-2b)
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5862 
> to 5862 for partition __consumer_offsets-0
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6040 
> to 6040 for partition __consumer_offsets-10
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6008 
> to 6008 for partition __consumer_offsets-20
> 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 6153 
> with new epoch 6152
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5652 
> to 5652 for partition __consumer_offsets-30
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6081 
> to 6081 for partition __consumer_offsets-39
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5629 
> to 5629 for partition __consumer_offsets-9
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5983 
> to 5983 for partition __consumer_offsets-11
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5896 
> to 5896 for partition __consumer_offsets-31
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5278 
> to 5278 for partition __consumer_offsets-13
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6026 
> to 6026 for partition __consumer_offsets-18
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5608 
> to 5608 for partition __consumer_offsets-22
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6025 
> to 6025 for partition __consumer_offsets-32
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5685 
> to 5685 for partition __consumer_offsets-8
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5730 
> to 5730 for partition __consumer_offsets-43
> 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5957 
> with new epoch 5956
> 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 6047 
> with new epoch 6046
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6090 
> to 6090 for partition __consumer_offsets-1
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5821 
> to 5821 for partition __consumer_offsets-6
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5909 
> to 5909 for partition __consumer_offsets-41
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5805 
> to 5805 for partition __consumer_offsets-27
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5670 
> to 5670 for partition __consumer_offsets-48
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6220 
> to 6220 for partition __consumer_offsets-5
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5596 
> to 5596 for partition __consumer_offsets-15
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5896 
> to 5896 for partition __consumer_offsets-35
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5858 
> to 5858 for partition __consumer_offsets-25
> 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5927 
> with new epoch 5926
> 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5938 
> with new epoch 5937
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5792 
> to 5792 for partition __consumer_offsets-36
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5984 
> to 5984 for partition __consumer_offsets-44
> 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5932 
> with new epoch 5931
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5979 
> to 5979 for partition __consumer_offsets-16
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5804 
> to 5804 for partition __consumer_offsets-17
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6179 
> to 6179 for partition __consumer_offsets-3
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5953 
> to 5953 for partition __consumer_offsets-45
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6276 
> to 6276 for partition __consumer_offsets-24
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6119 
> to 6119 for partition __consumer_offsets-38
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5784 
> to 5784 for partition __consumer_offsets-33
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5665 
> to 5665 for partition __consumer_offsets-23
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5812 
> to 5812 for partition __consumer_offsets-28
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6014 
> to 6014 for partition __consumer_offsets-2
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6199 
> to 6199 for partition __consumer_offsets-12
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5908 
> to 5908 for partition __consumer_offsets-19
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5697 
> to 5697 for partition __consumer_offsets-14
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5673 
> to 5673 for partition __consumer_offsets-4
> 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5921 
> with new epoch 5920
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5940 
> to 5940 for partition __consumer_offsets-49
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5836 
> to 5836 for partition __consumer_offsets-42
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5827 
> to 5827 for partition __consumer_offsets-7
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5670 
> to 5670 for partition __consumer_offsets-21
> 2019-06-03 23:24:15 DEBUG Metadata:370 - Updated cluster metadata version 293 
> to MetadataCache{cluster=Cluster(id = 5Cs2i3WbTzKHbrF2GzLwaQ, nodes = 
> [REDACTED:9094 (id: 2132 rack: us-west-2a), REDACTED:9094 (id: 2133 rack: 
> us-west-2a), REDACTED:9094 (id: 2131 rack: us-west-2c), REDACTED:9094 (id: 
> 2136 rack: us-west-2b), REDACTED:9094 (id: 2135 rack: us-west-2c), 
> REDACTED:9094 (id: 2134 rack: us-west-2b), REDACTED:9094 (id: 2137 rack: 
> us-west-2c)], partitions = [Partition(topic = __consumer_offsets, partition = 
> 13, leader = 2136, replicas = [2136,2131,2132], isr = [2132,2136,2131], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 46, 
> leader = 2134, replicas = [2134,2133,2135], isr = [2135,2134,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 9, 
> leader = 2132, replicas = [2132,2134,2131], isr = [2132,2134,2131], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 42, 
> leader = 2133, replicas = [2133,2136,2135], isr = [2135,2136,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 21, 
> leader = 2136, replicas = [2136,2131,2133], isr = [2131,2136,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 17, 
> leader = 2133, replicas = [2133,2136,2131], isr = [2131,2136,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 30, 
> leader = 2136, replicas = [2136,2133,2131], isr = [2136,2133,2131], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 26, 
> leader = 2132, replicas = [2132,2134,2135], isr = [2135,2132,2134], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 5, 
> leader = 2136, replicas = [2136,2131,2132], isr = [2132,2131,2136], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 38, 
> leader = 2134, replicas = [2134,2133,2135], isr = [2135,2134,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 1, 
> leader = 2132, replicas = [2132,2134,2131], isr = [2132,2134,2131], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 34, 
> leader = 2132, replicas = [2132,2136,2135], isr = [2135,2132,2136], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 16, 
> leader = 2134, replicas = [2134,2132,2135], isr = [2132,2134,2135], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 45, 
> leader = 2133, replicas = [2133,2136,2135], isr = [2135,2136,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 12, 
> leader = 2133, replicas = [2133,2134,2131], isr = [2134,2131,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 41, 
> leader = 2136, replicas = [2136,2133,2135], isr = [2135,2136,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 24, 
> leader = 2136, replicas = [2136,2133,2131], isr = [2131,2136,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 20, 
> leader = 2132, replicas = [2132,2136,2135], isr = [2132,2136,2135], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 49, 
> leader = 2136, replicas = [2136,2132,2135], isr = [2135,2132,2136], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 0, 
> leader = 2134, replicas = [2134,2132,2135], isr = [2132,2134,2135], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 29, 
> leader = 2134, replicas = [2134,2132,2135], isr = [2135,2132,2134], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 25, 
> leader = 2134, replicas = [2134,2135,2133], isr = [2134,2133,2135], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 8, 
> leader = 2133, replicas = [2133,2136,2131], isr = [2131,2136,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 37, 
> leader = 2133, replicas = [2133,2136,2135], isr = [2135,2136,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 4, 
> leader = 2133, replicas = [2133,2134,2131], isr = [2134,2131,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 33, 
> leader = 2136, replicas = [2136,2133,2135], isr = [2135,2136,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 15, 
> leader = 2132, replicas = [2132,2136,2135], isr = [2132,2136,2135], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 48, 
> leader = 2132, replicas = [2132,2134,2135], isr = [2135,2132,2134], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 11, 
> leader = 2132, replicas = [2132,2135,2136], isr = [2132,2136,2135], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 44, 
> leader = 2133, replicas = [2133,2134,2135], isr = [2135,2134,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 23, 
> leader = 2132, replicas = [2132,2136,2135], isr = [2132,2136,2135], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 19, 
> leader = 2136, replicas = [2136,2133,2131], isr = [2131,2136,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 32, 
> leader = 2136, replicas = [2136,2131,2133], isr = [2131,2136,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 28, 
> leader = 2136, replicas = [2136,2133,2135], isr = [2135,2136,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 7, 
> leader = 2136, replicas = [2136,2132,2131], isr = [2132,2136,2131], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 40, 
> leader = 2134, replicas = [2134,2132,2135], isr = [2135,2132,2134], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 3, 
> leader = 2136, replicas = [2136,2133,2135], isr = [2136,2133,2135], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 36, 
> leader = 2134, replicas = [2134,2133,2135], isr = [2135,2134,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 47, 
> leader = 2132, replicas = [2132,2134,2135], isr = [2135,2132,2134], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 14, 
> leader = 2134, replicas = [2134,2135,2132], isr = [2132,2134,2135], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 43, 
> leader = 2134, replicas = [2134,2132,2135], isr = [2135,2132,2134], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 10, 
> leader = 2132, replicas = [2132,2136,2135], isr = [2132,2136,2135], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 22, 
> leader = 2134, replicas = [2134,2135,2133], isr = [2134,2133,2135], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 18, 
> leader = 2136, replicas = [2136,2131,2132], isr = [2132,2131,2136], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 31, 
> leader = 2133, replicas = [2133,2134,2135], isr = [2135,2134,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 27, 
> leader = 2134, replicas = [2134,2133,2131], isr = [2134,2131,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 39, 
> leader = 2134, replicas = [2134,2132,2135], isr = [2135,2132,2134], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 6, 
> leader = 2133, replicas = [2133,2134,2131], isr = [2134,2131,2133], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 35, 
> leader = 2134, replicas = [2134,2132,2135], isr = [2135,2132,2134], 
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 2, 
> leader = 2134, replicas = [2134,2135,2132], isr = [2132,2134,2135], 
> offlineReplicas = [])], controller = REDACTED:9094 (id: 2132 rack: 
> us-west-2a))}
> 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer 
> clientId=test.mikemintz.lag-tracker-reproduce, 
> groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition 
> __consumer_offsets-46 is unknown for fetching offset
> 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer 
> clientId=test.mikemintz.lag-tracker-reproduce, 
> groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition 
> __consumer_offsets-26 is unknown for fetching offset
> 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer 
> clientId=test.mikemintz.lag-tracker-reproduce, 
> groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition 
> __consumer_offsets-34 is unknown for fetching offset
> 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer 
> clientId=test.mikemintz.lag-tracker-reproduce, 
> groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition 
> __consumer_offsets-29 is unknown for fetching offset
> 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer 
> clientId=test.mikemintz.lag-tracker-reproduce, 
> groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition 
> __consumer_offsets-37 is unknown for fetching offset
> 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer 
> clientId=test.mikemintz.lag-tracker-reproduce, 
> groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition 
> __consumer_offsets-40 is unknown for fetching offset
> 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer 
> clientId=test.mikemintz.lag-tracker-reproduce, 
> groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition 
> __consumer_offsets-47 is unknown for fetching offset
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to