[ https://issues.apache.org/jira/browse/KAFKA-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Arthur reassigned KAFKA-8477: ----------------------------------- Assignee: David Arthur > Cannot consume and request metadata for __consumer_offsets topic in Kafka 2.2 > ----------------------------------------------------------------------------- > > Key: KAFKA-8477 > URL: https://issues.apache.org/jira/browse/KAFKA-8477 > Project: Kafka > Issue Type: Bug > Components: clients > Affects Versions: 2.2.0 > Reporter: Mike Mintz > Assignee: David Arthur > Priority: Major > Attachments: kafka-2.2.0-consumer-offset-metadata-bug-master.zip, > logs.txt > > > We have an application that consumes from the __consumer_offsets topic to > report lag metrics. When we upgraded its kafka-clients dependency from 2.0.1 > to 2.2.0, it crashed with: > {noformat} > Exception in thread "main" org.apache.kafka.common.errors.TimeoutException: > Failed to get offsets by times in 30001ms > {noformat} > I created a minimal reproduction at > [https://github.com/mikemintz/kafka-2.2.0-consumer-offset-metadata-bug] and > I'm uploading a zip of this code for posterity. > In particular, the behavior happens when I call KafkaConsumer.assign(), then > poll(), then endOffsets(). This behavior only happens for the > __consumer_offsets topic. It also only happens on the Kafka cluster that we > run in production, which runs Kafka 2.2.0. The error does not occur on a > freshly created Kafka cluster, and I can't get it to reproduce with > EmbeddedKafka. > It works fine with both Kafka 2.0.1 and Kafka 2.1.1, so something broke > between 2.1.1. and 2.2.0. Based on the 2.2.0 changelog and the client log > messages (attached), it looks like it may have been introduced in KAFKA-7738 > (cc [~mumrah]). It gets in a loop, repeating the following block of log > messages: > {noformat} > 2019-06-03 23:24:15 DEBUG NetworkClient:1073 - [Consumer > clientId=test.mikemintz.lag-tracker-reproduce, > groupId=test.mikemintz.lag-tracker-reproduce] Sending metadata request > (type=MetadataRequest, topics=__consumer_offsets) to node REDACTED:9094 (id: > 2134 rack: us-west-2b) > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5862 > to 5862 for partition __consumer_offsets-0 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6040 > to 6040 for partition __consumer_offsets-10 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6008 > to 6008 for partition __consumer_offsets-20 > 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 6153 > with new epoch 6152 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5652 > to 5652 for partition __consumer_offsets-30 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6081 > to 6081 for partition __consumer_offsets-39 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5629 > to 5629 for partition __consumer_offsets-9 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5983 > to 5983 for partition __consumer_offsets-11 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5896 > to 5896 for partition __consumer_offsets-31 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5278 > to 5278 for partition __consumer_offsets-13 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6026 > to 6026 for partition __consumer_offsets-18 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5608 > to 5608 for partition __consumer_offsets-22 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6025 > to 6025 for partition __consumer_offsets-32 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5685 > to 5685 for partition __consumer_offsets-8 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5730 > to 5730 for partition __consumer_offsets-43 > 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5957 > with new epoch 5956 > 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 6047 > with new epoch 6046 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6090 > to 6090 for partition __consumer_offsets-1 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5821 > to 5821 for partition __consumer_offsets-6 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5909 > to 5909 for partition __consumer_offsets-41 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5805 > to 5805 for partition __consumer_offsets-27 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5670 > to 5670 for partition __consumer_offsets-48 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6220 > to 6220 for partition __consumer_offsets-5 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5596 > to 5596 for partition __consumer_offsets-15 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5896 > to 5896 for partition __consumer_offsets-35 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5858 > to 5858 for partition __consumer_offsets-25 > 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5927 > with new epoch 5926 > 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5938 > with new epoch 5937 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5792 > to 5792 for partition __consumer_offsets-36 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5984 > to 5984 for partition __consumer_offsets-44 > 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5932 > with new epoch 5931 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5979 > to 5979 for partition __consumer_offsets-16 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5804 > to 5804 for partition __consumer_offsets-17 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6179 > to 6179 for partition __consumer_offsets-3 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5953 > to 5953 for partition __consumer_offsets-45 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6276 > to 6276 for partition __consumer_offsets-24 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6119 > to 6119 for partition __consumer_offsets-38 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5784 > to 5784 for partition __consumer_offsets-33 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5665 > to 5665 for partition __consumer_offsets-23 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5812 > to 5812 for partition __consumer_offsets-28 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6014 > to 6014 for partition __consumer_offsets-2 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6199 > to 6199 for partition __consumer_offsets-12 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5908 > to 5908 for partition __consumer_offsets-19 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5697 > to 5697 for partition __consumer_offsets-14 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5673 > to 5673 for partition __consumer_offsets-4 > 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5921 > with new epoch 5920 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5940 > to 5940 for partition __consumer_offsets-49 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5836 > to 5836 for partition __consumer_offsets-42 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5827 > to 5827 for partition __consumer_offsets-7 > 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5670 > to 5670 for partition __consumer_offsets-21 > 2019-06-03 23:24:15 DEBUG Metadata:370 - Updated cluster metadata version 293 > to MetadataCache{cluster=Cluster(id = 5Cs2i3WbTzKHbrF2GzLwaQ, nodes = > [REDACTED:9094 (id: 2132 rack: us-west-2a), REDACTED:9094 (id: 2133 rack: > us-west-2a), REDACTED:9094 (id: 2131 rack: us-west-2c), REDACTED:9094 (id: > 2136 rack: us-west-2b), REDACTED:9094 (id: 2135 rack: us-west-2c), > REDACTED:9094 (id: 2134 rack: us-west-2b), REDACTED:9094 (id: 2137 rack: > us-west-2c)], partitions = [Partition(topic = __consumer_offsets, partition = > 13, leader = 2136, replicas = [2136,2131,2132], isr = [2132,2136,2131], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 46, > leader = 2134, replicas = [2134,2133,2135], isr = [2135,2134,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 9, > leader = 2132, replicas = [2132,2134,2131], isr = [2132,2134,2131], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 42, > leader = 2133, replicas = [2133,2136,2135], isr = [2135,2136,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 21, > leader = 2136, replicas = [2136,2131,2133], isr = [2131,2136,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 17, > leader = 2133, replicas = [2133,2136,2131], isr = [2131,2136,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 30, > leader = 2136, replicas = [2136,2133,2131], isr = [2136,2133,2131], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 26, > leader = 2132, replicas = [2132,2134,2135], isr = [2135,2132,2134], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 5, > leader = 2136, replicas = [2136,2131,2132], isr = [2132,2131,2136], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 38, > leader = 2134, replicas = [2134,2133,2135], isr = [2135,2134,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 1, > leader = 2132, replicas = [2132,2134,2131], isr = [2132,2134,2131], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 34, > leader = 2132, replicas = [2132,2136,2135], isr = [2135,2132,2136], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 16, > leader = 2134, replicas = [2134,2132,2135], isr = [2132,2134,2135], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 45, > leader = 2133, replicas = [2133,2136,2135], isr = [2135,2136,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 12, > leader = 2133, replicas = [2133,2134,2131], isr = [2134,2131,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 41, > leader = 2136, replicas = [2136,2133,2135], isr = [2135,2136,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 24, > leader = 2136, replicas = [2136,2133,2131], isr = [2131,2136,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 20, > leader = 2132, replicas = [2132,2136,2135], isr = [2132,2136,2135], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 49, > leader = 2136, replicas = [2136,2132,2135], isr = [2135,2132,2136], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 0, > leader = 2134, replicas = [2134,2132,2135], isr = [2132,2134,2135], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 29, > leader = 2134, replicas = [2134,2132,2135], isr = [2135,2132,2134], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 25, > leader = 2134, replicas = [2134,2135,2133], isr = [2134,2133,2135], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 8, > leader = 2133, replicas = [2133,2136,2131], isr = [2131,2136,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 37, > leader = 2133, replicas = [2133,2136,2135], isr = [2135,2136,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 4, > leader = 2133, replicas = [2133,2134,2131], isr = [2134,2131,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 33, > leader = 2136, replicas = [2136,2133,2135], isr = [2135,2136,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 15, > leader = 2132, replicas = [2132,2136,2135], isr = [2132,2136,2135], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 48, > leader = 2132, replicas = [2132,2134,2135], isr = [2135,2132,2134], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 11, > leader = 2132, replicas = [2132,2135,2136], isr = [2132,2136,2135], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 44, > leader = 2133, replicas = [2133,2134,2135], isr = [2135,2134,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 23, > leader = 2132, replicas = [2132,2136,2135], isr = [2132,2136,2135], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 19, > leader = 2136, replicas = [2136,2133,2131], isr = [2131,2136,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 32, > leader = 2136, replicas = [2136,2131,2133], isr = [2131,2136,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 28, > leader = 2136, replicas = [2136,2133,2135], isr = [2135,2136,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 7, > leader = 2136, replicas = [2136,2132,2131], isr = [2132,2136,2131], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 40, > leader = 2134, replicas = [2134,2132,2135], isr = [2135,2132,2134], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 3, > leader = 2136, replicas = [2136,2133,2135], isr = [2136,2133,2135], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 36, > leader = 2134, replicas = [2134,2133,2135], isr = [2135,2134,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 47, > leader = 2132, replicas = [2132,2134,2135], isr = [2135,2132,2134], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 14, > leader = 2134, replicas = [2134,2135,2132], isr = [2132,2134,2135], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 43, > leader = 2134, replicas = [2134,2132,2135], isr = [2135,2132,2134], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 10, > leader = 2132, replicas = [2132,2136,2135], isr = [2132,2136,2135], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 22, > leader = 2134, replicas = [2134,2135,2133], isr = [2134,2133,2135], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 18, > leader = 2136, replicas = [2136,2131,2132], isr = [2132,2131,2136], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 31, > leader = 2133, replicas = [2133,2134,2135], isr = [2135,2134,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 27, > leader = 2134, replicas = [2134,2133,2131], isr = [2134,2131,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 39, > leader = 2134, replicas = [2134,2132,2135], isr = [2135,2132,2134], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 6, > leader = 2133, replicas = [2133,2134,2131], isr = [2134,2131,2133], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 35, > leader = 2134, replicas = [2134,2132,2135], isr = [2135,2132,2134], > offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 2, > leader = 2134, replicas = [2134,2135,2132], isr = [2132,2134,2135], > offlineReplicas = [])], controller = REDACTED:9094 (id: 2132 rack: > us-west-2a))} > 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer > clientId=test.mikemintz.lag-tracker-reproduce, > groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition > __consumer_offsets-46 is unknown for fetching offset > 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer > clientId=test.mikemintz.lag-tracker-reproduce, > groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition > __consumer_offsets-26 is unknown for fetching offset > 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer > clientId=test.mikemintz.lag-tracker-reproduce, > groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition > __consumer_offsets-34 is unknown for fetching offset > 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer > clientId=test.mikemintz.lag-tracker-reproduce, > groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition > __consumer_offsets-29 is unknown for fetching offset > 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer > clientId=test.mikemintz.lag-tracker-reproduce, > groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition > __consumer_offsets-37 is unknown for fetching offset > 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer > clientId=test.mikemintz.lag-tracker-reproduce, > groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition > __consumer_offsets-40 is unknown for fetching offset > 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer > clientId=test.mikemintz.lag-tracker-reproduce, > groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition > __consumer_offsets-47 is unknown for fetching offset > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)