[
https://issues.apache.org/jira/browse/KAFKA-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011553#comment-17011553
]
Ismael Juma commented on KAFKA-8477:
------------------------------------
Does this still happen with 2.4.0? There were some fixes in related areas.
> Cannot consume and request metadata for __consumer_offsets topic in Kafka 2.2
> -----------------------------------------------------------------------------
>
> Key: KAFKA-8477
> URL: https://issues.apache.org/jira/browse/KAFKA-8477
> Project: Kafka
> Issue Type: Bug
> Components: clients
> Affects Versions: 2.2.0
> Reporter: Mike Mintz
> Assignee: David Arthur
> Priority: Major
> Attachments: kafka-2.2.0-consumer-offset-metadata-bug-master.zip,
> logs.txt
>
>
> We have an application that consumes from the __consumer_offsets topic to
> report lag metrics. When we upgraded its kafka-clients dependency from 2.0.1
> to 2.2.0, it crashed with:
> {noformat}
> Exception in thread "main" org.apache.kafka.common.errors.TimeoutException:
> Failed to get offsets by times in 30001ms
> {noformat}
> I created a minimal reproduction at
> [https://github.com/mikemintz/kafka-2.2.0-consumer-offset-metadata-bug] and
> I'm uploading a zip of this code for posterity.
> In particular, the behavior happens when I call KafkaConsumer.assign(), then
> poll(), then endOffsets(). This behavior only happens for the
> __consumer_offsets topic. It also only happens on the Kafka cluster that we
> run in production, which runs Kafka 2.2.0. The error does not occur on a
> freshly created Kafka cluster, and I can't get it to reproduce with
> EmbeddedKafka.
> It works fine with both Kafka 2.0.1 and Kafka 2.1.1, so something broke
> between 2.1.1. and 2.2.0. Based on the 2.2.0 changelog and the client log
> messages (attached), it looks like it may have been introduced in KAFKA-7738
> (cc [~mumrah]). It gets in a loop, repeating the following block of log
> messages:
> {noformat}
> 2019-06-03 23:24:15 DEBUG NetworkClient:1073 - [Consumer
> clientId=test.mikemintz.lag-tracker-reproduce,
> groupId=test.mikemintz.lag-tracker-reproduce] Sending metadata request
> (type=MetadataRequest, topics=__consumer_offsets) to node REDACTED:9094 (id:
> 2134 rack: us-west-2b)
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5862
> to 5862 for partition __consumer_offsets-0
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6040
> to 6040 for partition __consumer_offsets-10
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6008
> to 6008 for partition __consumer_offsets-20
> 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 6153
> with new epoch 6152
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5652
> to 5652 for partition __consumer_offsets-30
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6081
> to 6081 for partition __consumer_offsets-39
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5629
> to 5629 for partition __consumer_offsets-9
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5983
> to 5983 for partition __consumer_offsets-11
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5896
> to 5896 for partition __consumer_offsets-31
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5278
> to 5278 for partition __consumer_offsets-13
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6026
> to 6026 for partition __consumer_offsets-18
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5608
> to 5608 for partition __consumer_offsets-22
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6025
> to 6025 for partition __consumer_offsets-32
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5685
> to 5685 for partition __consumer_offsets-8
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5730
> to 5730 for partition __consumer_offsets-43
> 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5957
> with new epoch 5956
> 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 6047
> with new epoch 6046
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6090
> to 6090 for partition __consumer_offsets-1
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5821
> to 5821 for partition __consumer_offsets-6
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5909
> to 5909 for partition __consumer_offsets-41
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5805
> to 5805 for partition __consumer_offsets-27
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5670
> to 5670 for partition __consumer_offsets-48
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6220
> to 6220 for partition __consumer_offsets-5
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5596
> to 5596 for partition __consumer_offsets-15
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5896
> to 5896 for partition __consumer_offsets-35
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5858
> to 5858 for partition __consumer_offsets-25
> 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5927
> with new epoch 5926
> 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5938
> with new epoch 5937
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5792
> to 5792 for partition __consumer_offsets-36
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5984
> to 5984 for partition __consumer_offsets-44
> 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5932
> with new epoch 5931
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5979
> to 5979 for partition __consumer_offsets-16
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5804
> to 5804 for partition __consumer_offsets-17
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6179
> to 6179 for partition __consumer_offsets-3
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5953
> to 5953 for partition __consumer_offsets-45
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6276
> to 6276 for partition __consumer_offsets-24
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6119
> to 6119 for partition __consumer_offsets-38
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5784
> to 5784 for partition __consumer_offsets-33
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5665
> to 5665 for partition __consumer_offsets-23
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5812
> to 5812 for partition __consumer_offsets-28
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6014
> to 6014 for partition __consumer_offsets-2
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6199
> to 6199 for partition __consumer_offsets-12
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5908
> to 5908 for partition __consumer_offsets-19
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5697
> to 5697 for partition __consumer_offsets-14
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5673
> to 5673 for partition __consumer_offsets-4
> 2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5921
> with new epoch 5920
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5940
> to 5940 for partition __consumer_offsets-49
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5836
> to 5836 for partition __consumer_offsets-42
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5827
> to 5827 for partition __consumer_offsets-7
> 2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5670
> to 5670 for partition __consumer_offsets-21
> 2019-06-03 23:24:15 DEBUG Metadata:370 - Updated cluster metadata version 293
> to MetadataCache{cluster=Cluster(id = 5Cs2i3WbTzKHbrF2GzLwaQ, nodes =
> [REDACTED:9094 (id: 2132 rack: us-west-2a), REDACTED:9094 (id: 2133 rack:
> us-west-2a), REDACTED:9094 (id: 2131 rack: us-west-2c), REDACTED:9094 (id:
> 2136 rack: us-west-2b), REDACTED:9094 (id: 2135 rack: us-west-2c),
> REDACTED:9094 (id: 2134 rack: us-west-2b), REDACTED:9094 (id: 2137 rack:
> us-west-2c)], partitions = [Partition(topic = __consumer_offsets, partition =
> 13, leader = 2136, replicas = [2136,2131,2132], isr = [2132,2136,2131],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 46,
> leader = 2134, replicas = [2134,2133,2135], isr = [2135,2134,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 9,
> leader = 2132, replicas = [2132,2134,2131], isr = [2132,2134,2131],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 42,
> leader = 2133, replicas = [2133,2136,2135], isr = [2135,2136,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 21,
> leader = 2136, replicas = [2136,2131,2133], isr = [2131,2136,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 17,
> leader = 2133, replicas = [2133,2136,2131], isr = [2131,2136,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 30,
> leader = 2136, replicas = [2136,2133,2131], isr = [2136,2133,2131],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 26,
> leader = 2132, replicas = [2132,2134,2135], isr = [2135,2132,2134],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 5,
> leader = 2136, replicas = [2136,2131,2132], isr = [2132,2131,2136],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 38,
> leader = 2134, replicas = [2134,2133,2135], isr = [2135,2134,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 1,
> leader = 2132, replicas = [2132,2134,2131], isr = [2132,2134,2131],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 34,
> leader = 2132, replicas = [2132,2136,2135], isr = [2135,2132,2136],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 16,
> leader = 2134, replicas = [2134,2132,2135], isr = [2132,2134,2135],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 45,
> leader = 2133, replicas = [2133,2136,2135], isr = [2135,2136,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 12,
> leader = 2133, replicas = [2133,2134,2131], isr = [2134,2131,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 41,
> leader = 2136, replicas = [2136,2133,2135], isr = [2135,2136,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 24,
> leader = 2136, replicas = [2136,2133,2131], isr = [2131,2136,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 20,
> leader = 2132, replicas = [2132,2136,2135], isr = [2132,2136,2135],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 49,
> leader = 2136, replicas = [2136,2132,2135], isr = [2135,2132,2136],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 0,
> leader = 2134, replicas = [2134,2132,2135], isr = [2132,2134,2135],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 29,
> leader = 2134, replicas = [2134,2132,2135], isr = [2135,2132,2134],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 25,
> leader = 2134, replicas = [2134,2135,2133], isr = [2134,2133,2135],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 8,
> leader = 2133, replicas = [2133,2136,2131], isr = [2131,2136,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 37,
> leader = 2133, replicas = [2133,2136,2135], isr = [2135,2136,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 4,
> leader = 2133, replicas = [2133,2134,2131], isr = [2134,2131,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 33,
> leader = 2136, replicas = [2136,2133,2135], isr = [2135,2136,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 15,
> leader = 2132, replicas = [2132,2136,2135], isr = [2132,2136,2135],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 48,
> leader = 2132, replicas = [2132,2134,2135], isr = [2135,2132,2134],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 11,
> leader = 2132, replicas = [2132,2135,2136], isr = [2132,2136,2135],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 44,
> leader = 2133, replicas = [2133,2134,2135], isr = [2135,2134,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 23,
> leader = 2132, replicas = [2132,2136,2135], isr = [2132,2136,2135],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 19,
> leader = 2136, replicas = [2136,2133,2131], isr = [2131,2136,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 32,
> leader = 2136, replicas = [2136,2131,2133], isr = [2131,2136,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 28,
> leader = 2136, replicas = [2136,2133,2135], isr = [2135,2136,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 7,
> leader = 2136, replicas = [2136,2132,2131], isr = [2132,2136,2131],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 40,
> leader = 2134, replicas = [2134,2132,2135], isr = [2135,2132,2134],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 3,
> leader = 2136, replicas = [2136,2133,2135], isr = [2136,2133,2135],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 36,
> leader = 2134, replicas = [2134,2133,2135], isr = [2135,2134,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 47,
> leader = 2132, replicas = [2132,2134,2135], isr = [2135,2132,2134],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 14,
> leader = 2134, replicas = [2134,2135,2132], isr = [2132,2134,2135],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 43,
> leader = 2134, replicas = [2134,2132,2135], isr = [2135,2132,2134],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 10,
> leader = 2132, replicas = [2132,2136,2135], isr = [2132,2136,2135],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 22,
> leader = 2134, replicas = [2134,2135,2133], isr = [2134,2133,2135],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 18,
> leader = 2136, replicas = [2136,2131,2132], isr = [2132,2131,2136],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 31,
> leader = 2133, replicas = [2133,2134,2135], isr = [2135,2134,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 27,
> leader = 2134, replicas = [2134,2133,2131], isr = [2134,2131,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 39,
> leader = 2134, replicas = [2134,2132,2135], isr = [2135,2132,2134],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 6,
> leader = 2133, replicas = [2133,2134,2131], isr = [2134,2131,2133],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 35,
> leader = 2134, replicas = [2134,2132,2135], isr = [2135,2132,2134],
> offlineReplicas = []), Partition(topic = __consumer_offsets, partition = 2,
> leader = 2134, replicas = [2134,2135,2132], isr = [2132,2134,2135],
> offlineReplicas = [])], controller = REDACTED:9094 (id: 2132 rack:
> us-west-2a))}
> 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer
> clientId=test.mikemintz.lag-tracker-reproduce,
> groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition
> __consumer_offsets-46 is unknown for fetching offset
> 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer
> clientId=test.mikemintz.lag-tracker-reproduce,
> groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition
> __consumer_offsets-26 is unknown for fetching offset
> 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer
> clientId=test.mikemintz.lag-tracker-reproduce,
> groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition
> __consumer_offsets-34 is unknown for fetching offset
> 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer
> clientId=test.mikemintz.lag-tracker-reproduce,
> groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition
> __consumer_offsets-29 is unknown for fetching offset
> 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer
> clientId=test.mikemintz.lag-tracker-reproduce,
> groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition
> __consumer_offsets-37 is unknown for fetching offset
> 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer
> clientId=test.mikemintz.lag-tracker-reproduce,
> groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition
> __consumer_offsets-40 is unknown for fetching offset
> 2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer
> clientId=test.mikemintz.lag-tracker-reproduce,
> groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition
> __consumer_offsets-47 is unknown for fetching offset
> {noformat}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)