Mike Mintz created KAFKA-8477:
---------------------------------
Summary: Cannot consume and request metadata for
__consumer_offsets topic in Kafka 2.2
Key: KAFKA-8477
URL: https://issues.apache.org/jira/browse/KAFKA-8477
Project: Kafka
Issue Type: Bug
Components: clients
Affects Versions: 2.2.0
Reporter: Mike Mintz
Attachments: kafka-2.2.0-consumer-offset-metadata-bug-master.zip,
logs.txt
We have an application that consumes from the __consumer_offsets topic to
report lag metrics. When we upgraded its kafka-clients dependency from 2.0.1 to
2.2.0, it crashed with:
{noformat}
Exception in thread "main" org.apache.kafka.common.errors.TimeoutException:
Failed to get offsets by times in 30001ms
{noformat}
I created a minimal reproduction at
[https://github.com/mikemintz/kafka-2.2.0-consumer-offset-metadata-bug] and I'm
uploading a zip of this code for posterity.
In particular, the behavior happens when I call KafkaConsumer.assign(), then
poll(), then endOffsets(). This behavior only happens for the
__consumer_offsets topic. It also only happens on the Kafka cluster that we run
in production, which runs Kafka 2.2.0. The error does not occur on a freshly
created Kafka cluster, and I can't get it to reproduce with EmbeddedKafka.
It works fine with both Kafka 2.0.1 and Kafka 2.1.1, so something broke between
2.1.1. and 2.2.0. Based on the 2.2.0 changelog and the client log messages
(attached), it looks like it may have been introduced in KAFKA-7738 (cc
[~mumrah]). It gets in a loop, repeating the following block of log messages:
{noformat}
2019-06-03 23:24:15 DEBUG NetworkClient:1073 - [Consumer
clientId=test.mikemintz.lag-tracker-reproduce,
groupId=test.mikemintz.lag-tracker-reproduce] Sending metadata request
(type=MetadataRequest, topics=__consumer_offsets) to node REDACTED:9094 (id:
2134 rack: us-west-2b)
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5862 to
5862 for partition __consumer_offsets-0
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6040 to
6040 for partition __consumer_offsets-10
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6008 to
6008 for partition __consumer_offsets-20
2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 6153 with
new epoch 6152
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5652 to
5652 for partition __consumer_offsets-30
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6081 to
6081 for partition __consumer_offsets-39
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5629 to
5629 for partition __consumer_offsets-9
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5983 to
5983 for partition __consumer_offsets-11
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5896 to
5896 for partition __consumer_offsets-31
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5278 to
5278 for partition __consumer_offsets-13
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6026 to
6026 for partition __consumer_offsets-18
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5608 to
5608 for partition __consumer_offsets-22
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6025 to
6025 for partition __consumer_offsets-32
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5685 to
5685 for partition __consumer_offsets-8
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5730 to
5730 for partition __consumer_offsets-43
2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5957 with
new epoch 5956
2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 6047 with
new epoch 6046
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6090 to
6090 for partition __consumer_offsets-1
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5821 to
5821 for partition __consumer_offsets-6
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5909 to
5909 for partition __consumer_offsets-41
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5805 to
5805 for partition __consumer_offsets-27
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5670 to
5670 for partition __consumer_offsets-48
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6220 to
6220 for partition __consumer_offsets-5
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5596 to
5596 for partition __consumer_offsets-15
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5896 to
5896 for partition __consumer_offsets-35
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5858 to
5858 for partition __consumer_offsets-25
2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5927 with
new epoch 5926
2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5938 with
new epoch 5937
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5792 to
5792 for partition __consumer_offsets-36
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5984 to
5984 for partition __consumer_offsets-44
2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5932 with
new epoch 5931
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5979 to
5979 for partition __consumer_offsets-16
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5804 to
5804 for partition __consumer_offsets-17
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6179 to
6179 for partition __consumer_offsets-3
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5953 to
5953 for partition __consumer_offsets-45
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6276 to
6276 for partition __consumer_offsets-24
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6119 to
6119 for partition __consumer_offsets-38
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5784 to
5784 for partition __consumer_offsets-33
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5665 to
5665 for partition __consumer_offsets-23
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5812 to
5812 for partition __consumer_offsets-28
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6014 to
6014 for partition __consumer_offsets-2
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6199 to
6199 for partition __consumer_offsets-12
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5908 to
5908 for partition __consumer_offsets-19
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5697 to
5697 for partition __consumer_offsets-14
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5673 to
5673 for partition __consumer_offsets-4
2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5921 with
new epoch 5920
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5940 to
5940 for partition __consumer_offsets-49
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5836 to
5836 for partition __consumer_offsets-42
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5827 to
5827 for partition __consumer_offsets-7
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5670 to
5670 for partition __consumer_offsets-21
2019-06-03 23:24:15 DEBUG Metadata:370 - Updated cluster metadata version 293
to MetadataCache{cluster=Cluster(id = 5Cs2i3WbTzKHbrF2GzLwaQ, nodes =
[REDACTED:9094 (id: 2132 rack: us-west-2a), REDACTED:9094 (id: 2133 rack:
us-west-2a), REDACTED:9094 (id: 2131 rack: us-west-2c), REDACTED:9094 (id: 2136
rack: us-west-2b), REDACTED:9094 (id: 2135 rack: us-west-2c), REDACTED:9094
(id: 2134 rack: us-west-2b), REDACTED:9094 (id: 2137 rack: us-west-2c)],
partitions = [Partition(topic = __consumer_offsets, partition = 13, leader =
2136, replicas = [2136,2131,2132], isr = [2132,2136,2131], offlineReplicas =
[]), Partition(topic = __consumer_offsets, partition = 46, leader = 2134,
replicas = [2134,2133,2135], isr = [2135,2134,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 9, leader = 2132, replicas =
[2132,2134,2131], isr = [2132,2134,2131], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 42, leader = 2133, replicas =
[2133,2136,2135], isr = [2135,2136,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 21, leader = 2136, replicas =
[2136,2131,2133], isr = [2131,2136,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 17, leader = 2133, replicas =
[2133,2136,2131], isr = [2131,2136,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 30, leader = 2136, replicas =
[2136,2133,2131], isr = [2136,2133,2131], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 26, leader = 2132, replicas =
[2132,2134,2135], isr = [2135,2132,2134], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 5, leader = 2136, replicas =
[2136,2131,2132], isr = [2132,2131,2136], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 38, leader = 2134, replicas =
[2134,2133,2135], isr = [2135,2134,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 1, leader = 2132, replicas =
[2132,2134,2131], isr = [2132,2134,2131], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 34, leader = 2132, replicas =
[2132,2136,2135], isr = [2135,2132,2136], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 16, leader = 2134, replicas =
[2134,2132,2135], isr = [2132,2134,2135], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 45, leader = 2133, replicas =
[2133,2136,2135], isr = [2135,2136,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 12, leader = 2133, replicas =
[2133,2134,2131], isr = [2134,2131,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 41, leader = 2136, replicas =
[2136,2133,2135], isr = [2135,2136,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 24, leader = 2136, replicas =
[2136,2133,2131], isr = [2131,2136,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 20, leader = 2132, replicas =
[2132,2136,2135], isr = [2132,2136,2135], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 49, leader = 2136, replicas =
[2136,2132,2135], isr = [2135,2132,2136], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 0, leader = 2134, replicas =
[2134,2132,2135], isr = [2132,2134,2135], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 29, leader = 2134, replicas =
[2134,2132,2135], isr = [2135,2132,2134], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 25, leader = 2134, replicas =
[2134,2135,2133], isr = [2134,2133,2135], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 8, leader = 2133, replicas =
[2133,2136,2131], isr = [2131,2136,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 37, leader = 2133, replicas =
[2133,2136,2135], isr = [2135,2136,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 4, leader = 2133, replicas =
[2133,2134,2131], isr = [2134,2131,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 33, leader = 2136, replicas =
[2136,2133,2135], isr = [2135,2136,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 15, leader = 2132, replicas =
[2132,2136,2135], isr = [2132,2136,2135], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 48, leader = 2132, replicas =
[2132,2134,2135], isr = [2135,2132,2134], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 11, leader = 2132, replicas =
[2132,2135,2136], isr = [2132,2136,2135], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 44, leader = 2133, replicas =
[2133,2134,2135], isr = [2135,2134,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 23, leader = 2132, replicas =
[2132,2136,2135], isr = [2132,2136,2135], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 19, leader = 2136, replicas =
[2136,2133,2131], isr = [2131,2136,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 32, leader = 2136, replicas =
[2136,2131,2133], isr = [2131,2136,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 28, leader = 2136, replicas =
[2136,2133,2135], isr = [2135,2136,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 7, leader = 2136, replicas =
[2136,2132,2131], isr = [2132,2136,2131], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 40, leader = 2134, replicas =
[2134,2132,2135], isr = [2135,2132,2134], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 3, leader = 2136, replicas =
[2136,2133,2135], isr = [2136,2133,2135], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 36, leader = 2134, replicas =
[2134,2133,2135], isr = [2135,2134,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 47, leader = 2132, replicas =
[2132,2134,2135], isr = [2135,2132,2134], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 14, leader = 2134, replicas =
[2134,2135,2132], isr = [2132,2134,2135], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 43, leader = 2134, replicas =
[2134,2132,2135], isr = [2135,2132,2134], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 10, leader = 2132, replicas =
[2132,2136,2135], isr = [2132,2136,2135], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 22, leader = 2134, replicas =
[2134,2135,2133], isr = [2134,2133,2135], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 18, leader = 2136, replicas =
[2136,2131,2132], isr = [2132,2131,2136], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 31, leader = 2133, replicas =
[2133,2134,2135], isr = [2135,2134,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 27, leader = 2134, replicas =
[2134,2133,2131], isr = [2134,2131,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 39, leader = 2134, replicas =
[2134,2132,2135], isr = [2135,2132,2134], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 6, leader = 2133, replicas =
[2133,2134,2131], isr = [2134,2131,2133], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 35, leader = 2134, replicas =
[2134,2132,2135], isr = [2135,2132,2134], offlineReplicas = []),
Partition(topic = __consumer_offsets, partition = 2, leader = 2134, replicas =
[2134,2135,2132], isr = [2132,2134,2135], offlineReplicas = [])], controller =
REDACTED:9094 (id: 2132 rack: us-west-2a))}
2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer
clientId=test.mikemintz.lag-tracker-reproduce,
groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition
__consumer_offsets-46 is unknown for fetching offset
2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer
clientId=test.mikemintz.lag-tracker-reproduce,
groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition
__consumer_offsets-26 is unknown for fetching offset
2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer
clientId=test.mikemintz.lag-tracker-reproduce,
groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition
__consumer_offsets-34 is unknown for fetching offset
2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer
clientId=test.mikemintz.lag-tracker-reproduce,
groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition
__consumer_offsets-29 is unknown for fetching offset
2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer
clientId=test.mikemintz.lag-tracker-reproduce,
groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition
__consumer_offsets-37 is unknown for fetching offset
2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer
clientId=test.mikemintz.lag-tracker-reproduce,
groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition
__consumer_offsets-40 is unknown for fetching offset
2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer
clientId=test.mikemintz.lag-tracker-reproduce,
groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition
__consumer_offsets-47 is unknown for fetching offset
{noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)