[ 
https://issues.apache.org/jira/browse/KAFKA-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587544#comment-16587544
 ] 

John Roesler edited comment on KAFKA-7290 at 8/21/18 2:55 PM:
--------------------------------------------------------------

There's one thread, and two consumers:
{noformat}
2018-08-14 09:08:14 DEBUG KafkaConsumer:759 - Kafka consumer with client id 
firechief-d73ab3d0-2834-4db1-9a4c-f7147ef9d6e4-StreamThread-1-consumer created

2018-08-14 09:08:14 DEBUG KafkaConsumer:759 - Kafka consumer with client id 
firechief-d73ab3d0-2834-4db1-9a4c-f7147ef9d6e4-StreamThread-1-restore-consumer 
created{noformat}
Then, we see this after a while:
{noformat}
2018-08-14 09:08:14 DEBUG NetworkClient:891 - Sending metadata request 
(type=MetadataRequest, topics=<ALL>) to node kafka.broker.internal:9092 (id: -1 
rack: null) 2018-08-14 09:08:14 DEBUG Metadata:252 - Updated cluster metadata 
version 2 to Cluster(id = WQCjF0jUSxKreevGTpU06g, nodes = 
[ip-10-106-12-84.eu-west-1.compute.internal:9092 (id: 3 rack: null), 
ip-10-106-12-51.eu-west-1.compute.internal:9092 (id: 4 rack: null), 
ip-10-106-12-83.eu-west-1.compute.internal:9092 (id: 0 rack: null), 
ip-10-106-12-214.eu-west-1.compute.internal:9092 (id: 1 rack: null), 
ip-10-106-12-219.eu-west-1.compute.internal:9092 (id: 2 rack: null)], 
partitions = [Partition(topic = ussenterprise.firesocket.actions.0, partition = 
3, leader = 1, replicas = [1,3,4], isr = [3,1,4]), Partition(topic = 
ussenterprise.firesocket.actions.0, partition = 2, leader = 0, replicas = 
[0,2,3], isr = [0,3,2]), Partition(topic = ussenterprise.firesocket.actions.0, 
partition = 1, leader = 4, replicas = [4,1,2], isr = [1,4,2]), Partition(topic 
= ussenterprise.firesocket.actions.0, partition = 0, leader = 3, replicas = 
[3,0,1], isr = [0,3,1]), Partition(topic = ussenterprise.firesocket.actions.0, 
partition = 6, leader = 3, replicas = [0,2,3], isr = [3,2,0]), Partition(topic 
= ussenterprise.firesocket.actions.0, partition = 5, leader = 3, replicas = 
[3,1,2], isr = [3,1,2]), Partition(topic = ussenterprise.firesocket.actions.0, 
partition = 4, leader = 2, replicas = [0,2,1], isr = [0,2,1])]) 2018-08-14 
09:08:14 DEBUG AbstractCoordinator:593 - Received GroupCoordinator response 
ClientResponse(receivedTimeMs=1534237694641, latencyMs=47, disconnected=false, 
requestHeader={api_key=10,api_version=1,correlation_id=0,client_id=firechief-d73ab3d0-2834-4db1-9a4c-f7147ef9d6e4-StreamThread-1-consumer},
 responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', 
error=NONE, node=ip-10-106-12-214.eu-west-1.compute.internal:9092 (id: 1 rack: 
null))) for group firechief{noformat}
 

Later on, though, there's this funny looking response:
{noformat}
2018-08-14 09:13:14 DEBUG NetworkClient:891 - Sending metadata request 
(type=MetadataRequest, topics=) to node kafka.broker.internal:9092 (id: -1 
rack: null) 2018-08-14 09:13:14 DEBUG Metadata:252 - Updated cluster metadata 
version 1 to Cluster(id = null, nodes = [kafka.broker.internal:9092 (id: -1 
rack: null)], partitions = []) 2018-08-14 09:13:14 DEBUG Metadata:252 - Updated 
cluster metadata version 2 to Cluster(id = WQCjF0jUSxKreevGTpU06g, nodes = 
[ip-10-106-12-51.eu-west-1.compute.internal:9092 (id: 4 rack: null), 
ip-10-106-12-219.eu-west-1.compute.internal:9092 (id: 2 rack: null), 
ip-10-106-12-214.eu-west-1.compute.internal:9092 (id: 1 rack: null), 
ip-10-106-12-84.eu-west-1.compute.internal:9092 (id: 3 rack: null), 
ip-10-106-12-83.eu-west-1.compute.internal:9092 (id: 0 rack: null)], partitions 
= [])

2018-08-14 09:13:14 DEBUG Metadata:252 - Updated cluster metadata version 1 to 
Cluster(id = null, nodes = [kafka.broker.internal:9092 (id: -1 rack: null)], 
partitions = [])

(that last message repeats many times)

(then, there's some more activity when node 3 disconnects)

2018-08-14 09:17:17 DEBUG NetworkClient:704 - Node 3 disconnected. 2018-08-14 
09:17:17 DEBUG NetworkClient:907 - Initialize connection to node 
ip-10-106-12-51.eu-west-1.compute.internal:9092 (id: 4 rack: null) for sending 
metadata request 2018-08-14 09:17:17 DEBUG NetworkClient:762 - Initiating 
connection to node ip-10-106-12-51.eu-west-1.compute.internal:9092 (id: 4 rack: 
null) 2018-08-14 09:17:17 DEBUG Selector:374 - Created socket with SO_RCVBUF = 
32768, SO_SNDBUF = 131072, SO_TIMEOUT = 0 to node 4 2018-08-14 09:17:17 DEBUG 
NetworkClient:724 - Completed connection to node 4. Fetching API versions. 
2018-08-14 09:17:17 DEBUG NetworkClient:738 - Initiating API versions fetch 
from node 4. 2018-08-14 09:17:17 DEBUG NetworkClient:907 - Initialize 
connection to node ip-10-106-12-219.eu-west-1.compute.internal:9092 (id: 2 
rack: null) for sending metadata request 2018-08-14 09:17:17 DEBUG 
NetworkClient:762 - Initiating connection to node 
ip-10-106-12-219.eu-west-1.compute.internal:9092 (id: 2 rack: null) 2018-08-14 
09:17:17 DEBUG Selector:374 - Created socket with SO_RCVBUF = 32768, SO_SNDBUF 
= 131072, SO_TIMEOUT = 0 to node 2 2018-08-14 09:17:17 DEBUG NetworkClient:724 
- Completed connection to node 2. Fetching API versions. 2018-08-14 09:17:17 
DEBUG NetworkClient:738 - Initiating API versions fetch from node 2. 2018-08-14 
09:17:17 DEBUG NetworkClient:907 - Initialize connection to node 
ip-10-106-12-83.eu-west-1.compute.internal:9092 (id: 0 rack: null) for sending 
metadata request 2018-08-14 09:17:17 DEBUG NetworkClient:762 - Initiating 
connection to node ip-10-106-12-83.eu-west-1.compute.internal:9092 (id: 0 rack: 
null) 2018-08-14 09:17:17 DEBUG NetworkClient:691 - Recorded API versions for 
node 4: (Produce(0): 0 to 5 [usable: 3], Fetch(1): 0 to 6 [usable: 5], 
Offsets(2): 0 to 2 [usable: 2], Metadata(3): 0 to 5 [usable: 4], 
LeaderAndIsr(4): 0 to 1 [usable: 0], StopReplica(5): 0 [usable: 0], 
UpdateMetadata(6): 0 to 4 [usable: 3], ControlledShutdown(7): 0 to 1 [usable: 
1], OffsetCommit(8): 0 to 3 [usable: 3], OffsetFetch(9): 0 to 3 [usable: 3], 
FindCoordinator(10): 0 to 1 [usable: 1], JoinGroup(11): 0 to 2 [usable: 2], 
Heartbeat(12): 0 to 1 [usable: 1], LeaveGroup(13): 0 to 1 [usable: 1], 
SyncGroup(14): 0 to 1 [usable: 1], DescribeGroups(15): 0 to 1 [usable: 1], 
ListGroups(16): 0 to 1 [usable: 1], SaslHandshake(17): 0 to 1 [usable: 0], 
ApiVersions(18): 0 to 1 [usable: 1], CreateTopics(19): 0 to 2 [usable: 2], 
DeleteTopics(20): 0 to 1 [usable: 1], DeleteRecords(21): 0 [usable: 0], 
InitProducerId(22): 0 [usable: 0], OffsetForLeaderEpoch(23): 0 [usable: 0], 
AddPartitionsToTxn(24): 0 [usable: 0], AddOffsetsToTxn(25): 0 [usable: 0], 
EndTxn(26): 0 [usable: 0], WriteTxnMarkers(27): 0 [usable: 0], 
TxnOffsetCommit(28): 0 [usable: 0], DescribeAcls(29): 0 [usable: 0], 
CreateAcls(30): 0 [usable: 0], DeleteAcls(31): 0 [usable: 0], 
DescribeConfigs(32): 0 [usable: 0], AlterConfigs(33): 0 [usable: 0], 
UNKNOWN(34): 0, UNKNOWN(35): 0, UNKNOWN(36): 0, UNKNOWN(37): 0) 2018-08-14 
09:17:17 DEBUG Selector:374 - Created socket with SO_RCVBUF = 32768, SO_SNDBUF 
= 131072, SO_TIMEOUT = 0 to node 0 2018-08-14 09:17:17 DEBUG NetworkClient:691 
- Recorded API versions for node 2: (Produce(0): 0 to 5 [usable: 3], Fetch(1): 
0 to 6 [usable: 5], Offsets(2): 0 to 2 [usable: 2], Metadata(3): 0 to 5 
[usable: 4], LeaderAndIsr(4): 0 to 1 [usable: 0], StopReplica(5): 0 [usable: 
0], UpdateMetadata(6): 0 to 4 [usable: 3], ControlledShutdown(7): 0 to 1 
[usable: 1], OffsetCommit(8): 0 to 3 [usable: 3], OffsetFetch(9): 0 to 3 
[usable: 3], FindCoordinator(10): 0 to 1 [usable: 1], JoinGroup(11): 0 to 2 
[usable: 2], Heartbeat(12): 0 to 1 [usable: 1], LeaveGroup(13): 0 to 1 [usable: 
1], SyncGroup(14): 0 to 1 [usable: 1], DescribeGroups(15): 0 to 1 [usable: 1], 
ListGroups(16): 0 to 1 [usable: 1], SaslHandshake(17): 0 to 1 [usable: 0], 
ApiVersions(18): 0 to 1 [usable: 1], CreateTopics(19): 0 to 2 [usable: 2], 
DeleteTopics(20): 0 to 1 [usable: 1], DeleteRecords(21): 0 [usable: 0], 
InitProducerId(22): 0 [usable: 0], OffsetForLeaderEpoch(23): 0 [usable: 0], 
AddPartitionsToTxn(24): 0 [usable: 0], AddOffsetsToTxn(25): 0 [usable: 0], 
EndTxn(26): 0 [usable: 0], WriteTxnMarkers(27): 0 [usable: 0], 
TxnOffsetCommit(28): 0 [usable: 0], DescribeAcls(29): 0 [usable: 0], 
CreateAcls(30): 0 [usable: 0], DeleteAcls(31): 0 [usable: 0], 
DescribeConfigs(32): 0 [usable: 0], AlterConfigs(33): 0 [usable: 0], 
UNKNOWN(34): 0, UNKNOWN(35): 0, UNKNOWN(36): 0, UNKNOWN(37): 0) 2018-08-14 
09:17:17 DEBUG NetworkClient:724 - Completed connection to node 0. Fetching API 
versions. 2018-08-14 09:17:17 DEBUG NetworkClient:738 - Initiating API versions 
fetch from node 0. 2018-08-14 09:17:17 DEBUG NetworkClient:907 - Initialize 
connection to node ip-10-106-12-214.eu-west-1.compute.internal:9092 (id: 1 
rack: null) for sending metadata request 2018-08-14 09:17:17 DEBUG 
NetworkClient:762 - Initiating connection to node 
ip-10-106-12-214.eu-west-1.compute.internal:9092 (id: 1 rack: null) 2018-08-14 
09:17:17 DEBUG NetworkClient:891 - Sending metadata request 
(type=MetadataRequest, topics=) to node 
ip-10-106-12-51.eu-west-1.compute.internal:9092 (id: 4 rack: null) 2018-08-14 
09:17:17 DEBUG Selector:374 - Created socket with SO_RCVBUF = 32768, SO_SNDBUF 
= 131072, SO_TIMEOUT = 0 to node 1 2018-08-14 09:17:17 DEBUG NetworkClient:724 
- Completed connection to node 1. Fetching API versions. 2018-08-14 09:17:17 
DEBUG NetworkClient:738 - Initiating API versions fetch from node 1. 2018-08-14 
09:17:17 DEBUG NetworkClient:691 - Recorded API versions for node 0: 
(Produce(0): 0 to 5 [usable: 3], Fetch(1): 0 to 6 [usable: 5], Offsets(2): 0 to 
2 [usable: 2], Metadata(3): 0 to 5 [usable: 4], LeaderAndIsr(4): 0 to 1 
[usable: 0], StopReplica(5): 0 [usable: 0], UpdateMetadata(6): 0 to 4 [usable: 
3], ControlledShutdown(7): 0 to 1 [usable: 1], OffsetCommit(8): 0 to 3 [usable: 
3], OffsetFetch(9): 0 to 3 [usable: 3], FindCoordinator(10): 0 to 1 [usable: 
1], JoinGroup(11): 0 to 2 [usable: 2], Heartbeat(12): 0 to 1 [usable: 1], 
LeaveGroup(13): 0 to 1 [usable: 1], SyncGroup(14): 0 to 1 [usable: 1], 
DescribeGroups(15): 0 to 1 [usable: 1], ListGroups(16): 0 to 1 [usable: 1], 
SaslHandshake(17): 0 to 1 [usable: 0], ApiVersions(18): 0 to 1 [usable: 1], 
CreateTopics(19): 0 to 2 [usable: 2], DeleteTopics(20): 0 to 1 [usable: 1], 
DeleteRecords(21): 0 [usable: 0], InitProducerId(22): 0 [usable: 0], 
OffsetForLeaderEpoch(23): 0 [usable: 0], AddPartitionsToTxn(24): 0 [usable: 0], 
AddOffsetsToTxn(25): 0 [usable: 0], EndTxn(26): 0 [usable: 0], 
WriteTxnMarkers(27): 0 [usable: 0], TxnOffsetCommit(28): 0 [usable: 0], 
DescribeAcls(29): 0 [usable: 0], CreateAcls(30): 0 [usable: 0], DeleteAcls(31): 
0 [usable: 0], DescribeConfigs(32): 0 [usable: 0], AlterConfigs(33): 0 [usable: 
0], UNKNOWN(34): 0, UNKNOWN(35): 0, UNKNOWN(36): 0, UNKNOWN(37): 0) 2018-08-14 
09:17:17 DEBUG Metadata:252 - Updated cluster metadata version 3 to Cluster(id 
= WQCjF0jUSxKreevGTpU06g, nodes = 
[ip-10-106-12-214.eu-west-1.compute.internal:9092 (id: 1 rack: null), 
ip-10-106-12-83.eu-west-1.compute.internal:9092 (id: 0 rack: null), 
ip-10-106-12-219.eu-west-1.compute.internal:9092 (id: 2 rack: null), 
ip-10-106-12-84.eu-west-1.compute.internal:9092 (id: 3 rack: null), 
ip-10-106-12-51.eu-west-1.compute.internal:9092 (id: 4 rack: null)], partitions 
= []) 2018-08-14 09:17:17 DEBUG Metadata:252 - Updated cluster metadata version 
1 to Cluster(id = null, nodes = [kafka.broker.internal:9092 (id: -1 rack: 
null)], partitions = []) 2018-08-14 09:17:17 DEBUG NetworkClient:691 - Recorded 
API versions for node 1: (Produce(0): 0 to 5 [usable: 3], Fetch(1): 0 to 6 
[usable: 5], Offsets(2): 0 to 2 [usable: 2], Metadata(3): 0 to 5 [usable: 4], 
LeaderAndIsr(4): 0 to 1 [usable: 0], StopReplica(5): 0 [usable: 0], 
UpdateMetadata(6): 0 to 4 [usable: 3], ControlledShutdown(7): 0 to 1 [usable: 
1], OffsetCommit(8): 0 to 3 [usable: 3], OffsetFetch(9): 0 to 3 [usable: 3], 
FindCoordinator(10): 0 to 1 [usable: 1], JoinGroup(11): 0 to 2 [usable: 2], 
Heartbeat(12): 0 to 1 [usable: 1], LeaveGroup(13): 0 to 1 [usable: 1], 
SyncGroup(14): 0 to 1 [usable: 1], DescribeGroups(15): 0 to 1 [usable: 1], 
ListGroups(16): 0 to 1 [usable: 1], SaslHandshake(17): 0 to 1 [usable: 0], 
ApiVersions(18): 0 to 1 [usable: 1], CreateTopics(19): 0 to 2 [usable: 2], 
DeleteTopics(20): 0 to 1 [usable: 1], DeleteRecords(21): 0 [usable: 0], 
InitProducerId(22): 0 [usable: 0], OffsetForLeaderEpoch(23): 0 [usable: 0], 
AddPartitionsToTxn(24): 0 [usable: 0], AddOffsetsToTxn(25): 0 [usable: 0], 
EndTxn(26): 0 [usable: 0], WriteTxnMarkers(27): 0 [usable: 0], 
TxnOffsetCommit(28): 0 [usable: 0], DescribeAcls(29): 0 [usable: 0], 
CreateAcls(30): 0 [usable: 0], DeleteAcls(31): 0 [usable: 0], 
DescribeConfigs(32): 0 [usable: 0], AlterConfigs(33): 0 [usable: 0], 
UNKNOWN(34): 0, UNKNOWN(35): 0, UNKNOWN(36): 0, UNKNOWN(37): 0) 2018-08-14 
09:17:18 DEBUG Metadata:252 - Updated cluster metadata version 1 to Cluster(id 
= null, nodes = [kafka.broker.internal:9092 (id: -1 rack: null)], partitions = 
[])

2018-08-14 09:17:18 DEBUG Metadata:252 - Updated cluster metadata version 1 to 
Cluster(id = null, nodes = [kafka.broker.internal:9092 (id: -1 rack: null)], 
partitions = [])

(that last message repeats until the end of the file){noformat}
 

 

 


was (Author: vvcephei):
There's one thread, and two consumers:

2018-08-14 09:08:14 DEBUG KafkaConsumer:759 - Kafka consumer with client id 
firechief-d73ab3d0-2834-4db1-9a4c-f7147ef9d6e4-StreamThread-1-consumer created

2018-08-14 09:08:14 DEBUG KafkaConsumer:759 - Kafka consumer with client id 
firechief-d73ab3d0-2834-4db1-9a4c-f7147ef9d6e4-StreamThread-1-restore-consumer 
created

 

Then, we see this after a while:

2018-08-14 09:08:14 DEBUG NetworkClient:891 - Sending metadata request 
(type=MetadataRequest, topics=<ALL>) to node kafka.broker.internal:9092 (id: -1 
rack: null) 2018-08-14 09:08:14 DEBUG Metadata:252 - Updated cluster metadata 
version 2 to Cluster(id = WQCjF0jUSxKreevGTpU06g, nodes = 
[ip-10-106-12-84.eu-west-1.compute.internal:9092 (id: 3 rack: null), 
ip-10-106-12-51.eu-west-1.compute.internal:9092 (id: 4 rack: null), 
ip-10-106-12-83.eu-west-1.compute.internal:9092 (id: 0 rack: null), 
ip-10-106-12-214.eu-west-1.compute.internal:9092 (id: 1 rack: null), 
ip-10-106-12-219.eu-west-1.compute.internal:9092 (id: 2 rack: null)], 
partitions = [Partition(topic = ussenterprise.firesocket.actions.0, partition = 
3, leader = 1, replicas = [1,3,4], isr = [3,1,4]), Partition(topic = 
ussenterprise.firesocket.actions.0, partition = 2, leader = 0, replicas = 
[0,2,3], isr = [0,3,2]), Partition(topic = ussenterprise.firesocket.actions.0, 
partition = 1, leader = 4, replicas = [4,1,2], isr = [1,4,2]), Partition(topic 
= ussenterprise.firesocket.actions.0, partition = 0, leader = 3, replicas = 
[3,0,1], isr = [0,3,1]), Partition(topic = ussenterprise.firesocket.actions.0, 
partition = 6, leader = 3, replicas = [0,2,3], isr = [3,2,0]), Partition(topic 
= ussenterprise.firesocket.actions.0, partition = 5, leader = 3, replicas = 
[3,1,2], isr = [3,1,2]), Partition(topic = ussenterprise.firesocket.actions.0, 
partition = 4, leader = 2, replicas = [0,2,1], isr = [0,2,1])]) 2018-08-14 
09:08:14 DEBUG AbstractCoordinator:593 - Received GroupCoordinator response 
ClientResponse(receivedTimeMs=1534237694641, latencyMs=47, disconnected=false, 
requestHeader=\{api_key=10,api_version=1,correlation_id=0,client_id=firechief-d73ab3d0-2834-4db1-9a4c-f7147ef9d6e4-StreamThread-1-consumer},
 responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', 
error=NONE, node=ip-10-106-12-214.eu-west-1.compute.internal:9092 (id: 1 rack: 
null))) for group firechief

 

Later on, though, there's this funny looking response:

2018-08-14 09:13:14 DEBUG NetworkClient:891 - Sending metadata request 
(type=MetadataRequest, topics=) to node kafka.broker.internal:9092 (id: -1 
rack: null) 2018-08-14 09:13:14 DEBUG Metadata:252 - Updated cluster metadata 
version 1 to Cluster(id = null, nodes = [kafka.broker.internal:9092 (id: -1 
rack: null)], partitions = []) 2018-08-14 09:13:14 DEBUG Metadata:252 - Updated 
cluster metadata version 2 to Cluster(id = WQCjF0jUSxKreevGTpU06g, nodes = 
[ip-10-106-12-51.eu-west-1.compute.internal:9092 (id: 4 rack: null), 
ip-10-106-12-219.eu-west-1.compute.internal:9092 (id: 2 rack: null), 
ip-10-106-12-214.eu-west-1.compute.internal:9092 (id: 1 rack: null), 
ip-10-106-12-84.eu-west-1.compute.internal:9092 (id: 3 rack: null), 
ip-10-106-12-83.eu-west-1.compute.internal:9092 (id: 0 rack: null)], partitions 
= [])

2018-08-14 09:13:14 DEBUG Metadata:252 - Updated cluster metadata version 1 to 
Cluster(id = null, nodes = [kafka.broker.internal:9092 (id: -1 rack: null)], 
partitions = [])

(that last message repeats many times)

(then, there's some more activity when node 3 disconnects)

2018-08-14 09:17:17 DEBUG NetworkClient:704 - Node 3 disconnected. 2018-08-14 
09:17:17 DEBUG NetworkClient:907 - Initialize connection to node 
ip-10-106-12-51.eu-west-1.compute.internal:9092 (id: 4 rack: null) for sending 
metadata request 2018-08-14 09:17:17 DEBUG NetworkClient:762 - Initiating 
connection to node ip-10-106-12-51.eu-west-1.compute.internal:9092 (id: 4 rack: 
null) 2018-08-14 09:17:17 DEBUG Selector:374 - Created socket with SO_RCVBUF = 
32768, SO_SNDBUF = 131072, SO_TIMEOUT = 0 to node 4 2018-08-14 09:17:17 DEBUG 
NetworkClient:724 - Completed connection to node 4. Fetching API versions. 
2018-08-14 09:17:17 DEBUG NetworkClient:738 - Initiating API versions fetch 
from node 4. 2018-08-14 09:17:17 DEBUG NetworkClient:907 - Initialize 
connection to node ip-10-106-12-219.eu-west-1.compute.internal:9092 (id: 2 
rack: null) for sending metadata request 2018-08-14 09:17:17 DEBUG 
NetworkClient:762 - Initiating connection to node 
ip-10-106-12-219.eu-west-1.compute.internal:9092 (id: 2 rack: null) 2018-08-14 
09:17:17 DEBUG Selector:374 - Created socket with SO_RCVBUF = 32768, SO_SNDBUF 
= 131072, SO_TIMEOUT = 0 to node 2 2018-08-14 09:17:17 DEBUG NetworkClient:724 
- Completed connection to node 2. Fetching API versions. 2018-08-14 09:17:17 
DEBUG NetworkClient:738 - Initiating API versions fetch from node 2. 2018-08-14 
09:17:17 DEBUG NetworkClient:907 - Initialize connection to node 
ip-10-106-12-83.eu-west-1.compute.internal:9092 (id: 0 rack: null) for sending 
metadata request 2018-08-14 09:17:17 DEBUG NetworkClient:762 - Initiating 
connection to node ip-10-106-12-83.eu-west-1.compute.internal:9092 (id: 0 rack: 
null) 2018-08-14 09:17:17 DEBUG NetworkClient:691 - Recorded API versions for 
node 4: (Produce(0): 0 to 5 [usable: 3], Fetch(1): 0 to 6 [usable: 5], 
Offsets(2): 0 to 2 [usable: 2], Metadata(3): 0 to 5 [usable: 4], 
LeaderAndIsr(4): 0 to 1 [usable: 0], StopReplica(5): 0 [usable: 0], 
UpdateMetadata(6): 0 to 4 [usable: 3], ControlledShutdown(7): 0 to 1 [usable: 
1], OffsetCommit(8): 0 to 3 [usable: 3], OffsetFetch(9): 0 to 3 [usable: 3], 
FindCoordinator(10): 0 to 1 [usable: 1], JoinGroup(11): 0 to 2 [usable: 2], 
Heartbeat(12): 0 to 1 [usable: 1], LeaveGroup(13): 0 to 1 [usable: 1], 
SyncGroup(14): 0 to 1 [usable: 1], DescribeGroups(15): 0 to 1 [usable: 1], 
ListGroups(16): 0 to 1 [usable: 1], SaslHandshake(17): 0 to 1 [usable: 0], 
ApiVersions(18): 0 to 1 [usable: 1], CreateTopics(19): 0 to 2 [usable: 2], 
DeleteTopics(20): 0 to 1 [usable: 1], DeleteRecords(21): 0 [usable: 0], 
InitProducerId(22): 0 [usable: 0], OffsetForLeaderEpoch(23): 0 [usable: 0], 
AddPartitionsToTxn(24): 0 [usable: 0], AddOffsetsToTxn(25): 0 [usable: 0], 
EndTxn(26): 0 [usable: 0], WriteTxnMarkers(27): 0 [usable: 0], 
TxnOffsetCommit(28): 0 [usable: 0], DescribeAcls(29): 0 [usable: 0], 
CreateAcls(30): 0 [usable: 0], DeleteAcls(31): 0 [usable: 0], 
DescribeConfigs(32): 0 [usable: 0], AlterConfigs(33): 0 [usable: 0], 
UNKNOWN(34): 0, UNKNOWN(35): 0, UNKNOWN(36): 0, UNKNOWN(37): 0) 2018-08-14 
09:17:17 DEBUG Selector:374 - Created socket with SO_RCVBUF = 32768, SO_SNDBUF 
= 131072, SO_TIMEOUT = 0 to node 0 2018-08-14 09:17:17 DEBUG NetworkClient:691 
- Recorded API versions for node 2: (Produce(0): 0 to 5 [usable: 3], Fetch(1): 
0 to 6 [usable: 5], Offsets(2): 0 to 2 [usable: 2], Metadata(3): 0 to 5 
[usable: 4], LeaderAndIsr(4): 0 to 1 [usable: 0], StopReplica(5): 0 [usable: 
0], UpdateMetadata(6): 0 to 4 [usable: 3], ControlledShutdown(7): 0 to 1 
[usable: 1], OffsetCommit(8): 0 to 3 [usable: 3], OffsetFetch(9): 0 to 3 
[usable: 3], FindCoordinator(10): 0 to 1 [usable: 1], JoinGroup(11): 0 to 2 
[usable: 2], Heartbeat(12): 0 to 1 [usable: 1], LeaveGroup(13): 0 to 1 [usable: 
1], SyncGroup(14): 0 to 1 [usable: 1], DescribeGroups(15): 0 to 1 [usable: 1], 
ListGroups(16): 0 to 1 [usable: 1], SaslHandshake(17): 0 to 1 [usable: 0], 
ApiVersions(18): 0 to 1 [usable: 1], CreateTopics(19): 0 to 2 [usable: 2], 
DeleteTopics(20): 0 to 1 [usable: 1], DeleteRecords(21): 0 [usable: 0], 
InitProducerId(22): 0 [usable: 0], OffsetForLeaderEpoch(23): 0 [usable: 0], 
AddPartitionsToTxn(24): 0 [usable: 0], AddOffsetsToTxn(25): 0 [usable: 0], 
EndTxn(26): 0 [usable: 0], WriteTxnMarkers(27): 0 [usable: 0], 
TxnOffsetCommit(28): 0 [usable: 0], DescribeAcls(29): 0 [usable: 0], 
CreateAcls(30): 0 [usable: 0], DeleteAcls(31): 0 [usable: 0], 
DescribeConfigs(32): 0 [usable: 0], AlterConfigs(33): 0 [usable: 0], 
UNKNOWN(34): 0, UNKNOWN(35): 0, UNKNOWN(36): 0, UNKNOWN(37): 0) 2018-08-14 
09:17:17 DEBUG NetworkClient:724 - Completed connection to node 0. Fetching API 
versions. 2018-08-14 09:17:17 DEBUG NetworkClient:738 - Initiating API versions 
fetch from node 0. 2018-08-14 09:17:17 DEBUG NetworkClient:907 - Initialize 
connection to node ip-10-106-12-214.eu-west-1.compute.internal:9092 (id: 1 
rack: null) for sending metadata request 2018-08-14 09:17:17 DEBUG 
NetworkClient:762 - Initiating connection to node 
ip-10-106-12-214.eu-west-1.compute.internal:9092 (id: 1 rack: null) 2018-08-14 
09:17:17 DEBUG NetworkClient:891 - Sending metadata request 
(type=MetadataRequest, topics=) to node 
ip-10-106-12-51.eu-west-1.compute.internal:9092 (id: 4 rack: null) 2018-08-14 
09:17:17 DEBUG Selector:374 - Created socket with SO_RCVBUF = 32768, SO_SNDBUF 
= 131072, SO_TIMEOUT = 0 to node 1 2018-08-14 09:17:17 DEBUG NetworkClient:724 
- Completed connection to node 1. Fetching API versions. 2018-08-14 09:17:17 
DEBUG NetworkClient:738 - Initiating API versions fetch from node 1. 2018-08-14 
09:17:17 DEBUG NetworkClient:691 - Recorded API versions for node 0: 
(Produce(0): 0 to 5 [usable: 3], Fetch(1): 0 to 6 [usable: 5], Offsets(2): 0 to 
2 [usable: 2], Metadata(3): 0 to 5 [usable: 4], LeaderAndIsr(4): 0 to 1 
[usable: 0], StopReplica(5): 0 [usable: 0], UpdateMetadata(6): 0 to 4 [usable: 
3], ControlledShutdown(7): 0 to 1 [usable: 1], OffsetCommit(8): 0 to 3 [usable: 
3], OffsetFetch(9): 0 to 3 [usable: 3], FindCoordinator(10): 0 to 1 [usable: 
1], JoinGroup(11): 0 to 2 [usable: 2], Heartbeat(12): 0 to 1 [usable: 1], 
LeaveGroup(13): 0 to 1 [usable: 1], SyncGroup(14): 0 to 1 [usable: 1], 
DescribeGroups(15): 0 to 1 [usable: 1], ListGroups(16): 0 to 1 [usable: 1], 
SaslHandshake(17): 0 to 1 [usable: 0], ApiVersions(18): 0 to 1 [usable: 1], 
CreateTopics(19): 0 to 2 [usable: 2], DeleteTopics(20): 0 to 1 [usable: 1], 
DeleteRecords(21): 0 [usable: 0], InitProducerId(22): 0 [usable: 0], 
OffsetForLeaderEpoch(23): 0 [usable: 0], AddPartitionsToTxn(24): 0 [usable: 0], 
AddOffsetsToTxn(25): 0 [usable: 0], EndTxn(26): 0 [usable: 0], 
WriteTxnMarkers(27): 0 [usable: 0], TxnOffsetCommit(28): 0 [usable: 0], 
DescribeAcls(29): 0 [usable: 0], CreateAcls(30): 0 [usable: 0], DeleteAcls(31): 
0 [usable: 0], DescribeConfigs(32): 0 [usable: 0], AlterConfigs(33): 0 [usable: 
0], UNKNOWN(34): 0, UNKNOWN(35): 0, UNKNOWN(36): 0, UNKNOWN(37): 0) 2018-08-14 
09:17:17 DEBUG Metadata:252 - Updated cluster metadata version 3 to Cluster(id 
= WQCjF0jUSxKreevGTpU06g, nodes = 
[ip-10-106-12-214.eu-west-1.compute.internal:9092 (id: 1 rack: null), 
ip-10-106-12-83.eu-west-1.compute.internal:9092 (id: 0 rack: null), 
ip-10-106-12-219.eu-west-1.compute.internal:9092 (id: 2 rack: null), 
ip-10-106-12-84.eu-west-1.compute.internal:9092 (id: 3 rack: null), 
ip-10-106-12-51.eu-west-1.compute.internal:9092 (id: 4 rack: null)], partitions 
= []) 2018-08-14 09:17:17 DEBUG Metadata:252 - Updated cluster metadata version 
1 to Cluster(id = null, nodes = [kafka.broker.internal:9092 (id: -1 rack: 
null)], partitions = []) 2018-08-14 09:17:17 DEBUG NetworkClient:691 - Recorded 
API versions for node 1: (Produce(0): 0 to 5 [usable: 3], Fetch(1): 0 to 6 
[usable: 5], Offsets(2): 0 to 2 [usable: 2], Metadata(3): 0 to 5 [usable: 4], 
LeaderAndIsr(4): 0 to 1 [usable: 0], StopReplica(5): 0 [usable: 0], 
UpdateMetadata(6): 0 to 4 [usable: 3], ControlledShutdown(7): 0 to 1 [usable: 
1], OffsetCommit(8): 0 to 3 [usable: 3], OffsetFetch(9): 0 to 3 [usable: 3], 
FindCoordinator(10): 0 to 1 [usable: 1], JoinGroup(11): 0 to 2 [usable: 2], 
Heartbeat(12): 0 to 1 [usable: 1], LeaveGroup(13): 0 to 1 [usable: 1], 
SyncGroup(14): 0 to 1 [usable: 1], DescribeGroups(15): 0 to 1 [usable: 1], 
ListGroups(16): 0 to 1 [usable: 1], SaslHandshake(17): 0 to 1 [usable: 0], 
ApiVersions(18): 0 to 1 [usable: 1], CreateTopics(19): 0 to 2 [usable: 2], 
DeleteTopics(20): 0 to 1 [usable: 1], DeleteRecords(21): 0 [usable: 0], 
InitProducerId(22): 0 [usable: 0], OffsetForLeaderEpoch(23): 0 [usable: 0], 
AddPartitionsToTxn(24): 0 [usable: 0], AddOffsetsToTxn(25): 0 [usable: 0], 
EndTxn(26): 0 [usable: 0], WriteTxnMarkers(27): 0 [usable: 0], 
TxnOffsetCommit(28): 0 [usable: 0], DescribeAcls(29): 0 [usable: 0], 
CreateAcls(30): 0 [usable: 0], DeleteAcls(31): 0 [usable: 0], 
DescribeConfigs(32): 0 [usable: 0], AlterConfigs(33): 0 [usable: 0], 
UNKNOWN(34): 0, UNKNOWN(35): 0, UNKNOWN(36): 0, UNKNOWN(37): 0) 2018-08-14 
09:17:18 DEBUG Metadata:252 - Updated cluster metadata version 1 to Cluster(id 
= null, nodes = [kafka.broker.internal:9092 (id: -1 rack: null)], partitions = 
[])

2018-08-14 09:17:18 DEBUG Metadata:252 - Updated cluster metadata version 1 to 
Cluster(id = null, nodes = [kafka.broker.internal:9092 (id: -1 rack: null)], 
partitions = [])

(that last message repeats until the end of the file)

 

> Kafka Streams application fails to rebalance and is stuck in "Updated cluster 
> metadata version"
> -----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-7290
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7290
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.2.1, 0.10.2.2, 0.11.0.3
>            Reporter: Tim Van Laer
>            Priority: Major
>         Attachments: cg_metadata_failure.txt
>
>
> Our kafka streams application crashed due to a RocksDBException, after that 
> the consumer group basically became unusable. Every consumer in the group 
> went from RUNNING to REBALANCING and was stuck to that state. 
> The application was still on an older version of Kafka Streams (0.10.2.1), 
> but an upgrade of the library didn't got the consumer group back active.
> We tried:
> * adding and removing consumers to the group, no luck, none of the consumers 
> starts processing
> * stopping all consumers and restarted the application, no luck
> * stopping all consumer, reset the consumer group (using the 
> kafka-streams-application-reset tool), no luck
> * replaced the underlying machines, no luck
> * Upgrading our application from Kafka Streams 0.10.2.1 to 0.10.2.2 and 
> 0.11.0.3 after it got stuck, no luck
> We finally got the application back running by changing the applicationId (we 
> could afford to loose the state in this particular case). 
> See attachment for debug logs of the application. The application can reach 
> the Kafka cluster but fails to join the group. 
> The RocksDBException that triggered this state (I lost the container, so 
> unfortunately I don't have more logging):
> {code}
> 2018-08-14 01:40:39 ERROR StreamThread:813 - stream-thread [StreamThread-1] 
> Failed to commit StreamTask 1_1 state:
> org.apache.kafka.streams.errors.ProcessorStateException: task [1_1] Failed to 
> flush state store firehose_subscriptions
>         at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:337)
>  ~[firechief.jar:?]
>         at 
> org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:72)
>  ~[firechief.jar:?]
>         at 
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:188)
>  ~[firechief.jar:?]
>         at 
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:280)
>  ~[firechief.jar:?]
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.commitOne(StreamThread.java:807)
>  [firechief.jar:?]
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:794)
>  [firechief.jar:?]
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:769)
>  [firechief.jar:?]
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:647)
>  [firechief.jar:?]
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:361)
>  [firechief.jar:?]
> Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error 
> while executing flush from store firehose_subscriptions
>         at 
> org.apache.kafka.streams.state.internals.RocksDBStore.flushInternal(RocksDBStore.java:354)
>  ~[firechief.jar:?]
>         at 
> org.apache.kafka.streams.state.internals.RocksDBStore.flush(RocksDBStore.java:345)
>  ~[firechief.jar:?]
>         at 
> org.apache.kafka.streams.state.internals.WrappedStateStore$AbstractWrappedStateStore.flush(WrappedStateStore.java:80)
>  ~[firechief.jar:?]
>         at 
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore$6.run(MeteredKeyValueStore.java:92)
>  ~[firechief.jar:?]
>         at 
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:188)
>  ~[firechief.jar:?]
>         at 
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore.flush(MeteredKeyValueStore.java:186)
>  ~[firechief.jar:?]
>         at 
> org.apache.kafka.streams.state.internals.CachingKeyValueStore.flush(CachingKeyValueStore.java:113)
>  ~[firechief.jar:?]
>         at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:335)
>  ~[firechief.jar:?]
>         ... 8 more
> Caused by: org.rocksdb.RocksDBException: _
>         at org.rocksdb.RocksDB.flush(Native Method) ~[firechief.jar:?]
>         at org.rocksdb.RocksDB.flush(RocksDB.java:1642) ~[firechief.jar:?]
>         at 
> org.apache.kafka.streams.state.internals.RocksDBStore.flushInternal(RocksDBStore.java:352)
>  ~[firechief.jar:?]
>         at 
> org.apache.kafka.streams.state.internals.RocksDBStore.flush(RocksDBStore.java:345)
>  ~[firechief.jar:?]
>         at 
> org.apache.kafka.streams.state.internals.WrappedStateStore$AbstractWrappedStateStore.flush(WrappedStateStore.java:80)
>  ~[firechief.jar:?]
>         at 
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore$6.run(MeteredKeyValueStore.java:92)
>  ~[firechief.jar:?]
>         at 
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:188)
>  ~[firechief.jar:?]
>         at 
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore.flush(MeteredKeyValueStore.java:186)
>  ~[firechief.jar:?]
>         at 
> org.apache.kafka.streams.state.internals.CachingKeyValueStore.flush(CachingKeyValueStore.java:113)
>  ~[firechief.jar:?]
>         at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:335)
>  ~[firechief.jar:?]
>         ... 8 more
> {code}
> Any ideas on what is wrong or what we can do to workaround this issue? 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to