Hello Kafka devs,
We need help with a consumer lag issue on one of our environments, which does
not carry much load. We run Kafka in multiple environments, and in one of them
events take a very long time (sometimes more than a day) to be processed from
Kafka. The topic has two partitions and three replicas, and two consumers are
running on it (so there is a one-to-one mapping between partitions and
consumers). When I run kafka-consumer-groups.sh to check the stats, I see lag
on one of the consumers; after some time the lag moves to the other consumer.
The lag keeps switching between them over time, and the time to process events
keeps increasing. It looks to me as if rebalancing is happening, but the
consumer-id stays the same, so the consumers are not being restarted in
between. We also tried restarting Kafka and ZooKeeper, but the result is the
same. Here are the details.
[2018-10-12 03:52:21,676] WARN Removing server circle2-kafka2:909 from
bootstrap.servers as DNS resolution failed for circle2-kafka2
(org.apache.kafka.clients.ClientUtils)
group-es
group-rds
[vikas@circle1-kafka1 kafka]$ ./bin/kafka-consumer-groups.sh --bootstrap-server circle1-kafka1:9092,circle2-kafka2:9092,circle1-kafka3 -describe -group group-rds
Note: This will not show information about old Zookeeper-based consumers.
[2018-10-12 03:53:06,226] WARN Removing server circle2-kafka2:9092 from
bootstrap.servers as DNS resolution failed for circle2-kafka2
(org.apache.kafka.clients.ClientUtils)
[2018-10-12 03:53:06,436] WARN Removing server circle2-kafka2:9092 from
bootstrap.servers as DNS resolution failed for circle2-kafka2
(org.apache.kafka.clients.ClientUtils)
TOPIC         PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG   CONSUMER-ID                                                                                    HOST           CLIENT-ID
topic.events  1          45471           45471           0     data-consumer-i-00404a50d7551ef37-circle1-ecs2-group-rds-dc1cb0e1-48fb-40c5-bd96-0e9980e1083d  /172.27.4.133  data-consumer-i-00404a50d7551ef37-circle1-ecs2-group-rds
topic.events  0          344987          346323          1336  data-consumer-i-00404a50d7551ef37-circle1-ecs2-group-rds-3a13af04-048f-40b4-9b09-b74a9600dfd8  /172.27.4.133  data-consumer-i-00404a50d7551ef37-circle1-ecs2-group-rds
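For the DNS warning in the output above, a quick check we could run on the
host is sketched below in Python. It only splits the bootstrap list from the
command above and tries to resolve each host name with the standard library.
The function names (parse_bootstrap, resolves, check_servers) are made up for
this illustration and are not part of any Kafka tooling. It also flags entries
without a port, since bootstrap entries are normally written as host:port.

```python
import socket

# Bootstrap list as passed to kafka-consumer-groups.sh above.
BOOTSTRAP = "circle1-kafka1:9092,circle2-kafka2:9092,circle1-kafka3"


def parse_bootstrap(servers):
    """Split a bootstrap string into (host, port) pairs; port is None if absent."""
    entries = []
    for raw in servers.split(","):
        entry = raw.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if sep and port.isdigit():
            entries.append((host, int(port)))
        else:
            # No ":<digits>" suffix, so treat the whole entry as the host name.
            entries.append((entry, None))
    return entries


def resolves(host):
    """Return True if the local resolver can look up the host name."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def check_servers(servers, resolver=resolves):
    """Return (host, port, problems) for every bootstrap entry."""
    report = []
    for host, port in parse_bootstrap(servers):
        problems = []
        if port is None:
            problems.append("missing port")
        if not resolver(host):
            problems.append("DNS resolution failed")
        report.append((host, port, problems))
    return report
```

Calling check_servers(BOOTSTRAP) on the machine where the tool runs returns one
tuple per entry, with an empty problem list for entries that look fine. The
resolver argument is only there so the lookup can be swapped out when no
network is available.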
[vikas@circle1-kafka1 kafka]$ ./bin/kafka-consumer-groups.sh --bootstrap-server circle1-kafka1:9092,circle2-kafka2:9092,circle1-kafka3 -describe -group group-rds
Note: This will not show information about old Zookeeper-based consumers.
[2018-10-12 04:04:29,725] WARN Removing server circle2-kafka2:9092 from
bootstrap.servers as DNS resolution failed for circle2-kafka2
(org.apache.kafka.clients.ClientUtils)
[2018-10-12 04:04:29,926] WARN Removing server circle2-kafka2:9092 from
bootstrap.servers as DNS resolution failed for circle2-kafka2
(org.apache.kafka.clients.ClientUtils)
TOPIC         PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG   CONSUMER-ID                                                                                    HOST           CLIENT-ID
topic.events  1          44873           45471           598   data-consumer-i-00404a50d7551ef37-circle1-ecs2-group-rds-dc1cb0e1-48fb-40c5-bd96-0e9980e1083d  /172.27.4.133  data-consumer-i-00404a50d7551ef37-circle1-ecs2-group-rds
topic.events  0          346324          346324          0     data-consumer-i-00404a50d7551ef37-circle1-ecs2-group-rds-3a13af04-048f-40b4-9b09-b74a9600dfd8  /172.27.4.133  data-consumer-i-00404a50d7551ef37-circle1-ecs2-group-rds
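To make the two runs above easier to compare, here is a small Python sketch
that recomputes the lag (LOG-END-OFFSET minus CURRENT-OFFSET) and the change
between the runs. The offsets are copied from the output above; the timestamps
are taken from the first WARN line of each run and are only an approximation
of when the tool ran. The compare function and the dictionary layout are made
up for this illustration.

```python
from datetime import datetime

# partition -> (current_offset, log_end_offset), copied from the two runs above.
SNAPSHOT_1 = {
    "time": "2018-10-12 03:53:06",
    "offsets": {0: (344987, 346323), 1: (45471, 45471)},
}
SNAPSHOT_2 = {
    "time": "2018-10-12 04:04:29",
    "offsets": {0: (346324, 346324), 1: (44873, 45471)},
}


def compare(first, second):
    """Return (elapsed_seconds, per-partition rows) for two snapshots."""
    fmt = "%Y-%m-%d %H:%M:%S"
    start = datetime.strptime(first["time"], fmt)
    end = datetime.strptime(second["time"], fmt)
    rows = []
    for partition in sorted(first["offsets"]):
        cur1, end1 = first["offsets"][partition]
        cur2, end2 = second["offsets"][partition]
        rows.append({
            "partition": partition,
            "lag_before": end1 - cur1,      # lag in the first run
            "lag_after": end2 - cur2,       # lag in the second run
            "produced": end2 - end1,        # new messages written in between
            "committed_delta": cur2 - cur1, # movement of the committed offset
            "went_backwards": cur2 < cur1,  # committed offset regressed
        })
    return (end - start).total_seconds(), rows


if __name__ == "__main__":
    elapsed, rows = compare(SNAPSHOT_1, SNAPSHOT_2)
    print("seconds between runs:", elapsed)
    for row in rows:
        print(row)
```

With these numbers, partition 0 received 1 new message and its committed
offset advanced by 1337, while partition 1 received no new messages but its
committed offset went back by 598. So the lag on partition 1 seems to come
from the committed offset moving backwards rather than from new events
arriving. This is only a reading of the two snapshots, not a diagnosis.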
Here are the details of the Kafka environment:
1) Version -> kafka_2.11-1.1.0
2) ZooKeeper settings -> default
3) Kafka settings -> mostly default; here are the few specific changes we have
made:
zookeeper.connection.timeout.ms=6000
#Setting the replication for nodes under the default of 3
default.replication.factor=3
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
config.storage.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
log.retention.hours=24
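Since events sometimes take more than a day to be processed and
log.retention.hours is 24, it may help to see the size and time settings above
in plain units. The Python sketch below only does that unit conversion; the
helper names (parse_properties, humanize) are made up for this illustration.
Whether a delayed event could actually be removed before it is consumed also
depends on when log segments roll, which this sketch does not check.

```python
# Size and time settings copied from the list above.
SETTINGS = """
zookeeper.connection.timeout.ms=6000
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
log.retention.hours=24
"""


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def humanize(props):
    """Convert the raw values into seconds, GiB, minutes and days."""
    return {
        "zookeeper_timeout_s": int(props["zookeeper.connection.timeout.ms"]) / 1000,
        "segment_gib": int(props["log.segment.bytes"]) / 2**30,
        "retention_check_min": int(props["log.retention.check.interval.ms"]) / 60000,
        "retention_days": int(props["log.retention.hours"]) / 24,
    }


if __name__ == "__main__":
    for name, value in humanize(parse_properties(SETTINGS)).items():
        print(name, value)
```

In other words: a 6 second ZooKeeper connection timeout, 1 GiB log segments, a
retention check every 5 minutes, and 1 day of retention.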
Please let me know if you need more details from my end.
Your quick help is much appreciated. If you are not able to help, or if I am
in the wrong group, please point me to the right one.
Regards,
Vikas