[
https://issues.apache.org/jira/browse/KAFKA-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Peter Bukowinski updated KAFKA-9044:
------------------------------------
Description:
I have several cluster running kafka 2.3.1 and this issue has affected all of
them. Because of replication and the size of the clusters (30 brokers), this
bug is not causing any data loss, but it is nevertheless concerning. When a
broker drops out, the log gives no indication that there are any zookeeper
issues (and indeed the zookeepers are healthy when this occurs. Here's snippet
from a broker log when it occurs.
[2019-10-07 11:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52,
dir=/data/3/kl] Found deletable segments with base offsets [1975332] due to
retention time 3600000ms breach (kafka.log.Log)
[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52,
dir=/data/3/kl] Scheduling log segment [baseOffset 1975332, size 92076008] for
deletion. (kafka.log.Log)
[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52,
dir=/data/3/kl] Incrementing log start offset to 2000317 (kafka.log.Log)
[2019-10-07 11:03:56,936] INFO [Log partition=internal_test-52,
dir=/data/3/kl] Deleting segment 1975332 (kafka.log.Log)
[2019-10-07 11:03:56,957] INFO Deleted log
/data/3/kl/internal_test-52/00000000000001975332.log.deleted.
(kafka.log.LogSegment)
[2019-10-07 11:03:56,957] INFO Deleted offset index
/data/3/kl/internal_test-52/00000000000001975332.index.deleted.
(kafka.log.LogSegment)
[2019-10-07 11:03:56,958] INFO Deleted time index
/data/3/kl/internal_test-52/00000000000001975332.timeindex.deleted.
(kafka.log.LogSegment)
[2019-10-07 11:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 11:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 11:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 11:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 11:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 12:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 12:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 12:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 12:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 12:42:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 1 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 12:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 13:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 13:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 13:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 13:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 13:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 13:52:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 1 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 14:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 14:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 14:22:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 1 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 14:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 14:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 14:46:07,510] INFO [Partition internal_test-33 broker=14]
Shrinking ISR from 16,17,14 to 14. Leader: (highWatermark: 2007553, endOffset:
2007555). Out of sync replicas: (brokerId: 16, endOffset: 2007553) (brokerId:
17, endOffset: 2007553). (kafka.cluster.Partition)
[2019-10-07 14:46:07,511] INFO [Partition internal_test-33 broker=14] Cached
zkVersion [17] not equal to that in zookeeper, skip updating ISR
(kafka.cluster.Partition)
The controller show the following:
{{[2019-10-07 14:45:55,427] INFO [Controller id=24] Newly added brokers: ,
deleted brokers: 14, bounced brokers: , all live brokers:
1,2,3,4,5,6,7,8,9,10,11,12,13,15,16,17,18,19,20,21,22,23,24,25
(kafka.controller.KafkaController)
[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Shutting
down (kafka.controller.RequestSendThread)
[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Shutdown
completed (kafka.controller.RequestSendThread)
[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Stopped
(kafka.controller.RequestSendThread)
[2019-10-07 14:45:55,481] INFO [Controller id=24] Broker failure callback for
14 (kafka.controller.KafkaController)}}
The clusters use {{zookeeper.session.timeout.ms=45000}}, and
{{zookeeper.connection.timeout.ms=90000}}.
I'm unable to find a cause for this behavior. The only thing I can do to
resolve the issue is to restart the broker.
was:
I have several cluster running kafka 2.3.1 and this issue has affected all of
them. Because of replication and the size of the clusters (30 brokers), this
bug is not causing any data loss, but it is nevertheless concerning. When a
broker drops out, the log gives no indication that there are any zookeeper
issues (and indeed the zookeepers are healthy when this occurs. Here's snippet
from a broker log when it occurs.
{{[2019-10-07 11:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, dir=/data/3/kl]
Found deletable segments with base offsets [1975332] due to retention time
3600000ms breach (kafka.log.Log)
[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, dir=/data/3/kl]
Scheduling log segment [baseOffset 1975332, size 92076008] for deletion.
(kafka.log.Log)
[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, dir=/data/3/kl]
Incrementing log start offset to 2000317 (kafka.log.Log)
[2019-10-07 11:03:56,936] INFO [Log partition=internal_test-52, dir=/data/3/kl]
Deleting segment 1975332 (kafka.log.Log)
[2019-10-07 11:03:56,957] INFO Deleted log
/data/3/kl/internal_test-52/00000000000001975332.log.deleted.
(kafka.log.LogSegment)
[2019-10-07 11:03:56,957] INFO Deleted offset index
/data/3/kl/internal_test-52/00000000000001975332.index.deleted.
(kafka.log.LogSegment)
[2019-10-07 11:03:56,958] INFO Deleted time index
/data/3/kl/internal_test-52/00000000000001975332.timeindex.deleted.
(kafka.log.LogSegment)
[2019-10-07 11:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 11:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 11:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 11:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 11:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 12:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 12:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 12:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 12:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 12:42:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 1 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 12:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 13:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 13:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 13:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 13:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 13:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 13:52:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 1 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 14:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 14:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 14:22:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 1 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 14:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 14:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 14:46:07,510] INFO [Partition internal_test-33 broker=14] Shrinking
ISR from 16,17,14 to 14. Leader: (highWatermark: 2007553, endOffset: 2007555).
Out of sync replicas: (brokerId: 16, endOffset: 2007553) (brokerId: 17,
endOffset: 2007553). (kafka.cluster.Partition)
[2019-10-07 14:46:07,511] INFO [Partition internal_test-33 broker=14] Cached
zkVersion [17] not equal to that in zookeeper, skip updating ISR
(kafka.cluster.Partition)}}
The controller show the following:
{{[2019-10-07 14:45:55,427] INFO [Controller id=24] Newly added brokers: ,
deleted brokers: 14, bounced brokers: , all live brokers:
1,2,3,4,5,6,7,8,9,10,11,12,13,15,16,17,18,19,20,21,22,23,24,25
(kafka.controller.KafkaController)
[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Shutting
down (kafka.controller.RequestSendThread)
[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Shutdown
completed (kafka.controller.RequestSendThread)
[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Stopped
(kafka.controller.RequestSendThread)
[2019-10-07 14:45:55,481] INFO [Controller id=24] Broker failure callback for
14 (kafka.controller.KafkaController)}}
The clusters use {{zookeeper.session.timeout.ms=45000}}, and
{{zookeeper.connection.timeout.ms=90000}}.
I'm unable to find a cause for this behavior. The only thing I can do to
resolve the issue is to restart the broker.
> Brokers occasionally (randomly?) dropping out of clusters
> ---------------------------------------------------------
>
> Key: KAFKA-9044
> URL: https://issues.apache.org/jira/browse/KAFKA-9044
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 2.3.1
> Environment: Ubuntu 14.04
> Reporter: Peter Bukowinski
> Priority: Major
>
> I have several cluster running kafka 2.3.1 and this issue has affected all of
> them. Because of replication and the size of the clusters (30 brokers), this
> bug is not causing any data loss, but it is nevertheless concerning. When a
> broker drops out, the log gives no indication that there are any zookeeper
> issues (and indeed the zookeepers are healthy when this occurs. Here's
> snippet from a broker log when it occurs.
>
> [2019-10-07 11:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52,
> dir=/data/3/kl] Found deletable segments with base offsets [1975332] due to
> retention time 3600000ms breach (kafka.log.Log)
> [2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52,
> dir=/data/3/kl] Scheduling log segment [baseOffset 1975332, size 92076008]
> for deletion. (kafka.log.Log)
> [2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52,
> dir=/data/3/kl] Incrementing log start offset to 2000317 (kafka.log.Log)
> [2019-10-07 11:03:56,936] INFO [Log partition=internal_test-52,
> dir=/data/3/kl] Deleting segment 1975332 (kafka.log.Log)
> [2019-10-07 11:03:56,957] INFO Deleted log
> /data/3/kl/internal_test-52/00000000000001975332.log.deleted.
> (kafka.log.LogSegment)
> [2019-10-07 11:03:56,957] INFO Deleted offset index
> /data/3/kl/internal_test-52/00000000000001975332.index.deleted.
> (kafka.log.LogSegment)
> [2019-10-07 11:03:56,958] INFO Deleted time index
> /data/3/kl/internal_test-52/00000000000001975332.timeindex.deleted.
> (kafka.log.LogSegment)
> [2019-10-07 11:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 11:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 11:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 11:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 11:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 12:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 12:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 12:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 12:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 12:42:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 1 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 12:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 13:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 13:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 13:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 13:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 13:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 13:52:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 1 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 14:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 14:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 14:22:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 1 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 14:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 14:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-10-07 14:46:07,510] INFO [Partition internal_test-33 broker=14]
> Shrinking ISR from 16,17,14 to 14. Leader: (highWatermark: 2007553,
> endOffset: 2007555). Out of sync replicas: (brokerId: 16, endOffset: 2007553)
> (brokerId: 17, endOffset: 2007553). (kafka.cluster.Partition)
> [2019-10-07 14:46:07,511] INFO [Partition internal_test-33 broker=14] Cached
> zkVersion [17] not equal to that in zookeeper, skip updating ISR
> (kafka.cluster.Partition)
>
> The controller show the following:
>
> {{[2019-10-07 14:45:55,427] INFO [Controller id=24] Newly added brokers: ,
> deleted brokers: 14, bounced brokers: , all live brokers:
> 1,2,3,4,5,6,7,8,9,10,11,12,13,15,16,17,18,19,20,21,22,23,24,25
> (kafka.controller.KafkaController)
> [2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Shutting
> down (kafka.controller.RequestSendThread)
> [2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Shutdown
> completed (kafka.controller.RequestSendThread)
> [2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Stopped
> (kafka.controller.RequestSendThread)
> [2019-10-07 14:45:55,481] INFO [Controller id=24] Broker failure callback
> for 14 (kafka.controller.KafkaController)}}
>
> The clusters use {{zookeeper.session.timeout.ms=45000}}, and
> {{zookeeper.connection.timeout.ms=90000}}.
> I'm unable to find a cause for this behavior. The only thing I can do to
> resolve the issue is to restart the broker.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)