[
https://issues.apache.org/jira/browse/KAFKA-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Peter Bukowinski resolved KAFKA-9044.
-------------------------------------
Resolution: Not A Bug
Issue traced to external cause.
> Brokers occasionally (randomly?) dropping out of clusters
> ---------------------------------------------------------
>
> Key: KAFKA-9044
> URL: https://issues.apache.org/jira/browse/KAFKA-9044
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 2.3.0, 2.3.1
> Environment: Ubuntu 14.04
> Reporter: Peter Bukowinski
> Priority: Major
>
> I have several cluster running kafka 2.3.1 and 2.3.0 and this issue has
> affected all of them. Because of replication and the size of the clusters (30
> brokers), this bug is not causing any data loss, but it is nevertheless
> concerning. When a broker drops out, the log gives no indication that there
> are any zookeeper issues (and indeed the zookeepers are healthy when this
> occurs. Here's snippet from a broker log when it occurs:
> {{[2019-10-07 11:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0
> expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52,
> dir=/data/3/kl] Found deletable segments with base offsets [1975332] due to
> retention time 3600000ms breach (kafka.log.Log)}}
> {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52,
> dir=/data/3/kl] Scheduling log segment [baseOffset 1975332, size 92076008]
> for deletion. (kafka.log.Log)}}
> {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52,
> dir=/data/3/kl] Incrementing log start offset to 2000317 (kafka.log.Log)}}
> {{[2019-10-07 11:03:56,936] INFO [Log partition=internal_test-52,
> dir=/data/3/kl] Deleting segment 1975332 (kafka.log.Log)}}
> {{[2019-10-07 11:03:56,957] INFO Deleted log
> /data/3/kl/internal_test-52/00000000000001975332.log.deleted.
> (kafka.log.LogSegment)}}
> {{[2019-10-07 11:03:56,957] INFO Deleted offset index
> /data/3/kl/internal_test-52/00000000000001975332.index.deleted.
> (kafka.log.LogSegment)}}
> {{[2019-10-07 11:03:56,958] INFO Deleted time index
> /data/3/kl/internal_test-52/00000000000001975332.timeindex.deleted.
> (kafka.log.LogSegment)}}
> {{[2019-10-07 11:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 11:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 11:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 11:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 11:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 12:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 12:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 12:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 12:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 12:42:27,630] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 1 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 12:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 13:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 13:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 13:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 13:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 13:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 13:52:27,630] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 1 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 14:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 14:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 14:22:27,630] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 1 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 14:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 14:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed
> 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)}}
> {{[2019-10-07 14:46:07,510] INFO [Partition internal_test-33 broker=14]
> Shrinking ISR from 16,17,14 to 14. Leader: (highWatermark: 2007553,
> endOffset: 2007555). Out of sync replicas: (brokerId: 16, endOffset: 2007553)
> (brokerId: 17, endOffset: 2007553). (kafka.cluster.Partition)}}
> {{[2019-10-07 14:46:07,511] INFO [Partition internal_test-33 broker=14]
> Cached zkVersion [17] not equal to that in zookeeper, skip updating ISR
> (kafka.cluster.Partition)}}
> The controller log shows the following:
> {{[2019-10-07 14:45:55,427] INFO [Controller id=24] Newly added brokers: ,
> deleted brokers: 14, bounced brokers: , all live brokers:
> 1,2,3,4,5,6,7,8,9,10,11,12,13,15,16,17,18,19,20,21,22,23,24,25
> (kafka.controller.KafkaController)}}
> {{[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Shutting
> down (kafka.controller.RequestSendThread)}}
> {{[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Shutdown
> completed (kafka.controller.RequestSendThread)}}
> {{[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Stopped
> (kafka.controller.RequestSendThread)}}
> {{[2019-10-07 14:45:55,481] INFO [Controller id=24] Broker failure callback
> for 14 (kafka.controller.KafkaController)}}
> The clusters use {{zookeeper.session.timeout.ms=45000}}, and
> {{zookeeper.connection.timeout.ms=90000}}.
> I'm unable to find a cause for this behavior. The only thing I can do to
> resolve the issue is to restart the broker.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)