[ 
https://issues.apache.org/jira/browse/KAFKA-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968861#comment-16968861
 ] 

Jun Rao commented on KAFKA-9044:
--------------------------------

Hmm, I did a quick check and the controller seems to never delete a broker 
registration path in ZK directly (so only the closing or the expiration of the 
ZK session cause the path to go away). Also, normally the controller will first 
create the /controller path in ZK immediately after the session is established. 
Could you double check whether this session indeed comes from the controller 
machine?

> Brokers occasionally (randomly?) dropping out of clusters
> ---------------------------------------------------------
>
>                 Key: KAFKA-9044
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9044
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.3.0, 2.3.1
>         Environment: Ubuntu 14.04
>            Reporter: Peter Bukowinski
>            Priority: Major
>
> I have several cluster running kafka 2.3.1 and this issue has affected all of 
> them. Because of replication and the size of the clusters (30 brokers), this 
> bug is not causing any data loss, but it is nevertheless concerning. When a 
> broker drops out, the log gives no indication that there are any zookeeper 
> issues (and indeed the zookeepers are healthy when this occurs. Here's 
> snippet from a broker log when it occurs:
> {{[2019-10-07 11:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0 
> expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Found deletable segments with base offsets [1975332] due to 
> retention time 3600000ms breach (kafka.log.Log)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Scheduling log segment [baseOffset 1975332, size 92076008] 
> for deletion. (kafka.log.Log)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Incrementing log start offset to 2000317 (kafka.log.Log)}}
>  {{[2019-10-07 11:03:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Deleting segment 1975332 (kafka.log.Log)}}
>  {{[2019-10-07 11:03:56,957] INFO Deleted log 
> /data/3/kl/internal_test-52/00000000000001975332.log.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:03:56,957] INFO Deleted offset index 
> /data/3/kl/internal_test-52/00000000000001975332.index.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:03:56,958] INFO Deleted time index 
> /data/3/kl/internal_test-52/00000000000001975332.timeindex.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:42:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 1 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:52:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 1 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 14:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 14:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 14:22:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 1 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 14:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 14:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 14:46:07,510] INFO [Partition internal_test-33 broker=14] 
> Shrinking ISR from 16,17,14 to 14. Leader: (highWatermark: 2007553, 
> endOffset: 2007555). Out of sync replicas: (brokerId: 16, endOffset: 2007553) 
> (brokerId: 17, endOffset: 2007553). (kafka.cluster.Partition)}}
>  {{[2019-10-07 14:46:07,511] INFO [Partition internal_test-33 broker=14] 
> Cached zkVersion [17] not equal to that in zookeeper, skip updating ISR 
> (kafka.cluster.Partition)}}
> The controller log shows the following:
> {{[2019-10-07 14:45:55,427] INFO [Controller id=24] Newly added brokers: , 
> deleted brokers: 14, bounced brokers: , all live brokers: 
> 1,2,3,4,5,6,7,8,9,10,11,12,13,15,16,17,18,19,20,21,22,23,24,25 
> (kafka.controller.KafkaController)}}
> {{[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Shutting 
> down (kafka.controller.RequestSendThread)}}
> {{[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Shutdown 
> completed (kafka.controller.RequestSendThread)}}
> {{[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Stopped 
> (kafka.controller.RequestSendThread)}}
> {{[2019-10-07 14:45:55,481] INFO [Controller id=24] Broker failure callback 
> for 14 (kafka.controller.KafkaController)}}
> The clusters use {{zookeeper.session.timeout.ms=45000}}, and 
> {{zookeeper.connection.timeout.ms=90000}}.
> I'm unable to find a cause for this behavior. The only thing I can do to 
> resolve the issue is to restart the broker.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to