[ 
https://issues.apache.org/jira/browse/KAFKA-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320670#comment-17320670
 ] 

GeoffreyStark commented on KAFKA-12665:
---------------------------------------

This appears to be the same issue as
[KAFKA-8714|https://issues.apache.org/jira/browse/KAFKA-8714].

> one of brokers which is also controller has too much CLOSE_WAITE
> ----------------------------------------------------------------
>
>                 Key: KAFKA-12665
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12665
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer, controller, core
>    Affects Versions: 0.11.0.1
>            Reporter: GeoffreyStark
>            Priority: Major
>         Attachments: image-2021-04-14-10-32-54-140.png, 
> image-2021-04-14-10-39-02-996.png, image-2021-04-14-11-26-03-346.png
>
>
> # *environment*
> Apache Kafka 0.11.0.1
> 5 nodes
> 3 replicas
> mean messages per second: 4k
> monitoring: Prometheus & JMX & Grafana
> consumers: Spring Boot & Doris Routine Load
> producers: Spring Boot & Log
>  
> # *problem encountered*
> Broker 4, which is also the controller (epoch 90), accumulated a large number
> of connections in CLOSE_WAIT at one point.
> From controller.log:
>  
> {code:java}
> Controller 4 epoch 90 fails to send request (type: UpdateMetadataRequest ...
> java.io.IOException: Connection to 4 was disconnected before the response was 
> read
> {code}
>  
> !image-2021-04-14-10-32-54-140.png!
> The request is retried many times, but the same WARNING keeps being logged.
>  
> At the same time, broker 6, which fetches messages from broker 4, ran into
> the same problem:
> {code:java}
> [2021-04-13 16:35:06,942] WARN [ReplicaFetcherThread-0-4]: Error in fetch to 
> broker 4, request (type=FetchRequest, replicaId=6, maxWait=500, minBytes=1, 
> maxBytes=10485760,
> java.io.IOException: Connection to 4 was disconnected before the response was 
> read
> {code}
> !image-2021-04-14-10-39-02-996.png!
>  
> Doris Routine Load (which consumes from Kafka) timed out:
>  
> {code:java}
> 2021-04-13 16:35:11,397 WARN (Routine load scheduler|42) 
> [KafkaUtil.getAllKafkaPartitions():91] failed to get partitions. 
> org.apache.doris.common.UserException: errCode = 2, detailMessage = failed to 
> get kafka partition info: [failed to get partition meta: Local: Timed out]
> {code}
>  
>  
> Broker 4 (controller, epoch 90) fs.file:
> !image-2021-04-14-11-26-03-346.png!
> Most of the CLOSE_WAIT connections come from the consumer application.
> At 16:49 the broker was restarted and returned to normal.
>  
>  
> # *speculation*
> The TCP connections were closed by the remote peers (a passive close on the
> broker side), but the broker process on the controller machine did not
> respond, so the sockets stayed in CLOSE_WAIT.
> Is there a known bug in this version?
>  
>  
>  
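
The report above attributes most of the CLOSE_WAIT connections to the consumer application. One way to check that kind of claim is to tally CLOSE_WAIT sockets per remote host. The following is only a minimal sketch, assuming Linux-style `netstat -ant` text as input; the function name `close_wait_by_peer` and the sample addresses are hypothetical and do not come from the reported cluster.

```python
from collections import Counter


def close_wait_by_peer(netstat_output: str) -> Counter:
    """Count CLOSE_WAIT sockets per remote host in `netstat -ant` style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local addr, remote addr, state.
        # Header lines and non-TCP lines are skipped.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "CLOSE_WAIT":
            continue
        # Drop the port; rsplit keeps IPv6-style hosts such as ::ffff:a.b.c.d intact.
        host = fields[4].rsplit(":", 1)[0]
        counts[host] += 1
    return counts


# Made-up sample input for illustration only (not data from the issue).
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.4:9092           10.0.0.21:51234         CLOSE_WAIT
tcp        1      0 10.0.0.4:9092           10.0.0.21:51240         CLOSE_WAIT
tcp        0      0 10.0.0.4:9092           10.0.0.6:40112          ESTABLISHED
tcp6       1      0 ::ffff:10.0.0.4:9092    ::ffff:10.0.0.22:33100  CLOSE_WAIT
"""

if __name__ == "__main__":
    # Print peers with the most CLOSE_WAIT sockets first.
    for host, n in close_wait_by_peer(SAMPLE).most_common():
        print(host, n)
```

On a broker, the same function could be fed the captured output of `netstat -ant` instead of `SAMPLE`. If most entries map to consumer hosts, that would be consistent with the observation in the report.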



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
