[ https://issues.apache.org/jira/browse/KAFKA-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320670#comment-17320670 ]
GeoffreyStark commented on KAFKA-12665:
---------------------------------------

Same issue as [KAFKA-8714|https://issues.apache.org/jira/browse/KAFKA-8714#].

> one of brokers which is also controller has too much CLOSE_WAITE
> ----------------------------------------------------------------
>
>                 Key: KAFKA-12665
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12665
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer, controller, core
>    Affects Versions: 0.11.0.1
>            Reporter: GeoffreyStark
>            Priority: Major
>         Attachments: image-2021-04-14-10-32-54-140.png, 
> image-2021-04-14-10-39-02-996.png, image-2021-04-14-11-26-03-346.png
>
>
> # *environment*
> Apache Kafka 0.11.0.1
> 5 nodes
> 3 replicas
> mean messages per second: 4k
> Prometheus, JMX port, and Grafana
> consumers: Spring Boot & Doris Routine Load
> producers: Spring Boot & Log
>  
> # *what we encountered*
> Broker 4, which is also the controller (epoch 90), accumulated a large
> number of CLOSE_WAIT connections at one point.
> controller.log
>  
> {code:java}
> Controller 4 epoch 90 fails to send request (type: UpdateMetadataRequest ...
> java.io.IOException: Connection to 4 was disconnected before the response was read
> {code}
>  
> !image-2021-04-14-10-32-54-140.png!
> The request is retried many times, but the same WARNING keeps appearing.
>  
> At the same time, broker 6, which fetches messages from broker 4, hit the
> same error:
> {code:java}
> [2021-04-13 16:35:06,942] WARN [ReplicaFetcherThread-0-4]: Error in fetch to broker 4, request (type=FetchRequest, replicaId=6, maxWait=500, minBytes=1, maxBytes=10485760,
> java.io.IOException: Connection to 4 was disconnected before the response was read
> {code}
> !image-2021-04-14-10-39-02-996.png!
>  
> Doris Routine Load (which consumes from Kafka) timed out:
>  
> {code:java}
> 2021-04-13 16:35:11,397 WARN (Routine load scheduler|42) [KafkaUtil.getAllKafkaPartitions():91] failed to get partitions.
> org.apache.doris.common.UserException: errCode = 2, detailMessage = failed to get kafka partition info: [failed to get partition meta: Local: Timed out]
> {code}
>  
>  
> broker 4 (controller, epoch 90) fs.file
> !image-2021-04-14-11-26-03-346.png!
> Most of the CLOSE_WAIT connections come from the consumer application.
> At 16:49 the broker was restarted and returned to normal.
>  
>  
> # *speculation*
> The TCP connections were closed by the remote side (a passive close on the
> broker), but the controller broker process did not respond and close its
> own end of the sockets.
> Is there a known bug in this version?
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
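
The observation that most CLOSE_WAIT connections come from the consumer application can be checked directly on the broker host by grouping CLOSE_WAIT sockets by remote address. The following is only a minimal sketch, assuming a Linux host, IPv4 connections, and a little-endian CPU (which determines how `/proc/net/tcp` prints addresses). The function names `hex_to_ipv4` and `count_close_wait` are hypothetical helpers, not part of Kafka or of any tool mentioned in the report.

```python
from collections import Counter
from typing import Optional

# State code for CLOSE_WAIT in the "st" column of /proc/net/tcp.
CLOSE_WAIT = "08"


def hex_to_ipv4(hex_addr: str) -> str:
    """Convert an address such as '0100007F' to '127.0.0.1'.

    Assumes a little-endian host, where the bytes are printed in reverse.
    """
    raw = bytes.fromhex(hex_addr)
    return ".".join(str(b) for b in reversed(raw))


def count_close_wait(proc_net_tcp_text: str,
                     local_port: Optional[int] = None) -> Counter:
    """Count CLOSE_WAIT sockets per remote IPv4 address.

    proc_net_tcp_text: full contents of /proc/net/tcp.
    local_port: if given, only sockets bound to this local port are counted.
    """
    counts: Counter = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != CLOSE_WAIT:
            continue
        _local_ip_hex, local_port_hex = fields[1].split(":")
        if local_port is not None and int(local_port_hex, 16) != local_port:
            continue
        remote_ip_hex, _remote_port_hex = fields[2].split(":")
        counts[hex_to_ipv4(remote_ip_hex)] += 1
    return counts
```

On the broker it could be called as `count_close_wait(open("/proc/net/tcp").read(), local_port=9092)`. The port 9092 is an assumption (Kafka's default listener port); the report does not state which port broker 4 listens on. A remote address with a count that keeps growing points at the client whose closed connections the broker is not releasing.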