[
https://issues.apache.org/jira/browse/KAFKA-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745816#comment-16745816
]
Dong Lin commented on KAFKA-7836:
---------------------------------
[~junrao] This solution sounds good to me.
> The propagation of log dir failure can be delayed due to slowness in closing
> the file handles
> ---------------------------------------------------------------------------------------------
>
> Key: KAFKA-7836
> URL: https://issues.apache.org/jira/browse/KAFKA-7836
> Project: Kafka
> Issue Type: Improvement
> Reporter: Jun Rao
> Priority: Major
>
> In ReplicaManager.handleLogDirFailure(), we call
> zkClient.propagateLogDirEvent after logManager.handleLogDirFailure. The
> latter closes the file handles of the offline replicas, which could take time
> when the disk is bad. This will delay the new leader election by the
> controller. In one incident, we have seen the closing of file handles of
> multiple replicas taking more than 20 seconds.
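
To illustrate the ordering problem described above, here is a minimal sketch of one way to avoid the delay: notify the controller first, then do the potentially slow close. It assumes the notification does not depend on the file handles already being closed. The Notifier and LogCloser interfaces, the directory path, and the broker id are hypothetical stand-ins for illustration, not Kafka's actual classes; only the two method names come from the issue description.

```java
import java.util.ArrayList;
import java.util.List;

public class LogDirFailureOrdering {
    // Hypothetical stand-in for the ZooKeeper client used to notify the controller.
    interface Notifier { void propagateLogDirEvent(int brokerId); }

    // Hypothetical stand-in for the log manager that closes file handles.
    interface LogCloser { void handleLogDirFailure(String dir); }

    // Propagate the failure before closing handles, so a slow close on a
    // bad disk does not hold up leader election by the controller.
    static void handleLogDirFailure(String dir, int brokerId, Notifier zk, LogCloser logs) {
        zk.propagateLogDirEvent(brokerId);
        logs.handleLogDirFailure(dir);
    }

    // Records the order in which the two steps run.
    static List<String> demo() {
        List<String> events = new ArrayList<>();
        handleLogDirFailure("/data/kafka-logs-1", 0,
            id -> events.add("notify-controller"),
            d -> events.add("close-handles"));
        return events;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With this ordering, the time spent closing handles no longer sits between the disk failure and the controller learning about it.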