Jun Rao created KAFKA-7836:
------------------------------

             Summary: The propagation of log dir failure can be delayed due to 
slowness in closing the file handles
                 Key: KAFKA-7836
                 URL: https://issues.apache.org/jira/browse/KAFKA-7836
             Project: Kafka
          Issue Type: Improvement
            Reporter: Jun Rao


In ReplicaManager.handleLogDirFailure(), we call zkClient.propagateLogDirEvent 
after logManager.handleLogDirFailure. The latter closes the file handles of 
the offline replicas, which can take a long time when the disk is bad. Because 
the event is propagated only after the handles are closed, this delays the new 
leader election by the controller. In one incident, closing the file handles 
of multiple replicas took more than 20 seconds.
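
The effect of the call order can be illustrated with a small simulation. This is 
only a sketch, not Kafka code: the class LogDirFailureOrdering, the Simulation 
helper, and the 50 ms propagation cost are hypothetical, and the 
propagate-before-close variant is an assumed alternative ordering shown for 
comparison, not something the issue itself specifies. Only the two step names 
and the 20 second figure come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

public class LogDirFailureOrdering {

    /** Simulated clock plus a trace of completed steps; no real I/O is performed. */
    static final class Simulation {
        long nowMs = 0;
        final List<String> trace = new ArrayList<>();

        // Stand-in for logManager.handleLogDirFailure: closing handles on a bad disk is slow.
        void closeFileHandles(long costMs) {
            nowMs += costMs;
            trace.add("closed@" + nowMs);
        }

        // Stand-in for zkClient.propagateLogDirEvent: the step that lets the controller react.
        void propagateLogDirEvent(long costMs) {
            nowMs += costMs;
            trace.add("propagated@" + nowMs);
        }
    }

    /** Ordering described in the issue. Returns the simulated time at which the event is propagated. */
    static long closeThenPropagate(long closeMs, long propagateMs) {
        Simulation s = new Simulation();
        s.closeFileHandles(closeMs);
        s.propagateLogDirEvent(propagateMs);
        return s.nowMs;
    }

    /** Assumed alternative ordering. Returns the simulated time at which the event is propagated. */
    static long propagateThenClose(long closeMs, long propagateMs) {
        Simulation s = new Simulation();
        s.propagateLogDirEvent(propagateMs);
        long propagatedAt = s.nowMs;
        s.closeFileHandles(closeMs); // still slow, but no longer delays the notification
        return propagatedAt;
    }

    public static void main(String[] args) {
        long closeMs = 20_000;   // "more than 20 seconds" in the reported incident
        long propagateMs = 50;   // hypothetical cost of the ZooKeeper write
        System.out.println("close first:     propagated at " + closeThenPropagate(closeMs, propagateMs) + " ms");
        System.out.println("propagate first: propagated at " + propagateThenClose(closeMs, propagateMs) + " ms");
    }
}
```

In this model the time until the event is propagated is the sum of both costs when 
the handles are closed first, and only the propagation cost when the order is 
swapped. Whether swapping the order is safe in the real broker is outside what 
this sketch can show.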



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
