I just realized that, based on our experience at LinkedIn, some other issues need to be addressed before we commit this patch to Apache Kafka. Here is the history:
1) LinkedIn has a very large Kafka cluster, and the rolling bounce time is too long. So we have a hotfix patch, as specified in this PR, to reduce the time it takes to restart any given broker.
2) With the rolling bounce time significantly reduced, there is a higher chance that the controller will miss a broker stop/start event. So we have another hotfix patch for this scenario: it asks the controller to stop and start a broker if the controller thinks there may have been a quick restart.
3) The hotfix patch in 2) makes it possible for a broker to receive a StopReplicaRequest before it receives a LeaderAndIsrRequest for all its partitions. With the current Kafka implementation, the broker will not start the ReplicaFetcherThread for a given partition if a StopReplicaRequest is followed by a LeaderAndIsrRequest for that partition. To address this problem, we have another hotfix patch so that a StopReplicaRequest removes the partition from `ReplicaManager.allPartitions()`.
4) Because of the hotfix patch in 3), if a broker checkpoints its high watermarks after receiving a StopReplicaRequest, the high watermarks of these partitions will not be included in the checkpoint file, and the broker will then truncate all of these partitions after restart. So we have another hotfix patch, as described in KAFKA-6604, to address this issue. Note that this issue may also happen if the broker receives an outdated LeaderAndIsrRequest that was meant to be delivered before the broker restarted, as described in KAFKA-6604.

So several issues need to be addressed before we commit this PR to optimize the broker restart time for Apache Kafka users. In particular, we need to fix the issue described in 2) and https://issues.apache.org/jira/browse/KAFKA-7235.

[ Full content available at: https://github.com/apache/kafka/pull/5498 ]
This message was relayed via gitbox.apache.org.
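To make the interaction between items 3) and 4) concrete, here is a toy Python model. It is only a sketch under stated assumptions, not Kafka's code: `ToyBroker`, `hw_after_restart`, and the partition name are hypothetical. The rule that a LeaderAndIsrRequest (re)starts the fetcher only when the broker's cached leader for the partition changes is our assumption about why the fetcher gets skipped, loosely modeled on the become-follower path in `ReplicaManager`. We also assume that a partition missing from the checkpoint file restarts with a high watermark of 0, so truncating to it discards all of the partition's data.

```python
from __future__ import annotations


class ToyBroker:
    """Hypothetical follower broker; a toy, not Kafka's ReplicaManager."""

    def __init__(self, remove_on_stop: bool) -> None:
        self.remove_on_stop = remove_on_stop  # True models the hotfix in item 3)
        self.leaders: dict[str, int] = {}     # stand-in for allPartitions(): partition -> cached leader id
        self.hw: dict[str, int] = {}          # in-memory high watermark per partition
        self.fetching: set[str] = set()       # partitions with an active (toy) replica fetcher

    def leader_and_isr(self, tp: str, leader: int) -> None:
        # Assumption: the fetcher is (re)started only for an unknown partition
        # or when the cached leader changes; an unchanged leader is a no-op.
        if self.leaders.get(tp) == leader:
            return
        self.leaders[tp] = leader
        self.hw.setdefault(tp, 0)
        self.fetching.add(tp)

    def stop_replica(self, tp: str) -> None:
        self.fetching.discard(tp)
        if self.remove_on_stop:
            # Hotfix 3): forget the partition so the next request re-creates it.
            self.leaders.pop(tp, None)

    def checkpoint_hw(self) -> dict[str, int]:
        # Only partitions the broker still tracks are written to the checkpoint.
        return {tp: self.hw[tp] for tp in self.leaders}


def hw_after_restart(checkpoint: dict[str, int], tp: str) -> int:
    # Assumption: a partition absent from the checkpoint restarts with HW 0,
    # so truncating to the HW throws away all of its data.
    return checkpoint.get(tp, 0)


if __name__ == "__main__":
    tp = "topic-0"
    for hotfix in (False, True):
        b = ToyBroker(remove_on_stop=hotfix)
        b.leader_and_isr(tp, leader=1)
        b.hw[tp] = 42
        b.stop_replica(tp)
        # A checkpoint (and restart) squeezed in between the two requests:
        hw_if_restarted_now = hw_after_restart(b.checkpoint_hw(), tp)
        b.leader_and_isr(tp, leader=1)  # the controller re-sends the same leader
        print(f"hotfix={hotfix}: fetching={tp in b.fetching}, "
              f"hw_if_restarted_now={hw_if_restarted_now}")
    # prints:
    # hotfix=False: fetching=False, hw_if_restarted_now=42
    # hotfix=True: fetching=True, hw_if_restarted_now=0
```

In this toy, the trade-off from items 3) and 4) shows up directly: without the item 3) change the fetcher stays stopped, and with it a high-watermark checkpoint taken between the StopReplicaRequest and the next LeaderAndIsrRequest omits the partition, which is the situation KAFKA-6604 is about.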
