[ https://issues.apache.org/jira/browse/KAFKA-15230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Colin McCabe resolved KAFKA-15230.
----------------------------------
Fix Version/s: 3.7.0
Resolution: Fixed
> ApiVersions data between controllers is not reliable
> ----------------------------------------------------
>
> Key: KAFKA-15230
> URL: https://issues.apache.org/jira/browse/KAFKA-15230
> Project: Kafka
> Issue Type: Bug
> Reporter: David Arthur
> Assignee: Colin McCabe
> Priority: Critical
> Fix For: 3.7.0
>
>
> While testing ZK migrations, I noticed a case where the controller did not
> start the migration because ApiVersions data from the other controllers was
> missing. This was unexpected because the quorum was running and the
> followers were replicating the metadata log as expected. A heap dump of the
> leader confirmed that its ApiVersions map of NodeApiVersions was in fact
> empty.
>
> After further investigation and offline discussion with [~jsancio], we
> realized that after the initial leader election, the connection from the Raft
> leader to each follower becomes idle and eventually times out and closes.
> When a connection closes, NetworkClient purges the NodeApiVersions data for
> that node.
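The mechanism above can be illustrated with a toy model. This is a minimal Python sketch, not Kafka's NetworkClient: `ConnectionCache`, `can_start_migration`, and the rule "every other controller must have ApiVersions data" are hypothetical simplifications. It assumes that cached ApiVersions data lives only as long as the connection and that the leader sends no traffic on these connections after the election.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ConnectionCache:
    """Hypothetical toy model (not Kafka's NetworkClient): per-node ApiVersions
    data is cached only while the connection to that node is open."""

    max_idle_ms: int  # -1 disables idle expiry, mirroring connections.max.idle.ms=-1
    last_active_ms: Dict[int, int] = field(default_factory=dict)
    api_versions: Dict[int, str] = field(default_factory=dict)

    def connect(self, node_id: int, now_ms: int, versions: str) -> None:
        # Opening a connection performs an ApiVersions exchange; cache the result.
        self.last_active_ms[node_id] = now_ms
        self.api_versions[node_id] = versions

    def disconnect(self, node_id: int) -> None:
        # Closing a connection purges the cached ApiVersions data for that node.
        self.last_active_ms.pop(node_id, None)
        self.api_versions.pop(node_id, None)

    def poll(self, now_ms: int) -> None:
        # Close every connection that has been idle longer than max_idle_ms.
        if self.max_idle_ms < 0:
            return
        idle = [n for n, t in self.last_active_ms.items()
                if now_ms - t > self.max_idle_ms]
        for node_id in idle:
            self.disconnect(node_id)


def can_start_migration(cache: ConnectionCache, controllers: Set[int]) -> bool:
    # Hypothetical readiness rule: ApiVersions data is required for every other controller.
    return all(node_id in cache.api_versions for node_id in controllers)


if __name__ == "__main__":
    TEN_MINUTES_MS = 10 * 60 * 1000
    for max_idle in (TEN_MINUTES_MS, -1):
        cache = ConnectionCache(max_idle_ms=max_idle)
        for follower in (2, 3):
            cache.connect(follower, now_ms=0, versions="controller-api-versions")
        # After the election the leader sends nothing, so no idle timer is reset.
        cache.poll(now_ms=TEN_MINUTES_MS + 1)
        print(max_idle, can_start_migration(cache, {2, 3}))
```

With the 10-minute timeout the demo prints `600000 False`: both followers' entries have been purged, so the readiness check can never pass again. With `-1` it prints `-1 True`, which corresponds to the partial workaround in the report.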
>
> There are two main side effects of this behavior:
> 1) If a migration is not started within the idle timeout period (10 minutes
> by default), it can no longer be started. After this timeout period, I was
> unable to restart the controllers in a way that left the leader with active
> connections to all followers.
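> 2) Dynamically updating features, such as "metadata.version", is not
> guaranteed to be safe, because the leader may lack the ApiVersions data it
> needs to verify that every controller supports the new feature level.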
>
> There is a partial workaround for the migration issue. If we set
> "connections.max.idle.ms" to -1, the Raft leader will never disconnect from
> the followers. However, if a follower restarts, the leader will not
> re-establish a connection.
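Why this workaround is only partial can be sketched under one more assumption: the leader opens a connection (and performs the ApiVersions exchange) only when it has a request to send, and in steady state the followers fetch from the leader rather than the other way around. The Python model below is hypothetical; `LazyLeaderClient` and `api_versions_after_restart` are illustrative names, not Kafka APIs.

```python
from typing import Dict, Iterable, Set


class LazyLeaderClient:
    """Hypothetical model: a connection, and its cached ApiVersions data, exists
    only after the leader has sent a request to that node."""

    def __init__(self) -> None:
        self.api_versions: Dict[int, str] = {}

    def send(self, node_id: int) -> None:
        # Connect lazily; the ApiVersions exchange happens when the connection opens.
        if node_id not in self.api_versions:
            self.api_versions[node_id] = "controller-api-versions"

    def on_peer_disconnect(self, node_id: int) -> None:
        # The peer closed the socket (e.g. it restarted); its cached data is purged.
        self.api_versions.pop(node_id, None)


def api_versions_after_restart(outbound: Iterable[int]) -> Set[int]:
    leader = LazyLeaderClient()
    # During the leader election the leader sends a request to each follower.
    for follower in (2, 3):
        leader.send(follower)
    # With connections.max.idle.ms=-1 nothing expires, but follower 3 restarts.
    leader.on_peer_disconnect(3)
    # Afterwards, the leader only reconnects to nodes it has requests for.
    for node_id in outbound:
        leader.send(node_id)
    return set(leader.api_versions)


if __name__ == "__main__":
    # Steady state: followers fetch from the leader, so it has nothing to send.
    print(sorted(api_versions_after_restart([])))
```

In this model the demo prints `[2]`: disabling the idle timeout keeps follower 2's entry, but the restarted follower 3 never reappears because nothing prompts the leader to reconnect.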
>
> The feature update issue has no safe workarounds.