[
https://issues.apache.org/jira/browse/KAFKA-9232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Gustafson resolved KAFKA-9232.
------------------------------------
Fix Version/s: 2.4.1
2.3.2
2.2.3
2.1.2
Resolution: Fixed
> Coordinator new member heartbeat completion does not work for JoinGroup v3
> --------------------------------------------------------------------------
>
> Key: KAFKA-9232
> URL: https://issues.apache.org/jira/browse/KAFKA-9232
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 2.1.1, 2.2.2, 2.3.1
> Reporter: Jason Gustafson
> Assignee: Sophie Blee-Goldman
> Priority: Major
> Fix For: 2.1.2, 2.2.3, 2.3.2, 2.4.1
>
>
> For older versions of the JoinGroup API, the coordinator implements a static
> timeout for new members of 5 minutes. This timeout is implemented using the
> heartbeat purgatory and we expect that the delayed operation will be force
> completed if the member successfully joins. This is implemented in
> GroupCoordinator with the following logic:
> {code:scala}
> group.maybeInvokeJoinCallback(member, joinResult)
> completeAndScheduleNextHeartbeatExpiration(group, member)
> member.isNew = false
> {code}
> However, heartbeat completion depends on this check:
> {code:scala}
> def shouldKeepAlive(deadlineMs: Long): Boolean = {
> if (isAwaitingJoin)
> !isNew || latestHeartbeat + GroupCoordinator.NewMemberJoinTimeoutMs >
> deadlineMs
> else awaitingSyncCallback != null ||
> latestHeartbeat + sessionTimeoutMs > deadlineMs
> }
> {code}
> Since we invoke the join callback first, we will fall to the second branch.
> This will only return true when the latest heartbeat plus session timeout
> exceeds the deadline. The deadline in this case depends only on the
> statically configured new member timeout, which means the heartbeat cannot
> complete until about 5 minutes have passed. If the member falls out of the
> group before then, then the heartbeat ultimately expires, which may trigger a
> spurious rebalance.
> Newer versions of the protocol are not affected by this bug because we return
> immediately the first time a member joins the group.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)