[ 
https://issues.apache.org/jira/browse/KAFKA-15950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-15950:
----------------------------
    Description: 
{{KafkaEventQueue}} does de-duping and only allows one outstanding 
{{CommunicationEvent}} in the queue. But it seems that duplicated 
{{HeartbeatRequest}}s could still be generated. {{CommunicationEvent}} 
calls {{sendBrokerHeartbeat}}, which issues the following call:

{code:java}
_channelManager.sendRequest(new BrokerHeartbeatRequest.Builder(data), handler)
{code}

The problem is that we have another queue in 
{{NodeToControllerChannelManagerImpl}} that doesn't do the de-duping. Once a 
{{CommunicationEvent}} is dequeued from {{KafkaEventQueue}}, a 
{{HeartbeatRequest}} is queued in {{NodeToControllerChannelManagerImpl}}. At 
that point, another {{CommunicationEvent}} can be enqueued in 
{{KafkaEventQueue}}. When it's processed, a second {{HeartbeatRequest}} is 
queued in {{NodeToControllerChannelManagerImpl}}.
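
For illustration, here is a minimal, self-contained sketch of that 
interleaving (the class and method names are hypothetical stand-ins, not the 
actual Kafka classes): a tag-de-duped event queue feeding a plain FIFO can 
still accumulate two {{HeartbeatRequest}}s, because the tag is released as 
soon as the event is dequeued, before the resulting request is drained.

{code:java}
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

public class HeartbeatDupSketch {
    // Stand-in for KafkaEventQueue: at most one pending event per tag.
    static final Map<String, Runnable> eventQueue = new LinkedHashMap<>();
    // Stand-in for the queue in NodeToControllerChannelManagerImpl: plain FIFO, no de-duping.
    static final Queue<String> requestQueue = new ArrayDeque<>();

    static void enqueueCommunicationEvent() {
        // De-dup: replaces any outstanding event with the same tag.
        eventQueue.put("communicationEvent", () -> requestQueue.add("HeartbeatRequest"));
    }

    static void processOneEvent() {
        // Dequeue one event and run it; its tag is free again from here on.
        Iterator<Runnable> it = eventQueue.values().iterator();
        if (it.hasNext()) {
            Runnable event = it.next();
            it.remove();
            event.run();
        }
    }

    public static void main(String[] args) {
        enqueueCommunicationEvent();
        processOneEvent();           // queues HeartbeatRequest #1
        enqueueCommunicationEvent(); // tag already released, so no de-dup applies
        processOneEvent();           // queues HeartbeatRequest #2
        System.out.println(requestQueue); // [HeartbeatRequest, HeartbeatRequest]
    }
}
{code}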

This probably won't introduce long-lasting duplicated {{HeartbeatRequest}}s in 
practice, since a {{CommunicationEvent}} typically sits in {{KafkaEventQueue}} 
for a full heartbeat interval. By that time, the pending 
{{HeartbeatRequest}}s will have been processed, and new events are de-duped 
when enqueued to {{KafkaEventQueue}}. However, duplicated requests can make 
tests hard to reason about.
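
Per the issue title, one possible shape of a fix is to serialize heartbeats: 
refuse to send a new {{HeartbeatRequest}} while one is still in flight. This 
is only a hedged sketch with hypothetical names, not the actual patch:

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

public class SerializedHeartbeatSender {
    // True while a HeartbeatRequest is outstanding (sent, no response/timeout yet).
    private final AtomicBoolean inflight = new AtomicBoolean(false);

    /** Sends a heartbeat only if none is outstanding; returns whether it sent. */
    public boolean maybeSendHeartbeat(Runnable send) {
        if (!inflight.compareAndSet(false, true)) {
            return false; // a HeartbeatRequest is already in flight; skip this one
        }
        send.run(); // e.g. the _channelManager.sendRequest(...) call above
        return true;
    }

    /** Called from the response/timeout handler to allow the next heartbeat. */
    public void onHeartbeatComplete() {
        inflight.set(false);
    }
}
{code}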

  was:
Currently, CommunicationEvent is scheduled with DeadlineFunction, which ignores 
the schedule time of an existing event. This wasn't an issue while 
CommunicationEvent was always periodic. However, with KAFKA-15360, a 
CommunicationEvent can be scheduled immediately for offline dirs. If a 
periodic CommunicationEvent is scheduled after the immediate CommunicationEvent 
in KafkaEventQueue, the periodic event cancels the immediate one but keeps its 
own periodic schedule time. This unnecessarily delays reporting the failed dir 
to the controller.

Using EarliestDeadlineFunction would fix this issue.
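
For clarity, here is a minimal sketch of the two deadline policies, with the 
behavior inferred from their names rather than copied from the Kafka classes: 
the DeadlineFunction style replaces the existing schedule time, while the 
EarliestDeadlineFunction style keeps the sooner of the two, so an immediate 
offline-dir event is not delayed by a later periodic reschedule.

{code:java}
import java.util.OptionalLong;
import java.util.function.Function;

public class DeadlinePolicySketch {
    // DeadlineFunction-style: ignore the existing deadline, use the new one.
    static Function<OptionalLong, OptionalLong> replace(long newDeadlineNs) {
        return prev -> OptionalLong.of(newDeadlineNs);
    }

    // EarliestDeadlineFunction-style: keep whichever deadline is sooner.
    static Function<OptionalLong, OptionalLong> earliest(long newDeadlineNs) {
        return prev -> OptionalLong.of(
            prev.isPresent() ? Math.min(prev.getAsLong(), newDeadlineNs) : newDeadlineNs);
    }

    public static void main(String[] args) {
        OptionalLong immediate = OptionalLong.of(0L); // offline-dir event: send now
        long periodicNs = 2_000_000_000L;             // next periodic heartbeat, 2s away
        System.out.println(replace(periodicNs).apply(immediate));  // 2000000000: delayed
        System.out.println(earliest(periodicNs).apply(immediate)); // 0: stays immediate
    }
}
{code}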


> Serialize broker heartbeat requests
> -----------------------------------
>
>                 Key: KAFKA-15950
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15950
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.7.0
>            Reporter: Jun Rao
>            Assignee: Igor Soarez
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
