[jira] [Updated] (KAFKA-14010) alterISR request won't retry when receiving retriable error

Luke Chen (Jira) Sun, 19 Jun 2022 20:05:08 -0700


     [ https://issues.apache.org/jira/browse/KAFKA-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Luke Chen updated KAFKA-14010:
------------------------------
    Description: 
When submitting the AlterIsr request, we register a future listener to handle 
the response 
[here|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/cluster/Partition.scala#L1585-L1610].
 When the response carries a retriable error, we expect the listener to 
re-submit the AlterIsr request so that it gets retried. 

However, `unsentIsrUpdates` has not yet been cleared when the future listener 
is called, so the retry fails to "enqueue" the request because we still 
think there is an in-flight request. We use "try/finally" to make sure 
`unsentIsrUpdates` gets cleared 
([here|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/AlterPartitionManager.scala#L362-L370]),
 but that happens only "after" the listener has already tried to retry the request.

Although the AlterIsr request will be sent the next time the follower sends a 
fetch request to the leader, we still need to fix this issue so that the 
AlterIsr request is retried as expected.
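
The ordering problem described above can be reproduced in isolation. The following is a minimal sketch in Java, not the actual Kafka code: the class `AlterIsrRetrySketch`, the methods `submit`, `handleResponseBuggy`, `handleResponseFixed` and `retrySucceeds`, and the string error code are all hypothetical stand-ins; only the name `unsentIsrUpdates` is taken from the description. It assumes the listener runs synchronously when the future is completed, and it shows one possible reordering (clear the pending entry first, then complete the future), which may differ from the eventual fix.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the behaviour described in this issue.
final class AlterIsrRetrySketch {
    // Stand-in for the manager's map of pending updates, keyed by partition.
    static final ConcurrentHashMap<String, CompletableFuture<String>> unsentIsrUpdates =
        new ConcurrentHashMap<>();

    // Enqueues an update; returns null if one is already pending for the partition.
    static CompletableFuture<String> submit(String partition) {
        CompletableFuture<String> future = new CompletableFuture<>();
        return unsentIsrUpdates.putIfAbsent(partition, future) == null ? future : null;
    }

    // Order described in the issue: listeners run first, the entry is cleared afterwards.
    static void handleResponseBuggy(String partition, String error) {
        CompletableFuture<String> future = unsentIsrUpdates.get(partition);
        try {
            future.complete(error); // the retry listener runs here, synchronously
        } finally {
            unsentIsrUpdates.remove(partition); // too late for the listener's retry
        }
    }

    // One possible reordering (an assumption): clear the entry, then notify listeners.
    static void handleResponseFixed(String partition, String error) {
        CompletableFuture<String> future = unsentIsrUpdates.remove(partition);
        future.complete(error);
    }

    // Returns true if the listener managed to enqueue the retry.
    static boolean retrySucceeds(boolean fixed) {
        unsentIsrUpdates.clear();
        String partition = "topic-0";
        boolean[] retried = {false};
        CompletableFuture<String> future = submit(partition);
        future.whenComplete((error, ex) -> {
            if ("RETRIABLE".equals(error)) {
                retried[0] = submit(partition) != null; // attempt to re-submit
            }
        });
        if (fixed) {
            handleResponseFixed(partition, "RETRIABLE");
        } else {
            handleResponseBuggy(partition, "RETRIABLE");
        }
        return retried[0];
    }

    public static void main(String[] args) {
        System.out.println("clear after complete, retry enqueued: " + retrySucceeds(false));
        System.out.println("clear before complete, retry enqueued: " + retrySucceeds(true));
    }
}
```

In this sketch the retry is rejected when the entry is cleared in `finally` after the future completes, and accepted when the entry is removed before completion.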

  was:
When submitting the AlterIsr request, we register a future listener to handle 
the response 
[here|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/cluster/Partition.scala#L1585-L1610].
 When receiving retriable error, we expected the AlterIsr request will get 
retried. And then, we'll re-submit the request again. 

However, before the future listener got called, we didn't clear the 
`unsentIsrUpdates`, which causes we failed to "enqueue" the request because we 
thought there's an in-flight request. We use "try/finally" to make sure the 
unsentIsrUpdates got cleared 
([here|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/AlterPartitionManager.scala#L362-L370],
 but it happened "after" we retry the request

Although the AlterIsr request will get sent next time when the follower sent 
fetch request to the leader, we still need to fix this issue to make sure the 
AlterIsr request is sent successfully as we expected.


> alterISR request won't retry when receiving retriable error
> -----------------------------------------------------------
>
>                 Key: KAFKA-14010
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14010
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.2.0
>            Reporter: Luke Chen
>            Assignee: Luke Chen
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
