David Arthur created KAFKA-12686:
------------------------------------

             Summary: Race condition in AlterIsr response handling
                 Key: KAFKA-12686
                 URL: https://issues.apache.org/jira/browse/KAFKA-12686
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 2.8.0, 2.7.0
            Reporter: David Arthur
            Assignee: David Arthur
             Fix For: 3.0.0


In Partition.scala, there is a race condition between the handling of an
AlterIsrResponse and a LeaderAndIsrRequest. The scenario is rare because it
requires the AlterIsrResponse to be delayed long enough for a
LeaderAndIsrRequest to be handled while the AlterIsrRequest is still inflight.
It is still possible, though. It was observed in a test environment where
broker restarts were causing many ISR and leadership changes.

When the leader handles the LeaderAndIsrRequest, it calls Partition#makeLeader,
which overwrites the {{isrState}} variable and clears the pending ISR items via
{{AlterIsrManager#clearPending(TopicPartition)}}.
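To make the makeLeader step concrete, here is a heavily simplified Python model. It is a sketch under stated assumptions, not Kafka's code: `RecordingAlterIsrManager`, `make_leader`, `clear_pending`, and the dataclass fields are hypothetical names that only loosely follow Partition.scala. The point it shows is that makeLeader replaces {{isrState}} unconditionally, discarding any pending state, and then asks the manager to drop queued items for the partition.

```python
from dataclasses import dataclass

# Hypothetical, heavily simplified model; names only loosely follow Partition.scala.

@dataclass(frozen=True)
class CommittedIsr:
    isr: frozenset

@dataclass(frozen=True)
class PendingIsrExpand:
    isr: frozenset
    new_replica: int

class RecordingAlterIsrManager:
    """Stand-in manager that only records clear_pending calls."""
    def __init__(self):
        self.cleared = []

    def clear_pending(self, tp):
        self.cleared.append(tp)

class Partition:
    def __init__(self, tp, manager):
        self.tp = tp
        self.manager = manager
        self.isr_state = CommittedIsr(frozenset())

    def make_leader(self, isr):
        # The LeaderAndIsr request is authoritative: overwrite isrState
        # unconditionally, discarding any Pending* state ...
        self.isr_state = CommittedIsr(frozenset(isr))
        # ... and ask the manager to drop queued ISR updates for this partition.
        self.manager.clear_pending(self.tp)


mgr = RecordingAlterIsrManager()
p = Partition("foo-0", mgr)
p.isr_state = PendingIsrExpand(frozenset({1, 2}), new_replica=3)
p.make_leader({1, 2})
print(p.isr_state)  # CommittedIsr(isr=frozenset({1, 2}))
print(mgr.cleared)  # ['foo-0']
```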

The bug is that AlterIsrManager does not check its inflight state before
clearing pending items. By design, AlterIsrManager retains inflight items in
the pending items collection until their response is processed, so that they
can be retried. As a result, clearPending can inadvertently remove an inflight
item from this collection.
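The bookkeeping problem can be sketched with a toy model of the manager, assuming a per-partition inflight set (the real AlterIsrManager tracks this differently). `AlterIsrManagerModel`, `check_inflight`, `submit`, `send`, and `clear_pending` are hypothetical names. The guarded branch is only one check implied by the description above ("does not check its inflight state"), not necessarily the change that was committed for 3.0.0.

```python
class AlterIsrManagerModel:
    """Hypothetical model of AlterIsrManager's bookkeeping (not Kafka code)."""

    def __init__(self, check_inflight=False):
        self.pending = {}        # tp -> item; inflight items stay here until their response
        self.inflight = set()    # tps included in the outstanding AlterIsr request
        self.check_inflight = check_inflight

    def submit(self, tp, item):
        # At most one item per partition: a second submit is rejected.
        if tp in self.pending:
            return False
        self.pending[tp] = item
        return True

    def send(self):
        # Everything pending goes into one request but is NOT removed,
        # so it can be retried if the request fails.
        self.inflight = set(self.pending)

    def clear_pending(self, tp):
        if self.check_inflight and tp in self.inflight:
            return                  # guarded: keep the item until its response arrives
        self.pending.pop(tp, None)  # unguarded: an inflight item is dropped too


for guarded in (False, True):
    m = AlterIsrManagerModel(check_inflight=guarded)
    m.submit("foo-0", "expand to {1, 2, 3}")
    m.send()
    m.clear_pending("foo-0")  # what Partition#makeLeader triggers
    print(guarded, m.submit("foo-0", "expand to {1, 2, 4}"))
# prints:
# False True   <- a second item is accepted while the first is still inflight
# True False
```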

Because the inflight item has been removed from the collection, AlterIsrManager
allows a new AlterIsrItem to be enqueued for this partition even though an
earlier AlterIsrItem is still inflight. Once that update is enqueued, Partition
transitions its {{isrState}} to one of the inflight states (PendingIsrExpand,
PendingIsrShrink, etc.). When the response for the inflight item is handled,
the callback fails to update {{isrState}} because it detects that the state
has changed since the request was sent (which is by design). However, after
the response callback runs, AlterIsrManager removes every partition that
appears in the response from the pending (unsent) items collection. This
includes the newly added, still unsent, update for this partition.
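The full interleaving can be replayed in a small single-threaded simulation. This is a sketch under simplifying assumptions: `Manager`, `Partition`, `expand_isr`, `on_response`, and `handle_response` are hypothetical names, isrState is reduced to a `(name, isr)` tuple, and the staleness check is modeled as "ignore the response if the state no longer equals the proposed state".

```python
class Manager:
    def __init__(self):
        self.pending = {}  # tp -> callback; inflight items stay until their response

    def submit(self, tp, callback):
        if tp in self.pending:
            return False   # an item for this partition is already queued
        self.pending[tp] = callback
        return True

    def send(self):
        # Snapshot what goes into the AlterIsr request; items remain in `pending`.
        return dict(self.pending)

    def clear_pending(self, tp):
        self.pending.pop(tp, None)  # no inflight check (the bug)

    def handle_response(self, sent):
        for callback in sent.values():
            callback()
        # After the callbacks, every partition in the response is removed,
        # even if `pending` now holds a newer, unsent item for it.
        for tp in sent:
            self.pending.pop(tp, None)


class Partition:
    def __init__(self, tp, manager):
        self.tp, self.manager = tp, manager
        self.state = ("Committed", frozenset({1, 2}))

    def expand_isr(self, replica):
        proposed = ("PendingIsrExpand", self.state[1] | {replica})
        if self.manager.submit(self.tp, lambda: self.on_response(proposed)):
            self.state = proposed

    def make_leader(self, isr):
        self.state = ("Committed", frozenset(isr))
        self.manager.clear_pending(self.tp)

    def on_response(self, proposed):
        if self.state != proposed:
            return  # state changed since the request was sent: ignore (by design)
        self.state = ("Committed", proposed[1])


m = Manager()
p = Partition("foo-0", m)
p.expand_isr(3)               # PendingIsrExpand({1, 2, 3}) queued
sent = m.send()               # ... and now inflight
p.make_leader({1, 2})         # LeaderAndIsr is handled first; clears the inflight item
p.expand_isr(4)               # accepted: PendingIsrExpand({1, 2, 4}), unsent
m.handle_response(sent)       # stale callback ignored, then the new item is removed
print(p.state[0], m.pending)  # PendingIsrExpand {}
```

The final line shows the inconsistent state described below: the partition is waiting on an update that the manager no longer holds.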

The result is that Partition has an inflight {{isrState}}, but AlterIsrManager
has no unsent item for this partition. This blocks any further ISR updates on
the partition until the next leader election resets {{isrState}}.
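The stuck condition can be expressed as a simple consistency check. This is a hypothetical diagnostic over a snapshot of modeled broker state; Kafka does not expose a `find_stuck_partitions` function or this exact data, and the state names follow the wording of this report.

```python
def find_stuck_partitions(isr_states, pending_items):
    """Hypothetical consistency check over a snapshot of broker state.

    isr_states:    partition -> name of its isrState ("Committed", "PendingIsrExpand", ...)
    pending_items: partitions that AlterIsrManager still holds (unsent or inflight)

    A partition in a Pending* state with no item in the manager will never get
    a response callback, so only the next leader election can reset it.
    """
    return sorted(
        tp for tp, state in isr_states.items()
        if state.startswith("Pending") and tp not in pending_items
    )


snapshot = {"foo-0": "PendingIsrExpand", "foo-1": "Committed", "foo-2": "PendingIsrShrink"}
print(find_stuck_partitions(snapshot, pending_items={"foo-2"}))  # ['foo-0']
```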

If this bug is encountered, the workaround is to force a leader election for
the affected partition, which resets its state.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
