dajac commented on code in PR #12181:
URL: https://github.com/apache/kafka/pull/12181#discussion_r880242265


##########
core/src/main/scala/kafka/cluster/Partition.scala:
##########
@@ -1574,6 +1586,13 @@ class Partition(val topicPartition: TopicPartition,
        debug(s"Failed to alter partition to $proposedIsrState since there is a pending AlterPartition still inflight. " +
          s"partition state has been reset to the latest committed state $partitionState")
         false
+      case Errors.INELIGIBLE_REPLICA =>

Review Comment:
   Yeah, that's a good point. I thought that a backoff is not really 
necessary here because we don't retry immediately; we retry only when the next 
fetch request from that replica is processed. The downside of adding a backoff 
here is that it could also delay shrinking/expanding the ISR for the other 
replicas.
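   To make the "no backoff needed" point concrete, here is a minimal sketch (not the PR's code; `FetchPacedRetry`, `onFetch`, and `onAlterPartitionResponse` are hypothetical names) of retries that are paced by fetch processing rather than by a timer: the leader only re-proposes an ISR change when it handles a fetch from the replica, so a failed attempt is naturally retried at the follower's fetch cadence.

```scala
// Hypothetical sketch: AlterPartition retries gated on fetch processing.
// Only one AlterPartition may be inflight; a new attempt is made only from
// the fetch path, so no dedicated backoff timer is required.
object FetchPacedRetry {
  private var pendingAlterPartition = false

  // Called when a fetch from `replicaId` is processed. Returns true if an
  // AlterPartition request would be sent for this replica now.
  def onFetch(replicaId: Int, isEligible: Int => Boolean): Boolean = {
    if (!pendingAlterPartition && isEligible(replicaId)) {
      pendingAlterPartition = true // an AlterPartition is now inflight
      true
    } else {
      false // either still inflight, or the replica is ineligible
    }
  }

  // Called when the AlterPartition response (success or error) arrives;
  // clears the inflight flag so the next fetch can retry.
  def onAlterPartitionResponse(): Unit = {
    pendingAlterPartition = false
  }
}
```

A timer-based backoff, by contrast, would sit in front of all AlterPartition attempts for the partition, which is why it could also delay ISR changes for the other replicas.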
   
   I considered returning the ineligible replicas in the response. That would 
allow the leader to cache them locally and avoid retrying until the ineligible 
replicas are removed. My concern with this approach is that we would have to 
maintain that set based on both the metadata log and the AlterPartition 
response, which seems error-prone.
   
   Having an `ineligibleReplicas` field tied to the partition state would be 
much better in my opinion, but that comes at the price of writing more 
records. In that case, we could safely cache the ineligible replicas returned 
in the response because we have the guarantee that they would be superseded by 
the new partition state, based on the partition epoch.
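   The epoch-based invalidation described above could be sketched roughly like this (a hypothetical illustration, not the PR's code; `IneligibleReplicaCache` and its methods are made-up names): the cached set is tagged with the partition epoch it was observed at, and any committed partition state with a higher epoch supersedes it, so the cache can never go stale relative to the metadata log.

```scala
// Hypothetical sketch: leader-side cache of ineligible replicas, invalidated
// by partition epoch. The AlterPartition response populates it; a newer
// committed partition state clears it.
class IneligibleReplicaCache {
  private var cachedEpoch: Int = -1
  private var ineligible: Set[Int] = Set.empty

  // Record the ineligible replicas reported in an AlterPartition response
  // for the given partition epoch; ignore responses older than the cache.
  def onResponse(partitionEpoch: Int, replicas: Set[Int]): Unit = {
    if (partitionEpoch >= cachedEpoch) {
      cachedEpoch = partitionEpoch
      ineligible = replicas
    }
  }

  // A new committed partition state with a higher epoch supersedes whatever
  // the last response reported, so drop the cached set.
  def onNewPartitionState(partitionEpoch: Int): Unit = {
    if (partitionEpoch > cachedEpoch) {
      cachedEpoch = partitionEpoch
      ineligible = Set.empty
    }
  }

  // Skip proposing an ISR change for a replica still known to be ineligible.
  def isIneligible(replicaId: Int): Boolean = ineligible.contains(replicaId)
}
```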



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
