Re: [PR] KAFKA-16999: [Minor] Check partition response error code (KIP-932) [kafka]

via GitHub Tue, 27 Aug 2024 08:44:46 -0700


AndrewJSchofield commented on code in PR #16956:
URL: https://github.com/apache/kafka/pull/16956#discussion_r1733125093



##########
core/src/main/java/kafka/server/share/SharePartition.java:
##########
@@ -881,6 +881,7 @@ private void initialize() {
 
         TopicData<PartitionAllData> state = response.topicsData().get(0);
         if (state.topicId() != topicIdPartition.topicId() || state.partitions().size() != 1
+            || state.partitions().get(0).errorCode() != Errors.NONE.code()

Review Comment:
   @junrao I discussed with @apoorvmittal10 and @smjn.
   
   NOT_COORDINATOR, COORDINATOR_NOT_AVAILABLE and COORDINATOR_LOAD_IN_PROGRESS 
are transient errors that are intended to be handled automatically by the 
persister, which makes the calls to the ReadShareGroupState RPC. The persister 
will retry the RPCs with exponential back-off and we do not expect these error 
codes to surface to the SharePartition. The retries are limited: once the 
exponential back-off is exhausted, the error is returned to the client as 
COORDINATOR_NOT_AVAILABLE with an appropriate error message. This one needs 
adding to the KIP.
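   
   As a rough illustration of the retry behaviour described above (not the 
actual persister code), a bounded retry loop with exponential back-off might 
look like the sketch below. `PersisterRetrySketch`, its nested `Code` enum, 
`readWithRetry` and all of its parameters are hypothetical names invented for 
this example.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names come from the Kafka code base.
final class PersisterRetrySketch {
    // Stand-in for the handful of error codes relevant to this example.
    enum Code { NONE, NOT_COORDINATOR, COORDINATOR_NOT_AVAILABLE, COORDINATOR_LOAD_IN_PROGRESS, FENCED_LEADER_EPOCH }

    // The three errors the comment describes as transient.
    static final Set<Code> TRANSIENT = EnumSet.of(
        Code.NOT_COORDINATOR, Code.COORDINATOR_NOT_AVAILABLE, Code.COORDINATOR_LOAD_IN_PROGRESS);

    // Calls rpc until it returns a non-transient code or maxAttempts is reached.
    // The wait doubles after each failed attempt, capped at maxBackoffMs.
    static Code readWithRetry(Supplier<Code> rpc, int maxAttempts, long initialBackoffMs, long maxBackoffMs) {
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Code result = rpc.get();
            if (!TRANSIENT.contains(result)) {
                return result; // success or a non-transient error: surface it as-is
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    break;
                }
                backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
            }
        }
        // Retries exhausted: every transient error collapses to COORDINATOR_NOT_AVAILABLE.
        return Code.COORDINATOR_NOT_AVAILABLE;
    }
}
```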
   
   GROUP_ID_NOT_FOUND and UNKNOWN_TOPIC_OR_PARTITION mean that the share 
coordinator could not find the state records for these resources. This should 
never happen because the group coordinator will call the share coordinator to 
initialize the records for all share-partitions which can be assigned in share 
groups, so the share coordinator will be able to find the state records for 
all partitions that can be assigned. These errors indicate a mismatch between 
the share-group state topic and what the client is requesting, and will be 
treated as a badly behaving client fetching data for share-partitions that it 
was never assigned. The error code will be INVALID_REQUEST.
   
   FENCED_LEADER_EPOCH will cause the share-partition leader to recheck the 
leader epoch for the partition. If the leader epoch has changed underneath the 
share-partition leader, it needs to discard the share partition and retry, so 
the error code will be NOT_LEADER_OR_FOLLOWER. If it finds that the leader 
epoch has not changed but it's still being fenced, the error code will be 
UNKNOWN_SERVER_ERROR.
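
   To summarise the mapping described above, here is a minimal sketch in Java. 
It is only an illustration of the intent, not the code in this PR: `ShareError` 
is a hypothetical stand-in for the relevant subset of 
`org.apache.kafka.common.protocol.Errors`, and `ShareErrorMapping`, 
`toClientError` and the `leaderEpochChanged` flag are invented for the example. 
The `default` branch is an assumption; the comment does not say how other 
error codes are handled.

```java
// Hypothetical stand-in for the subset of error codes named in the comment.
enum ShareError {
    NONE, NOT_COORDINATOR, COORDINATOR_NOT_AVAILABLE, COORDINATOR_LOAD_IN_PROGRESS,
    GROUP_ID_NOT_FOUND, UNKNOWN_TOPIC_OR_PARTITION, FENCED_LEADER_EPOCH,
    INVALID_REQUEST, NOT_LEADER_OR_FOLLOWER, UNKNOWN_SERVER_ERROR
}

// Hypothetical helper mapping a ReadShareGroupState error to the client-facing error.
final class ShareErrorMapping {
    static ShareError toClientError(ShareError readStateError, boolean leaderEpochChanged) {
        switch (readStateError) {
            case NONE:
                return ShareError.NONE;
            // Transient errors: normally absorbed by the persister's retries;
            // if they still surface, retries were exhausted.
            case NOT_COORDINATOR:
            case COORDINATOR_NOT_AVAILABLE:
            case COORDINATOR_LOAD_IN_PROGRESS:
                return ShareError.COORDINATOR_NOT_AVAILABLE;
            // State records missing: treated as a badly behaving client.
            case GROUP_ID_NOT_FOUND:
            case UNKNOWN_TOPIC_OR_PARTITION:
                return ShareError.INVALID_REQUEST;
            // Fenced: depends on whether the leader epoch really changed.
            case FENCED_LEADER_EPOCH:
                return leaderEpochChanged
                    ? ShareError.NOT_LEADER_OR_FOLLOWER
                    : ShareError.UNKNOWN_SERVER_ERROR;
            // Assumption: anything else is reported as an unknown server error.
            default:
                return ShareError.UNKNOWN_SERVER_ERROR;
        }
    }
}
```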



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
