junrao commented on code in PR #16956:
URL: https://github.com/apache/kafka/pull/16956#discussion_r1729291803
##########
core/src/main/java/kafka/server/share/SharePartition.java:
##########
@@ -881,6 +881,7 @@ private void initialize() {
TopicData<PartitionAllData> state = response.topicsData().get(0);
if (state.topicId() != topicIdPartition.topicId() ||
state.partitions().size() != 1
+ || state.partitions().get(0).errorCode() != Errors.NONE.code()
Review Comment:
The following are the error codes from ReadShareGroupStateResponse. It seems
that we need to treat them differently.
// - NOT_COORDINATOR (version 0+)
// - COORDINATOR_NOT_AVAILABLE (version 0+)
// - COORDINATOR_LOAD_IN_PROGRESS (version 0+)
// - GROUP_ID_NOT_FOUND (version 0+)
// - UNKNOWN_TOPIC_OR_PARTITION (version 0+)
// - FENCED_LEADER_EPOCH (version 0+)
// - INVALID_REQUEST (version 0+)
NOT_COORDINATOR, COORDINATOR_NOT_AVAILABLE, COORDINATOR_LOAD_IN_PROGRESS:
These are transient errors, so we probably want to send a retriable error in
the fetch response. I don't see a suitable error code listed in the KIP.
Perhaps we need to add one?
GROUP_ID_NOT_FOUND: We should probably just return INVALID_GROUP_ID and add
it to the error codes for ShareFetchResponse in the KIP?
UNKNOWN_TOPIC_OR_PARTITION: We should just return the
UNKNOWN_TOPIC_OR_PARTITION error.
INVALID_REQUEST: We should just return INVALID_REQUEST.
FENCED_LEADER_EPOCH: This is also transient, so we need to return a
retriable error. We also need to remove this SharePartition so that a new one
with the new leader epoch can be created. Otherwise, we will keep getting
this error forever.
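To make the proposed handling concrete, here is a rough sketch of the mapping
described above. It is only an illustration, not the actual SharePartition
code: ReadStateError, FetchOutcome, InitDecision and ReadStateErrorMapper are
hypothetical names, the enum stands in for the real
org.apache.kafka.common.protocol.Errors values, and RETRIABLE is a placeholder
for whichever retriable error code the KIP ends up defining.

```java
// Hypothetical stand-in for the error codes listed in the comment above.
// Real code would use org.apache.kafka.common.protocol.Errors instead.
enum ReadStateError {
    NONE,
    NOT_COORDINATOR,
    COORDINATOR_NOT_AVAILABLE,
    COORDINATOR_LOAD_IN_PROGRESS,
    GROUP_ID_NOT_FOUND,
    UNKNOWN_TOPIC_OR_PARTITION,
    FENCED_LEADER_EPOCH,
    INVALID_REQUEST
}

// Hypothetical outcome to surface in the share fetch response.
// RETRIABLE is a placeholder: the KIP does not list a suitable code yet.
enum FetchOutcome {
    OK,
    RETRIABLE,
    INVALID_GROUP_ID,
    UNKNOWN_TOPIC_OR_PARTITION,
    INVALID_REQUEST
}

// What to return to the client, and whether the cached SharePartition
// should be dropped so that a fresh one can be created.
final class InitDecision {
    final FetchOutcome outcome;
    final boolean removeSharePartition;

    InitDecision(FetchOutcome outcome, boolean removeSharePartition) {
        this.outcome = outcome;
        this.removeSharePartition = removeSharePartition;
    }
}

final class ReadStateErrorMapper {
    private ReadStateErrorMapper() { }

    static InitDecision decide(ReadStateError error) {
        switch (error) {
            case NONE:
                return new InitDecision(FetchOutcome.OK, false);
            case NOT_COORDINATOR:
            case COORDINATOR_NOT_AVAILABLE:
            case COORDINATOR_LOAD_IN_PROGRESS:
                // Transient coordinator errors: the client should retry.
                return new InitDecision(FetchOutcome.RETRIABLE, false);
            case FENCED_LEADER_EPOCH:
                // Also transient, but the stale SharePartition must be removed,
                // otherwise the same error would come back on every attempt.
                return new InitDecision(FetchOutcome.RETRIABLE, true);
            case GROUP_ID_NOT_FOUND:
                return new InitDecision(FetchOutcome.INVALID_GROUP_ID, false);
            case UNKNOWN_TOPIC_OR_PARTITION:
                return new InitDecision(FetchOutcome.UNKNOWN_TOPIC_OR_PARTITION, false);
            case INVALID_REQUEST:
                return new InitDecision(FetchOutcome.INVALID_REQUEST, false);
            default:
                throw new IllegalArgumentException("Unexpected error: " + error);
        }
    }
}
```

Keeping the decision in one place like this would let initialize() stay a
simple check, with the caller acting on removeSharePartition separately.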
What do you think, @AndrewJSchofield ?