jsancio commented on a change in pull request #9553:
URL: https://github.com/apache/kafka/pull/9553#discussion_r547556738
##########
File path: raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java
##########
@@ -1037,6 +1047,35 @@ private boolean handleFetchResponse(
logger.info("Truncated to offset {} from Fetch response from leader {}",
truncationOffset, quorum.leaderIdOrNil());
});
+ } else if (partitionResponse.snapshotId().epoch() >= 0 ||
+ partitionResponse.snapshotId().endOffset() >= 0) {
+ // The leader is asking us to fetch a snapshot
+
+ if (partitionResponse.snapshotId().epoch() < 0) {
+ throw new KafkaException(
Review comment:
> I would suggest that we log an error saying that the remote replica
> seemed to return an invalid response and just keep fetching. Then a user can
> see the log message and restart the remote replica.
Yeah, this is what I implemented, and I added a test for it. In other words:
1. Log an error message.
2. Tell the raft client that the response was handled successfully but that
the fetch timer was not reset.
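A minimal Java sketch of the two steps above. The names (`SnapshotIdCheck`, `SnapshotId`, `Outcome`) are hypothetical, not the classes in the PR, and it uses `java.util.logging` only to stay self-contained (the Kafka code logs through its own `logger`). The visible hunk shows only the `epoch() < 0` check; rejecting a negative end offset as well is an assumption.

```java
import java.util.logging.Logger;

/** Hypothetical sketch of the follower-side check; not the actual KafkaRaftClient code. */
public final class SnapshotIdCheck {
    private static final Logger LOG = Logger.getLogger(SnapshotIdCheck.class.getName());

    /** Hypothetical stand-in for the snapshot id carried by a Fetch response. */
    public static final class SnapshotId {
        final int epoch;
        final long endOffset;

        public SnapshotId(int epoch, long endOffset) {
            this.epoch = epoch;
            this.endOffset = endOffset;
        }
    }

    /** Tells the caller whether to reset the fetch timer after handling the response. */
    public enum Outcome { HANDLED_RESET_TIMER, HANDLED_KEEP_TIMER }

    public static Outcome handleSnapshotId(SnapshotId id, int leaderId) {
        // Any non-negative field means the leader is asking us to fetch a snapshot.
        boolean snapshotRequested = id.epoch >= 0 || id.endOffset >= 0;
        if (!snapshotRequested) {
            return Outcome.HANDLED_RESET_TIMER; // ordinary response, no snapshot involved
        }
        // Assumption: a negative end offset is rejected just like a negative epoch.
        if (id.epoch < 0 || id.endOffset < 0) {
            // Step 1: log instead of throwing, so an operator can spot and restart the remote replica.
            LOG.severe(String.format(
                "Leader %d returned an invalid snapshot id (epoch=%d, endOffset=%d)",
                leaderId, id.epoch, id.endOffset));
            // Step 2: report the response as handled, but leave the fetch timer running.
            return Outcome.HANDLED_KEEP_TIMER;
        }
        // Valid snapshot id: snapshot fetching would start here (omitted).
        return Outcome.HANDLED_RESET_TIMER;
    }
}
```

Returning an outcome instead of throwing keeps the follower in its fetch loop and leaves the timer decision to the caller.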
In practice, this means the follower keeps sending `Fetch` requests. After
`fetchTimeoutMs`, the follower transitions to candidate, just as the existing
client code already does. See
https://github.com/apache/kafka/pull/9553/files#diff-86474ad1438150630c21b29a3da2f6dd79d1357e33ac034f00e5fcef0f2e889cR350
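To illustrate the timing, here is a hypothetical follower-side fetch timer (`FollowerFetchTimer` and its methods are invented for this sketch, not Kafka's API). A response handled with `resetTimer = false`, as in the invalid-snapshot-id case, does not push the deadline back, so once `fetchTimeoutMs` passes without a reset the follower becomes a candidate.

```java
/** Hypothetical sketch of a follower's fetch timeout; not Kafka's actual classes. */
public final class FollowerFetchTimer {
    public enum State { FOLLOWER, CANDIDATE }

    private final long fetchTimeoutMs;
    private long lastResetMs;
    private State state = State.FOLLOWER;

    public FollowerFetchTimer(long fetchTimeoutMs, long nowMs) {
        this.fetchTimeoutMs = fetchTimeoutMs;
        this.lastResetMs = nowMs;
    }

    /**
     * Records a handled Fetch response. For an invalid snapshot id the caller
     * passes resetTimer = false: the follower keeps fetching, but the timeout keeps running.
     */
    public void onFetchResponseHandled(long nowMs, boolean resetTimer) {
        if (resetTimer) {
            lastResetMs = nowMs;
        }
        poll(nowMs);
    }

    /** Transitions to candidate once fetchTimeoutMs has elapsed since the last reset. */
    public void poll(long nowMs) {
        if (state == State.FOLLOWER && nowMs - lastResetMs >= fetchTimeoutMs) {
            state = State.CANDIDATE;
        }
    }

    public State state() {
        return state;
    }
}
```

For example, with `fetchTimeoutMs = 2000` and invalid responses handled at 500 ms and 1500 ms, the replica is still a follower; a response or poll at 2000 ms moves it to candidate.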
Let me know if this is what you were thinking.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure.