platinumhamburg opened a new issue, #2073:
URL: https://github.com/apache/fluss/issues/2073

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/fluss/issues) and found nothing similar.
   
   
   ### Fluss version
   
   0.8.0 (latest release)
   
   ### Please describe the bug 🐞
   
   When processing fetch requests in `ReplicaManager.readFromLog()`, if any bucket encounters an error (e.g., `NOT_LEADER_OR_FOLLOWER` or `UNKNOWN_TABLE_OR_BUCKET_EXCEPTION`), the current implementation immediately short-circuits the entire fetch request.

   This short-circuit behavior bypasses the `DelayedFetch` mechanism, causing the fetch response to be returned immediately. As a result, `ReplicaFetcherThread` receives the response without any delay and retries immediately. During leader election or bucket migration, these errors persist temporarily, leading to a tight retry loop without any backoff.

   Additionally, when `ReplicaFetcherThread` handles a `NOT_LEADER_OR_FOLLOWER` error, it does not add the replica to `replicasWithError`, which prevents proper error tracking and handling.
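
   To illustrate why the missing `replicasWithError` entry matters, here is a minimal, self-contained Java sketch. It is not Fluss's code: the class `FetcherErrorTrackingSketch`, the `FetchError` enum, the methods `processFetchResponse`, `handleReplicasWithError` and `canFetch`, and the `FETCH_BACKOFF_MS` value are all hypothetical. It also assumes, as in Kafka's fetcher design, that buckets placed in `replicasWithError` are backed off before being fetched again.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class FetcherErrorTrackingSketch {

    // Hypothetical error codes; OTHER_ERROR stands in for every other error.
    enum FetchError { NONE, NOT_LEADER_OR_FOLLOWER, UNKNOWN_TABLE_OR_BUCKET_EXCEPTION, OTHER_ERROR }

    // Hypothetical back-off applied to buckets in replicasWithError (assumption).
    static final long FETCH_BACKOFF_MS = 1_000L;

    // bucket -> earliest time (ms) at which the bucket may be fetched again
    private final Map<String, Long> delayedUntil = new LinkedHashMap<>();

    // Hypothetical per-bucket handling of one fetch response.
    Set<String> processFetchResponse(Map<String, FetchError> response) {
        Set<String> replicasWithError = new LinkedHashSet<>();
        for (Map.Entry<String, FetchError> entry : response.entrySet()) {
            switch (entry.getValue()) {
                case NONE:
                    // Append the fetched records to the local log (omitted).
                    break;
                case NOT_LEADER_OR_FOLLOWER:
                    // Proposed fix: track this bucket. Without this, nothing delays it
                    // and it is fetched again on the very next round.
                    replicasWithError.add(entry.getKey());
                    break;
                default:
                    // Other errors are assumed to be tracked the same way in this sketch.
                    replicasWithError.add(entry.getKey());
                    break;
            }
        }
        return replicasWithError;
    }

    // Back off every bucket that reported an error.
    void handleReplicasWithError(Set<String> replicasWithError, long nowMs) {
        for (String bucket : replicasWithError) {
            delayedUntil.put(bucket, nowMs + FETCH_BACKOFF_MS);
        }
    }

    // A bucket is fetchable once its back-off (if any) has expired.
    boolean canFetch(String bucket, long nowMs) {
        return delayedUntil.getOrDefault(bucket, 0L) <= nowMs;
    }

    public static void main(String[] args) {
        FetcherErrorTrackingSketch fetcher = new FetcherErrorTrackingSketch();
        Map<String, FetchError> response = new LinkedHashMap<>();
        response.put("t1-b0", FetchError.NONE);
        response.put("t1-b1", FetchError.NOT_LEADER_OR_FOLLOWER);

        long now = 0L;
        fetcher.handleReplicasWithError(fetcher.processFetchResponse(response), now);
        System.out.println(fetcher.canFetch("t1-b0", now));                    // prints true
        System.out.println(fetcher.canFetch("t1-b1", now));                    // prints false
        System.out.println(fetcher.canFetch("t1-b1", now + FETCH_BACKOFF_MS)); // prints true
    }
}
```

   If the `NOT_LEADER_OR_FOLLOWER` case did not add the bucket, `delayedUntil` would have no entry for it, `canFetch` would return `true` immediately, and the fetcher would spin on that bucket.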
   
   ### Solution
   
   1. Classify fetch errors into critical and non-critical categories:
      - Non-critical (expected) errors: `NOT_LEADER_OR_FOLLOWER`, `UNKNOWN_TABLE_OR_BUCKET_EXCEPTION`
      - Critical errors: all other errors

   2. Avoid short-circuiting for non-critical errors:
      - Collect non-critical error buckets separately instead of breaking immediately
      - Allow the fetch request to continue processing other buckets and enter the `DelayedFetch` flow normally
      - Merge the error buckets into the delayed response callback

   3. Fix error tracking in `ReplicaFetcherThread`:
      - Add the replica to `replicasWithError` when a `NOT_LEADER_OR_FOLLOWER` error occurs
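
   The classification and merge steps in items 1 and 2 can be sketched as follows. This is a simplified, hypothetical model rather than the actual `ReplicaManager` change: `FetchErrorClassificationSketch`, `FetchError` (with `OTHER_ERROR` standing in for any critical error), `ReadOutcome`, `readFromLogSketch` and `buildDelayedResponse` are invented names, per-bucket read results are assumed to be known up front, and critical errors are assumed to keep the existing short-circuit.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FetchErrorClassificationSketch {

    // Hypothetical error codes; OTHER_ERROR stands in for any critical error.
    enum FetchError { NONE, NOT_LEADER_OR_FOLLOWER, UNKNOWN_TABLE_OR_BUCKET_EXCEPTION, OTHER_ERROR }

    // Non-critical (expected) errors, typically seen during leader election or bucket migration.
    static final Set<FetchError> NON_CRITICAL =
            EnumSet.of(FetchError.NOT_LEADER_OR_FOLLOWER, FetchError.UNKNOWN_TABLE_OR_BUCKET_EXCEPTION);

    static boolean isCritical(FetchError error) {
        return error != FetchError.NONE && !NON_CRITICAL.contains(error);
    }

    // Result of the read phase: respond now (short-circuit) or hand off to DelayedFetch.
    record ReadOutcome(boolean respondImmediately,
                       Map<String, FetchError> errorBuckets,
                       List<String> healthyBuckets) {}

    // Hypothetical stand-in for the read phase of ReplicaManager.readFromLog().
    static ReadOutcome readFromLogSketch(Map<String, FetchError> perBucketResult) {
        Map<String, FetchError> errorBuckets = new LinkedHashMap<>();
        List<String> healthyBuckets = new ArrayList<>();
        for (Map.Entry<String, FetchError> entry : perBucketResult.entrySet()) {
            FetchError error = entry.getValue();
            if (isCritical(error)) {
                // Critical error: assumed to keep the existing short-circuit.
                errorBuckets.put(entry.getKey(), error);
                return new ReadOutcome(true, errorBuckets, healthyBuckets);
            } else if (error != FetchError.NONE) {
                // Non-critical error: collect it and keep processing the other buckets.
                errorBuckets.put(entry.getKey(), error);
            } else {
                healthyBuckets.add(entry.getKey());
            }
        }
        // No critical error: the request enters the DelayedFetch flow as usual.
        return new ReadOutcome(false, errorBuckets, healthyBuckets);
    }

    // Hypothetical delayed-response callback: merges the collected error buckets in.
    static Map<String, FetchError> buildDelayedResponse(ReadOutcome outcome) {
        Map<String, FetchError> response = new LinkedHashMap<>();
        for (String bucket : outcome.healthyBuckets()) {
            response.put(bucket, FetchError.NONE); // records for these buckets would be read here
        }
        response.putAll(outcome.errorBuckets());
        return response;
    }

    public static void main(String[] args) {
        Map<String, FetchError> results = new LinkedHashMap<>();
        results.put("t1-b0", FetchError.NONE);
        results.put("t1-b1", FetchError.NOT_LEADER_OR_FOLLOWER);
        results.put("t1-b2", FetchError.NONE);

        ReadOutcome outcome = readFromLogSketch(results);
        System.out.println(outcome.respondImmediately()); // prints false
        System.out.println(buildDelayedResponse(outcome));
        // prints {t1-b0=NONE, t1-b2=NONE, t1-b1=NOT_LEADER_OR_FOLLOWER}
    }
}
```

   Keeping the non-critical set in a single `EnumSet` makes the classification explicit and easy to extend if other transient errors turn out to need the same treatment.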
   
   This ensures that even during leader election or bucket migration, fetch requests still go through the normal delay mechanism, preventing busy-loop retry storms.
   
   ### Are you willing to submit a PR?
   
   - [x] I'm willing to submit a PR!

