loserwang1024 opened a new issue, #1751:
URL: https://github.com/apache/fluss/issues/1751

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/fluss/issues) and found nothing similar.
   
   
   ### Fluss version
   
   0.7.0 (latest release)
   
   ### Please describe the bug 🐞
   
   In the original code, the download lock (prefetchSemaphore) is released only in two cases:
   
   1. After a RemoteLogSegment has been successfully read (drained), the lock is released via recycleRemoteLog.
   
   <img width="548" height="211" alt="Image" src="https://github.com/user-attachments/assets/9946e354-be3d-43c2-bbd6-650853319e0b" />
   
   2. When the download of a file fails, the lock is released.
   
   <img width="664" height="384" alt="Image" src="https://github.com/user-attachments/assets/816582cc-7f30-4c45-97f5-7b800b911252" />
   
   Let us simplify the model: suppose a bucket contains three segment files (A, B, and C) and client.scanner.remote-log.prefetch-num = 1.
   1. File A fails to download, so the lock is released. File A is then added back to the end of the queue.
   2. File B downloads successfully, but the lock is not released immediately because B has not been drained yet.
      Because file A has an earlier offset, it still comes first and must be processed before B. However, B holds the only prefetch permit, and A cannot be retried until that permit is acquired again. B, in turn, can never be drained because A blocks its processing, so the permit is never released. The result is a deadlock.
   3. File C is never downloaded, and file A is never retried. The entire job gets stuck.
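The scenario above can be replayed in a few lines of Java. This is a single-threaded sketch that assumes the download lock behaves like a `java.util.concurrent.Semaphore` holding `prefetch-num` permits; the class and method names are illustrative and not taken from the Fluss code base.

```java
import java.util.concurrent.Semaphore;

public class PrefetchDeadlockSketch {

    // Returns true if the retry of segment A can obtain a prefetch permit.
    public static boolean retryOfACanAcquirePermit() {
        // Assumption: the prefetch lock is a semaphore with prefetch-num = 1 permit.
        Semaphore prefetchSemaphore = new Semaphore(1);

        // Step 1: A acquires the permit, its download fails, and the failure
        // path releases the permit; A is put back into the queue for a retry.
        prefetchSemaphore.acquireUninterruptibly();
        prefetchSemaphore.release();

        // Step 2: B acquires the permit and downloads successfully. The permit
        // would only be released after B is drained, which requires A (earlier
        // offset) to be consumed first, so no release happens here.
        prefetchSemaphore.acquireUninterruptibly();

        // Step 3: the retry of A needs a permit, but B holds the only one.
        boolean acquired = prefetchSemaphore.tryAcquire();
        if (acquired) {
            prefetchSemaphore.release(); // not reached in this scenario
        }
        return acquired;
    }

    public static void main(String[] args) {
        // A cannot be retried, B cannot be drained, and C is never scheduled.
        System.out.println(retryOfACanAcquirePermit());
    }
}
```

Running `main` prints `false`: the only permit is pinned by a segment that can never be drained.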
   
   ### Solution
   
   When RemoteLogDownloader fails to download a file, it should no longer release the semaphore; instead, it should retry the download several times. If the download still fails after these retries, the exception should be thrown out of the client so that the Flink job fails and restarts.
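A minimal sketch of the proposed behavior, assuming the caller acquires a prefetch permit before downloading and only releases it once the segment is drained. `downloadWithRetry` and `maxAttempts` are hypothetical names, and the concrete retry policy (attempt count, backoff, exception type) is not specified by this issue.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryingDownloadSketch {

    // Hypothetical helper: runs `download` up to maxAttempts times while the
    // caller keeps holding its prefetch permit. The permit is NOT released on
    // failure; after the last failed attempt the error is rethrown so that it
    // reaches the client and, in turn, fails the Flink job.
    public static <T> T downloadWithRetry(Callable<T> download, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return download.call();
            } catch (Exception e) {
                lastFailure = e; // keep the permit and try again (no backoff here)
            }
        }
        throw new IllegalStateException(
                "Download failed after " + maxAttempts + " attempts", lastFailure);
    }

    public static void main(String[] args) {
        Semaphore prefetchSemaphore = new Semaphore(1); // prefetch-num = 1
        AtomicInteger calls = new AtomicInteger();

        // Segment A: fails twice with a transient error, then succeeds.
        prefetchSemaphore.acquireUninterruptibly();
        String fileA = downloadWithRetry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IOException("transient failure");
            }
            return "segment-A";
        }, 3);

        // The permit is still held by A; it would be released once A is drained.
        System.out.println(fileA + " downloaded after " + calls.get()
                + " attempts, permits left: " + prefetchSemaphore.availablePermits());
    }
}
```

Because the permit is never handed back on failure, a re-queued segment can no longer lose its slot to a later segment. The trade-off is that a persistent download failure now fails the job instead of being absorbed by the client; a real implementation would likely also add a backoff between attempts.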
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!

