zhaohaidao opened a new issue, #200: URL: https://github.com/apache/fluss-rust/issues/200
### Search before asking - [x] I searched in the [issues](https://github.com/apache/fluss-rust/issues) and found nothing similar. ### Please describe the bug 🐞 ## Summary `RemoteLogDownloadFuture::on_complete` acquires `completion_callbacks` before `result`, while the completion task acquires `result` before `completion_callbacks`. This lock-order inversion can deadlock when registration and completion race. ## Impact - Remote log downloads can hang permanently. - Read path can stall when waiting for remote segment completion. ## Root Cause Two code paths acquire the same locks in opposite order: - `on_complete`: `completion_callbacks` -> `result` - completion task: `result` -> `completion_callbacks` ## Discussion Notes - The completion task is the `tokio::spawn` in `RemoteLogDownloadFuture::new` that sets `result` and drains `completion_callbacks`. - Today there is only one call site (`scanner.rs`) registering a callback, but the list is meant to allow multiple subscribers and keep Java-like `onComplete` semantics. - `let is_done = self.result.lock().is_some();` releases the lock at the end of the statement. If we rely on that check, we must hold the lock across registration to avoid missing completion. ## Reproduction (Race Scenario) 1. Completion task locks `result` and is about to lock `completion_callbacks`. 2. Another task calls `on_complete`, locks `completion_callbacks`, then tries to lock `result`. 3. Both wait on each other, deadlock. ### Solution ## Fix Options 1. **Consistent lock order (minimal change, recommended).** Make `on_complete` lock `result` first, keep it held while registering, then lock `completion_callbacks`. This matches the completion task order and avoids both deadlock and missed callbacks. 2. **Single combined mutex (simpler correctness, larger refactor).** Replace the two mutexes with one `Mutex<DownloadState { result, callbacks }>`; completion drains callbacks after setting result. ## Recommended Fix Option 1 (consistent lock order) because it is the smallest change and fixes the deadlock without changing API or structure. ### Are you willing to submit a PR? - [x] I'm willing to submit a PR! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
