prashantwason opened a new pull request, #18411: URL: https://github.com/apache/hudi/pull/18411
## Summary

- Move table version upgrade (`tryUpgrade()`, `initMetadataTable()`, `restoreEvents()`) from the Pekko dispatcher thread to the coordinator's single-threaded FIFO executor in `StreamWriteOperatorCoordinator.start()`, preventing heartbeat timeout during long-running upgrades
- Optimize `SevenToEightUpgradeHandler.upgradeToLSMTimeline()` to use a larger batch size (500 vs default 10) and call `compactAndClean()` once at the end instead of after every batch
- Fixes https://github.com/apache/hudi/issues/18410

## Problem

When upgrading a Hudi table with many archived timeline actions (e.g., v7→v8 LSM timeline migration), the upgrade runs synchronously on the Pekko dispatcher thread in `StreamWriteOperatorCoordinator.start()`. Each batch of 10 actions triggers ~5 remote storage operations (parquet write, manifest update, compaction), and with hundreds of actions, the dispatcher thread is blocked for 90+ seconds. This prevents heartbeat responses, causing the ResourceManager to disconnect the JobManager.

## Solution

1. **Threading fix**: Create the `NonThrownExecutor` before the upgrade and submit the heavy initialization as the first FIFO task. Since all event handling also goes through this executor, the upgrade is guaranteed to complete before any events are processed. The Pekko dispatcher thread returns immediately, allowing heartbeats to flow.
2. **I/O optimization**: Use batch size of 500 (vs default 10) and single `compactAndClean()` at end, reducing remote storage operations from ~250 to ~6.
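The ordering guarantee behind the threading fix can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `FifoStartSketch`, `handleEvent`, and `simulateSlowUpgrade` are hypothetical names, and a plain `java.util.concurrent` single-thread executor stands in for Hudi's `NonThrownExecutor`. The assumption it demonstrates is only that a single-threaded executor runs tasks in submission order, so work submitted in `start()` finishes before any event submitted afterwards, while the calling thread returns right away.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the coordinator; not Hudi's actual class.
class FifoStartSketch {
    // One worker thread: tasks run one at a time, in submission order.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final List<String> log = Collections.synchronizedList(new ArrayList<>());

    // Called on the caller's (dispatcher) thread; only enqueues work and returns.
    void start() {
        executor.execute(() -> {
            simulateSlowUpgrade(); // placeholder for the heavy initialization
            log.add("upgrade");
        });
    }

    // Events go through the same executor, so they queue behind the upgrade.
    void handleEvent(String name) {
        executor.execute(() -> log.add("event:" + name));
    }

    // Waits for queued tasks to finish, then returns a copy of the log.
    List<String> closeAndGetLog() {
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(log);
    }

    private static void simulateSlowUpgrade() {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        FifoStartSketch coordinator = new FifoStartSketch();
        coordinator.start();
        coordinator.handleEvent("bootstrap");
        System.out.println(coordinator.closeAndGetLog());
    }
}
```

In this sketch the event is submitted while the simulated upgrade is still sleeping, yet it is recorded after it, because both tasks share the one worker thread.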
## Test plan

- [x] `TestStreamWriteOperatorCoordinator`: all 36 tests pass
- [x] Verified `setExecutor()` in test helper calls `executor.close()` which waits for task completion (`waitForTasksFinish=true`), so the upgrade task completes before the mock executor replaces it
- [x] Confirmed the fix is needed on apache/master: identical vulnerable code present

🤖 Generated with [Claude Code](https://claude.com/claude-code)
