JunRuiLee commented on PR #7940: URL: https://github.com/apache/paimon/pull/7940#issuecomment-4527296470
Thanks for the detailed review! I've split this PR into 3 parts as suggested:

1. **#7943**: Read-only verification logic (`TableRepair.verify()`)
2. **#7944**: Fix mode + catalog integration (depends on Part 1)
3. **#7945**: CLI command (depends on Part 2)

Also addressed the other feedback points:

- **Progress logging**: Added `logging.info` every 1000 data files when `check_data_files=True`, and documented time complexity as O(total_data_files)
- **Resume-from-failure**: Added per-table error isolation in `repair_database`. Individual table failures are logged and skipped, so re-running after a crash continues from where it left off
- **Idempotency**: The only fix operation (`_fix_latest_file`) performs a single atomic write, so re-running after an interruption converges to a valid state. Added a docstring explaining the guarantee.
- **Test for interrupted mid-fix**: Added `test_repair_is_idempotent`, which runs repair twice and verifies that the second run is a no-op
- **Return type annotations**: Added consistent `-> RepairReport` / `-> List[RepairReport]` annotations to catalog methods

Please merge in order: Part 1 → Part 2 → Part 3.

Closing this PR in favor of the split.
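
For anyone following the split, here is a rough sketch of how the error-isolation and idempotency points fit together. It is not the code in the new PRs: `write_atomically`, `fix_latest_file`, the `LATEST` file layout, the `tables` mapping, and the `RepairReport` fields are all assumptions made up for illustration, and it only targets a local filesystem.

```python
import logging
import os
import tempfile
from dataclasses import dataclass
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class RepairReport:
    """Hypothetical report shape; the real RepairReport may look different."""
    table: str
    fixed: bool = False
    error: Optional[str] = None


def write_atomically(path: str, content: str) -> None:
    """Write to a temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
        # os.replace swaps the file in one step on a local filesystem, so a
        # reader sees either the old content or the new content, never a mix.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def fix_latest_file(table_dir: str, snapshot_id: int) -> bool:
    """Assumed fix: make LATEST hold snapshot_id. Returns True only if it wrote."""
    path = os.path.join(table_dir, "LATEST")
    expected = str(snapshot_id)
    try:
        with open(path) as f:
            if f.read().strip() == expected:
                return False  # already valid, so a re-run is a no-op
    except FileNotFoundError:
        pass  # missing pointer file: fall through and write it
    write_atomically(path, expected)
    return True


def repair_database(tables: Dict[str, int], root: str) -> List[RepairReport]:
    """Repair each table independently; one failure does not stop the rest."""
    reports: List[RepairReport] = []
    for name, snapshot_id in tables.items():
        try:
            fixed = fix_latest_file(os.path.join(root, name), snapshot_id)
            reports.append(RepairReport(name, fixed=fixed))
        except Exception as exc:
            # Log and skip, so a later re-run only has work left on this table.
            logger.warning("Skipping table %s: %s", name, exc)
            reports.append(RepairReport(name, error=str(exc)))
    return reports
```

Because the fix checks the current state before writing and the write itself is a single rename, calling `repair_database` a second time with the same input reports `fixed=False` for every table that was already repaired, which is the property a test like `test_repair_is_idempotent` would check.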
