chihsuan opened a new pull request, #10578: URL: https://github.com/apache/ozone/pull/10578
## What changes were proposed in this pull request?

**Problem.** When `FSORepairTool` marks the temporary `reachable` and `pendingToDelete` tables in `temp.db`, it writes each entry with an individual `Table.put`. Every put is a separate RocksDB write (WAL + fsync). For FSO buckets with thousands or millions of files and directories, this per-entry fsync overhead dominates the run.

**Fix.** Accumulate those temp-table writes in a bounded RocksDB `BatchOperation` and commit them in batches. A small `BatchedTempWriter` helper buffers `putWithBatch` calls, flushes (commit + reopen) every `tempDbBatchSize` entries to cap memory, and commits any remainder on close. Each marking phase wraps its directory walk in one writer.

This is safe because the two temp tables are only written during the marking phases and only read back later in the classification phase, so all writes for a bucket are committed before that bucket is classified. The repair-mode logic that moves entries to the OM deleted tables was already batched and is unchanged.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14187

## How was this patch tested?

- The existing `TestFSORepairTool` suite (connected / disconnected / empty / unreachable trees, dry-run, volume and bucket filters, repair mode, and post-repair OM restart validation) passes unchanged, confirming the batched writes produce identical reports.
- Added `testBatchedTempWrites`, which sets `tempDbBatchSize = 1` and runs a full dry-run so the batch commit/reset path is exercised for both the `reachable` and `pendingToDelete` tables across all tree shapes, and asserts the report is identical to the default-batch run.
- `checkstyle.sh` is clean on the changed modules.

Generated-by: Claude Code (claude-opus-4-8)
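
For readers who want a picture of the bounded-batch pattern described under **Fix**, here is a minimal, self-contained sketch. It is not the code in this patch: `BatchSink`, `BatchedWriter`, and `demo` are hypothetical names, and the sink is an in-memory stand-in for a RocksDB batch commit. It only illustrates the three behaviours named above: buffer each put, flush once the buffer reaches the batch size, and commit the remainder on close.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchedWriterSketch {

  /** Hypothetical stand-in for a store that commits many puts in one write. */
  interface BatchSink {
    void commit(List<String[]> entries);
  }

  /** Buffers puts, commits every batchSize entries, and commits the rest on close. */
  static final class BatchedWriter implements AutoCloseable {
    private final BatchSink sink;
    private final int batchSize;
    private List<String[]> pending = new ArrayList<>();

    BatchedWriter(BatchSink sink, int batchSize) {
      if (batchSize < 1) {
        throw new IllegalArgumentException("batchSize must be >= 1");
      }
      this.sink = sink;
      this.batchSize = batchSize;
    }

    void put(String key, String value) {
      pending.add(new String[] {key, value});
      // Cap memory: never hold more than batchSize uncommitted entries.
      if (pending.size() >= batchSize) {
        flush();
      }
    }

    private void flush() {
      if (pending.isEmpty()) {
        return;
      }
      sink.commit(pending);
      // Start a fresh buffer, analogous to reopening the batch after a commit.
      pending = new ArrayList<>();
    }

    @Override
    public void close() {
      // Commit whatever is left so nothing is lost at the end of a walk.
      flush();
    }
  }

  /** Writes `entries` puts and returns {number of commits, entries committed}. */
  static int[] demo(int entries, int batchSize) {
    int[] stats = new int[2];
    BatchSink sink = batch -> {
      stats[0]++;
      stats[1] += batch.size();
    };
    try (BatchedWriter writer = new BatchedWriter(sink, batchSize)) {
      for (int i = 0; i < entries; i++) {
        writer.put("key" + i, "value");
      }
    }
    return stats;
  }

  public static void main(String[] args) {
    int[] stats = demo(10, 4);
    System.out.println(stats[0] + " commits, " + stats[1] + " entries");
  }
}
```

With a batch size of 1 the sketch degenerates to one commit per put, which is the same edge case `testBatchedTempWrites` uses to exercise the commit/reset path.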
