smengcl opened a new pull request, #10321: URL: https://github.com/apache/ozone/pull/10321
This started out as digging into what bottlenecks Recon currently has, then turned into focusing on NSSummaryTask and its benchmark. It has been a journey.

Generated-by: Claude Code (Opus 4.7 xhigh)

## What changes were proposed in this pull request?

`NSSummaryTask.process()` processes every batch of OM update events Recon ingests. On keyTable workloads (LEGACY or OBJECT_STORE bucket layout) it has two avoidable costs:

- every event triggers a fresh `getBucketTable().getSkipCache(...)` RocksDB point read even though bucket layout and objectID never change; and
- the three sub-tasks (FSO / Legacy / OBS) iterate the event list sequentially even though they operate on disjoint slices and write to disjoint NSSummary entries.

## Changes

1. **`NSSummaryTaskDbEventHandler` caches `OmBucketInfo` lookups in a field-level Map.** After the first lookup for a bucket, subsequent lookups become `HashMap.get()` calls.
2. **`NSSummaryTask.process()` submits the three sub-tasks to a 3-thread pool and joins on all three.** The threads see the same event list; each only processes events whose `(table, bucket layout)` matches its target. Target NSSummary entries are disjoint across sub-tasks so no cross-thread synchronization is needed, and the `TaskResult` contract is unchanged.
3. **The OBS UPDATE path drops a redundant `getKeyParentID(oldKeyInfo)` call.** The parent of an OBS key is its bucket, and an UPDATE event cannot move a key between buckets.
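
For readers who want the shape of changes 1 and 2 without opening the diff, here is a minimal, self-contained sketch. It is not the code in this PR: `NsSummarySketch`, `Kind`, `Event`, `SubTask`, and `processAll` are hypothetical names, the `slowLookup` function merely stands in for the RocksDB point read, and giving each sub-task its own `HashMap` is an assumption about how the field-level cache is scoped.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Illustrative sketch only; none of these types are the real Recon classes. */
final class NsSummarySketch {

  /** Hypothetical sub-task kinds, mirroring FSO / Legacy / OBS. */
  enum Kind { FSO, LEGACY, OBS }

  /** Hypothetical stand-in for one OM update event. */
  static final class Event {
    final String table;      // "fileTable" or "keyTable"
    final String bucketKey;  // DB key of the owning bucket
    Event(String table, String bucketKey) {
      this.table = table;
      this.bucketKey = bucketKey;
    }
  }

  /** One sub-task. It owns its cache, so the plain HashMap is used by one thread per run. */
  static final class SubTask {
    private final Kind kind;
    private final Function<String, Kind> slowLookup;  // stands in for the RocksDB point read
    private final Map<String, Kind> bucketCache = new HashMap<>();

    SubTask(Kind kind, Function<String, Kind> slowLookup) {
      this.kind = kind;
      this.slowLookup = slowLookup;
    }

    /** Change 1: only the first event for a bucket pays for the slow lookup. */
    private Kind layoutOf(String bucketKey) {
      return bucketCache.computeIfAbsent(bucketKey, slowLookup);
    }

    /** Returns how many events this sub-task accepted; all other events are skipped. */
    int process(List<Event> events) {
      int handled = 0;
      for (Event e : events) {
        boolean mine = kind == Kind.FSO
            ? e.table.equals("fileTable")
            : e.table.equals("keyTable") && layoutOf(e.bucketKey) == kind;
        if (mine) {
          handled++;
        }
      }
      return handled;
    }
  }

  /** Change 2: submit the three sub-tasks to a 3-thread pool and join on all of them. */
  static Map<Kind, Integer> processAll(List<Event> events, Function<String, Kind> slowLookup) {
    ExecutorService pool = Executors.newFixedThreadPool(3);
    try {
      Map<Kind, Future<Integer>> pending = new EnumMap<>(Kind.class);
      for (Kind kind : Kind.values()) {
        SubTask task = new SubTask(kind, slowLookup);
        Callable<Integer> job = () -> task.process(events);
        pending.put(kind, pool.submit(job));
      }
      Map<Kind, Integer> handled = new EnumMap<>(Kind.class);
      for (Map.Entry<Kind, Future<Integer>> entry : pending.entrySet()) {
        handled.put(entry.getKey(), entry.getValue().get());  // blocks until that sub-task is done
      }
      return handled;
    } catch (InterruptedException | ExecutionException ex) {
      throw new IllegalStateException("sub-task failed", ex);
    } finally {
      pool.shutdown();
    }
  }
}
```

In this sketch the number of slow lookups depends on the number of distinct buckets each sub-task sees, not on the number of events. The sketch builds fresh sub-tasks on every `processAll` call, so its cache lives for one call only; a field on a long-lived handler would keep entries across batches.
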
## Throughput

Intel Xeon Silver 4416+ (40 cores / 80 threads), OpenJDK 17, 500k events plus 500k preloaded keys, RATIS replication, mixed 60/30/10 create/update/delete:

| Code                | events/sec | vs vanilla |
| ------------------- | ---------: | ---------: |
| Vanilla             |     78,098 |      1.00x |
| + change 1 (cache)  |    672,172 |      8.61x |
| + changes 1 and 2   |    918,550 |     11.76x |

Change 1 is the dominant lever: it removes about 1.5M `getSkipCache(bucketDBKey)` RocksDB Gets per `process()` call (3 sub-task scans of 500k events, each scan doing one or more bucket lookups before bailing or processing). Change 2 gives a further ~1.37x. Change 3 is below measurement noise.

## Heap pressure

Reduced because change 1 stops allocating a transient `OmBucketInfo` per RocksDB Get. At 1M events / 1M preloaded keys with an 8 GB heap, total stop-the-world pause dropped 25% (1137 ms to 850 ms) and cumulative bytes reclaimed dropped 52% (522 GB to 249 GB) across the bench lifetime.

## FSO-heavy workloads

On a 100% FSO workload (`fileTable` / `dirTable` / `deletedDirTable`), change 1 is a no-op because the FSO sub-task reads `keyInfo.getParentObjectID()` directly without a bucket lookup. Change 2 still saves the bail-loop cost of Legacy and OBS scanning the event list to skip at the table-name check, but that cost is small relative to FSO's own processing, so the wall-clock speedup on FSO-heavy workloads is correspondingly smaller. The patch is non-regressive in any case.

## Reproduction

The reproduction harness (`NSSummaryProcessTimingTest` under `-Pbench`) is provided as a patch on the [JIRA](https://issues.apache.org/jira/browse/HDDS-15335).

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15335

## How was this patch tested?

- All existing `TestNSSummaryTask*` unit tests pass
- Two regression tests are added to `TestNSSummaryTask`: one exercises the OBS sub-task path end-to-end (previously only FSO + Legacy events were sent through `process()`), and one asserts the returned `TaskResult` reports success and contains a seek position for each of `FSO`, `LEGACY`, and `OBS`.
