smengcl opened a new pull request, #10321:
URL: https://github.com/apache/ozone/pull/10321

   This started out as digging into what bottlenecks Recon currently has, then 
turned into focusing on NSSummaryTask and its benchmark. It has been a journey.
   
   Generated-by: Claude Code (Opus 4.7 xhigh)
   
   ## What changes were proposed in this pull request?
   
   `NSSummaryTask.process()` processes every batch of OM update events Recon 
ingests. On keyTable workloads (LEGACY or OBJECT_STORE bucket layout) it has 
two avoidable costs: every event triggers a fresh 
`getBucketTable().getSkipCache(...)` RocksDB point read even though bucket 
layout and objectID never change; and the three sub-tasks (FSO / Legacy / OBS) 
iterate the event list sequentially even though they operate on disjoint slices 
and write to disjoint NSSummary entries.
   
   ## Changes
   
   1. **`NSSummaryTaskDbEventHandler` caches `OmBucketInfo` lookups in a 
field-level Map.** After the first lookup for a bucket, subsequent lookups 
become `HashMap.get()` calls.
   2. **`NSSummaryTask.process()` submits the three sub-tasks to a 3-thread 
pool and joins on all three.** The threads see the same event list; each only 
processes events whose `(table, bucket layout)` matches its target. Target 
NSSummary entries are disjoint across sub-tasks so no cross-thread 
synchronization is needed, and the `TaskResult` contract is unchanged.
   3. **The OBS UPDATE path drops a redundant `getKeyParentID(oldKeyInfo)` 
call.** The parent of an OBS key is its bucket, and an UPDATE event cannot move 
a key between buckets.
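
Change 1 amounts to memoizing a per-bucket read. A minimal sketch of that idea, assuming a string bucket key and a generic loader standing in for the `getSkipCache(...)` call; the class `BucketInfoCacheSketch` and its members are hypothetical and not taken from the patch:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative only: reads each bucket from the backing store at most once. */
final class BucketInfoCacheSketch<V> {
  private final Map<String, V> cache = new HashMap<>();
  // Hypothetical stand-in for the RocksDB point read done per event today.
  private final Function<String, V> loader;
  // Number of calls that actually reached the loader.
  private int loads;

  BucketInfoCacheSketch(Function<String, V> loader) {
    this.loader = loader;
  }

  /** Returns the cached value, invoking the loader only on a cache miss. */
  V get(String bucketDbKey) {
    V value = cache.get(bucketDbKey);
    if (value == null) {
      value = loader.apply(bucketDbKey);
      loads++;
      if (value != null) {
        cache.put(bucketDbKey, value); // later calls become HashMap.get()
      }
    }
    return value;
  }

  int loads() {
    return loads;
  }

  public static void main(String[] args) {
    BucketInfoCacheSketch<String> cache =
        new BucketInfoCacheSketch<>(key -> "info:" + key);
    // 500k lookups spread over 4 buckets reach the loader 4 times.
    for (int i = 0; i < 500_000; i++) {
      cache.get("/vol1/bucket" + (i % 4));
    }
    System.out.println("loader calls: " + cache.loads());
  }
}
```

A plain `HashMap` is only safe if each cache instance is confined to one thread, so the sketch assumes every sub-task owns its own instance.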
   
   ## Throughput
   
   Intel Xeon Silver 4416+ (40 cores / 80 threads), OpenJDK 17, 500k events 
plus 500k preloaded keys, RATIS replication, mixed 60/30/10 
create/update/delete:
   
   | Code                | events/sec | vs vanilla |
   | ------------------- | ---------: | ---------: |
   | Vanilla             |     78,098 |      1.00x |
   | + change 1 (cache)  |    672,172 |      8.61x |
   | + changes 1 and 2   |    918,550 |     11.76x |
   
   Change 1 is the dominant lever: it removes about 1.5M 
`getSkipCache(bucketDBKey)` RocksDB Gets per `process()` call (3 sub-task scans 
of 500k events, each scan doing one or more bucket lookups before bailing or 
processing). Change 2 gives a further ~1.37x. Change 3 is below measurement 
noise.
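
The ratios in the table and the 1.5M figure follow from simple arithmetic on the reported numbers. A small check, using hypothetical helper names and treating one lookup per event per scan as a lower bound:

```java
import java.util.Locale;

final class ThroughputArithmetic {
  /** Formats after/before as a speedup with two decimals. */
  static String speedup(double after, double before) {
    return String.format(Locale.ROOT, "%.2fx", after / before);
  }

  /** Lower bound on bucket lookups: one per event for each sub-task scan. */
  static long minBucketLookups(int subTaskScans, long events) {
    return subTaskScans * events;
  }

  public static void main(String[] args) {
    double vanilla = 78_098;
    double cacheOnly = 672_172;
    double cacheAndPool = 918_550;
    System.out.println("change 1 vs vanilla:      " + speedup(cacheOnly, vanilla));
    System.out.println("changes 1+2 vs vanilla:   " + speedup(cacheAndPool, vanilla));
    System.out.println("change 2 on top of 1:     " + speedup(cacheAndPool, cacheOnly));
    System.out.println("min lookups per process(): " + minBucketLookups(3, 500_000));
  }
}
```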
   
   ## Heap pressure
   
   Heap pressure is reduced because change 1 stops allocating a transient 
`OmBucketInfo` per RocksDB Get. At 1M events / 1M preloaded keys with an 8 GB 
heap, total stop-the-world pause time dropped 25% (1137 ms to 850 ms) and 
cumulative bytes reclaimed dropped 52% (522 GB to 249 GB) across the bench 
lifetime.
   
   ## FSO-heavy workloads
   
   On a 100% FSO workload (`fileTable` / `dirTable` / `deletedDirTable`), 
change 1 is a no-op because the FSO sub-task reads 
`keyInfo.getParentObjectID()` directly without a bucket lookup. Change 2 still 
moves the Legacy and OBS bail loops off the sequential path: each of them scans 
the event list only to skip every event at the table-name check, and that scan 
now runs alongside FSO instead of after it. That cost is small relative to 
FSO's own processing, so the wall-clock speedup on FSO-heavy workloads is 
correspondingly smaller. The patch is non-regressive in any case.
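
The fan-out and the bail loop can be pictured with a simplified sketch. It assumes events are reduced to their table name and that a sub-task is selected by table name alone (the real selection also depends on bucket layout); `SubTaskFanOutSketch` and its methods are hypothetical names, not the patch's code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SubTaskFanOutSketch {
  /** Scans all events; those from other tables bail at the name check. */
  static int scan(List<String> eventTables, List<String> ownTables) {
    int handled = 0;
    for (String table : eventTables) {
      if (!ownTables.contains(table)) {
        continue; // cheap bail: this event belongs to another sub-task
      }
      handled++; // stand-in for the real per-event NSSummary update
    }
    return handled;
  }

  /** Runs one scan per sub-task on its own thread and joins on all of them. */
  static List<Integer> processAll(List<String> eventTables,
                                  List<List<String>> subTaskTables) {
    ExecutorService pool = Executors.newFixedThreadPool(subTaskTables.size());
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (List<String> ownTables : subTaskTables) {
        Callable<Integer> job = () -> scan(eventTables, ownTables);
        futures.add(pool.submit(job));
      }
      List<Integer> handledPerSubTask = new ArrayList<>();
      for (Future<Integer> future : futures) {
        handledPerSubTask.add(future.get()); // blocks until that scan is done
      }
      return handledPerSubTask;
    } catch (Exception e) {
      throw new IllegalStateException("sub-task failed", e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    // A 100% FSO batch: only the first sub-task handles anything.
    List<String> events =
        Arrays.asList("fileTable", "dirTable", "fileTable", "deletedDirTable");
    List<List<String>> subTasks = Arrays.asList(
        Arrays.asList("fileTable", "dirTable", "deletedDirTable"), // FSO
        Arrays.asList("keyTable"),                                 // Legacy
        Arrays.asList("keyTable"));                                // OBS
    System.out.println(processAll(events, subTasks));
  }
}
```

On such a batch the second and third scans do nothing but skip, which is why running them concurrently helps only a little here.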
   
   ## Reproduction
   
   The reproduction harness (`NSSummaryProcessTimingTest` under `-Pbench`) is 
provided as a patch on the 
[JIRA](https://issues.apache.org/jira/browse/HDDS-15335).
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-15335
   
   ## How was this patch tested?
   
   - All existing `TestNSSummaryTask*` unit tests pass
   - Two regression tests are added to `TestNSSummaryTask`: one exercises the 
OBS sub-task path end-to-end (previously only FSO + Legacy events were sent 
through `process()`), and one asserts the returned `TaskResult` reports success 
and contains a seek position for each of `FSO`, `LEGACY`, and `OBS`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

