[
https://issues.apache.org/jira/browse/HDDS-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Siyao Meng updated HDDS-15335:
------------------------------
Description:
NSSummaryTask is one of the ReconOmTasks that the dispatcher fans out to for
every batch of OM RocksDB updates that Recon ingests.
Inside its process() method, three sub-tasks (FSO / Legacy / OBS) ran
sequentially even though they operate on disjoint slices of the event stream
(filtered by table and bucket layout) and write to disjoint NSSummary entries.
The Legacy and OBS sub-tasks were also each individually slower than necessary
because every event triggered a fresh RocksDB point read of the corresponding
OmBucketInfo from Recon's local OM snapshot DB (via
{{getBucketTable().getSkipCache(...)}}), even though bucket layout and objectID
never change once a bucket exists.
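Because those two fields are fixed for the life of a bucket, the repeated point read can in principle be replaced by a memoized lookup. The following is a minimal Java sketch of that idea, not the actual NSSummaryTaskDbEventHandler code: BucketMeta, BucketMetaCache, and the Function standing in for the {{getSkipCache(...)}} read are hypothetical names introduced only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for the immutable bucket fields a sub-task needs. */
final class BucketMeta {
  final String layout;   // e.g. "LEGACY" or "OBJECT_STORE"
  final long objectId;

  BucketMeta(String layout, long objectId) {
    this.layout = layout;
    this.objectId = objectId;
  }
}

/** Memoizes bucket lookups by bucket DB key (illustrative names only). */
final class BucketMetaCache {
  // Plain HashMap: assumes one cache per handler instance, used by one thread.
  // A cache shared across threads would need a ConcurrentHashMap instead.
  private final Map<String, BucketMeta> cache = new HashMap<>();

  // Stand-in for the expensive read from the local OM snapshot DB.
  private final Function<String, BucketMeta> dbLookup;

  BucketMetaCache(Function<String, BucketMeta> dbLookup) {
    this.dbLookup = dbLookup;
  }

  /** The first call per key runs dbLookup; later calls are map hits. */
  BucketMeta get(String bucketDbKey) {
    return cache.computeIfAbsent(bucketDbKey, dbLookup);
  }
}
```

With this shape, two events for the same bucket trigger only one backing read; a lookup that returns null is not cached, so a missing bucket is simply retried on the next event.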
Changes proposed:
1. NSSummaryTaskDbEventHandler caches OmBucketInfo lookups in a
field-level Map keyed by the bucket DB key. Bucket layout and objectID
are immutable for an existing bucket, so cached entries never go stale;
the cluster's bucket count is bounded, so leaving the cache unbounded is
not a memory concern. After the first event for a given bucket, the
cost drops from a RocksDB point read to a HashMap.get().
2. NSSummaryTask.process() submits each of the three sub-tasks to its
own thread in a 3-thread pool and joins on all three. The threads
do not partition events: all three see the same event list, and
each independently iterates it, processing only the events whose
(table, bucket layout) matches its target:
- FSO thread: events on fileTable / dirTable / deletedDirTable.
- Legacy thread: keyTable events whose bucket layout is LEGACY.
- OBS thread: keyTable events whose bucket layout is OBJECT_STORE.
Events that don't match a thread's target are skipped (table-name
check, or bucket-layout check after a now-cached bucket lookup
from change 1). Each sub-task already maintains its own per-call
NSSummary accumulation map and writes to ReconNamespaceSummaryManager
only at flush time via an atomic RDBBatchOperation; the target
NSSummary entries are disjoint between FSO and Legacy/OBS (FSO has
its own namespace tree) and between Legacy and OBS (a bucket has
exactly one layout), so no synchronization is needed across
threads. Per-sub-task seek positions and per-task failure flags
are preserved, so the TaskResult contract is the same as before.
3. In the OBS UPDATE path, drop the redundant getKeyParentID(oldKeyInfo)
call. The parent of an OBS key is the bucket, and a key cannot move
between buckets via an UPDATE event (that would be a DELETE+PUT), so
the parent objectID computed for the new key value is identical to
the parent objectID for the old key value.
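The fan-out in change 2 can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patch itself: Event, ParallelSubTasks, and the per-call pool are hypothetical, and each sub-task only counts the events it would handle instead of building an NSSummary map. Only the filter rules (table names and bucket layouts) come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

/** Hypothetical, simplified event: a table name plus the owning bucket's layout. */
final class Event {
  final String table;
  final String bucketLayout;

  Event(String table, String bucketLayout) {
    this.table = table;
    this.bucketLayout = bucketLayout;
  }
}

final class ParallelSubTasks {
  private static final Set<String> FSO_TABLES =
      new HashSet<>(Arrays.asList("fileTable", "dirTable", "deletedDirTable"));

  /** Each sub-task scans the full list and keeps its own private tally. */
  private static Callable<Integer> subTask(List<Event> events, Predicate<Event> matches) {
    return () -> {
      int handled = 0;
      for (Event e : events) {
        if (matches.test(e)) {
          handled++;   // a real sub-task would update its own accumulation map here
        }
      }
      return handled;
    };
  }

  /** Returns the handled-event counts in the order FSO, Legacy, OBS. */
  static List<Integer> process(List<Event> events) {
    List<Callable<Integer>> tasks = new ArrayList<>();
    tasks.add(subTask(events, e -> FSO_TABLES.contains(e.table)));
    tasks.add(subTask(events,
        e -> "keyTable".equals(e.table) && "LEGACY".equals(e.bucketLayout)));
    tasks.add(subTask(events,
        e -> "keyTable".equals(e.table) && "OBJECT_STORE".equals(e.bucketLayout)));

    // Assumption: a pool created per call; the real task may reuse one.
    ExecutorService pool = Executors.newFixedThreadPool(3);
    try {
      List<Integer> results = new ArrayList<>();
      // invokeAll blocks until all three finish and keeps submission order.
      for (Future<Integer> f : pool.invokeAll(tasks)) {
        results.add(f.get());
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for sub-tasks", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("a sub-task failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

Since the event list is only read and each callable writes to its own local state, the sketch needs no locking, which mirrors the disjointness argument made for the three sub-tasks. Unlike the description, this sketch collapses any sub-task failure into a single exception and does not model per-task failure flags or seek positions.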
was:
NSSummaryTask is a ReconOmTask that the dispatcher fans out on every batch of
OM RocksDB updates Recon ingests.
Inside its process() method, three sub-tasks (FSO / Legacy / OBS) ran
sequentially even though they operate on disjoint slices of the event stream
(filtered by table and bucket layout) and write to disjoint NSSummary entries.
The Legacy and OBS sub-tasks were also each individually slower than necessary
because every event triggered a fresh RocksDB point read of the corresponding
OmBucketInfo from Recon's local OM snapshot DB (via
{{getBucketTable().getSkipCache(...)}}), even though bucket layout and objectID
never change once a bucket exists.
> Recon: parallelize NSSummaryTask sub-tasks and cache OmBucketInfo lookups
> -------------------------------------------------------------------------
>
> Key: HDDS-15335
> URL: https://issues.apache.org/jira/browse/HDDS-15335
> Project: Apache Ozone
> Issue Type: Improvement
> Components: Ozone Recon
> Reporter: Siyao Meng
> Assignee: Siyao Meng
> Priority: Major