navinko opened a new pull request, #9624:
URL: https://github.com/apache/ozone/pull/9624
## What changes were proposed in this pull request?
Avoid collecting keys in memory during parallel OM table processing.
Please describe your PR in detail:
- The new implementation keeps the iterator thread pool but removes the
value-executor pool and in-memory batching.
- Each table iterator is now owned by a single worker thread and scans only
its assigned key range.
- Each table iterator now runs on a single thread; this was validated to work
unchanged with ByteArrayCodec.
- A separate PR will replace ByteArrayCodec with CodecBufferCodec under
ParallelTableOperation:
https://issues.apache.org/jira/browse/HDDS-14155
- Added a unit test case to validate the new flow.
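
For readers unfamiliar with the approach, the worker-per-range scheme described in the bullets above can be sketched roughly as follows. This is not the code in this PR. `RangeScanSketch`, `scan`, `slice`, and the `TreeMap` standing in for an OM table are hypothetical names chosen for illustration, and the split keys are assumed to be sorted in ascending order. The point it shows is that each worker iterates only its own key range and handles every entry inline, so no keys are collected into an in-memory batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;

public class RangeScanSketch {

  /**
   * Scans the table in parallel, one worker per key range.
   * splitKeys must be sorted ascending; they divide the key space into
   * splitKeys.size() + 1 half-open ranges [start, end).
   */
  public static long scan(NavigableMap<String, String> table,
      List<String> splitKeys, int threads,
      BiConsumer<String, String> handler) throws Exception {
    // null marks an open boundary at either end of the key space.
    List<String> bounds = new ArrayList<>();
    bounds.add(null);
    bounds.addAll(splitKeys);
    bounds.add(null);

    ExecutorService pool = Executors.newFixedThreadPool(threads);
    AtomicLong total = new AtomicLong();
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (int i = 0; i + 1 < bounds.size(); i++) {
        final String start = bounds.get(i);
        final String end = bounds.get(i + 1);
        futures.add(pool.submit(() -> {
          // This worker owns the view (and its iterator) for one range only.
          long local = 0;
          for (Map.Entry<String, String> e : slice(table, start, end).entrySet()) {
            handler.accept(e.getKey(), e.getValue()); // processed inline, not batched
            local++;
          }
          total.addAndGet(local);
        }));
      }
      for (Future<?> f : futures) {
        f.get(); // propagate any worker failure
      }
    } finally {
      pool.shutdown();
    }
    return total.get();
  }

  /** Returns the view of the table covering [start, end); null means unbounded. */
  static NavigableMap<String, String> slice(NavigableMap<String, String> table,
      String start, String end) {
    if (start == null && end == null) {
      return table;
    }
    if (start == null) {
      return table.headMap(end, false);
    }
    if (end == null) {
      return table.tailMap(start, true);
    }
    return table.subMap(start, true, end, false);
  }

  public static void main(String[] args) throws Exception {
    // Read-only TreeMap as a stand-in for a table; it is not modified during the scan.
    TreeMap<String, String> table = new TreeMap<>();
    for (int i = 0; i < 1000; i++) {
      table.put(String.format("key-%04d", i), "value");
    }
    long count = scan(table, List.of("key-0250", "key-0500", "key-0750"), 4,
        (k, v) -> { });
    System.out.println("Total keys processed: " + count);
  }
}
```

In this sketch the only shared state is the running total, which each worker updates once after finishing its range.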
## What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14400
## How was this patch tested?
CI:
https://github.com/navinko/ozone/actions/runs/20884674236
Validated with a JUnit test. Also tested the flow by populating data into
fileTable and verifying parallel processing of the individual table in both
debug mode and normal mode.
bash-5.1$ ozone debug ldb --db=/data/metadata/om.db scan
--column_family=fileTable --count
23916
<img width="1704" height="980" alt="Screenshot 2026-01-10 at 5 46 55 PM"
src="https://github.com/user-attachments/assets/e0b25ab5-6034-4b62-8bb9-1db47017cbfc"
/>
# Recon log
Run mode, with 21627 keys uploaded to fileTable, followed by reprocessing.
> 2026-01-10T15:42:37.870472510Z 2026-01-10 15:42:37,870 [ReconTaskThread-0]
INFO tasks.ReconTaskControllerImpl: Task OmTableInsightTask started execution
on thread ReconTaskThread-0
2026-01-10T15:42:37.870724094Z 2026-01-10 15:42:37,870 [ReconTaskThread-0]
INFO tasks.OmTableInsightTask: OmTableInsightTask: Starting reprocess
2026-01-10T15:42:37.878627094Z 2026-01-10 15:42:37,878 [ReconTaskThread-0]
INFO tasks.OmTableInsightTask: OmTableInsightTask: Processing table dTokenTable
sequentially (non-String keys)
2026-01-10T15:42:37.888022094Z 2026-01-10 15:42:37,887 [ReconTaskThread-0]
INFO util.ParallelTableIteratorOperation: OmTableInsightTask: Parallel
iteration completed - Total keys processed: 2
2026-01-10T15:42:37.888184677Z 2026-01-10 15:42:37,888 [ReconTaskThread-0]
INFO tasks.OmTableInsightTask: OmTableInsightTask: Processing table
s3SecretTable sequentially (non-String keys)
2026-01-10T15:42:37.899993135Z 2026-01-10 15:42:37,899 [ReconTaskThread-0]
INFO util.ParallelTableIteratorOperation: OmTableInsightTask: Parallel
iteration completed - Total keys processed: 3
2026-01-10T15:42:37.944590219Z 2026-01-10 15:42:37,944 [ReconTaskThread-0]
INFO util.ParallelTableIteratorOperation: OmTableInsightTask: Parallel
iteration completed - Total keys processed: 21627
2026-01-10T15:42:37.947238802Z 2026-01-10 15:42:37,947 [ReconTaskThread-0]
INFO tasks.OmTableInsightTask: OmTableInsightTask: Reprocess completed in 76 ms
2026-01-10T15:42:37.947249094Z 2026-01-10 15:42:37,947 [ReconTaskThread-0]
INFO tasks.ReconTaskControllerImpl: Task OmTableInsightTask completed execution