navinko opened a new pull request, #9624:
URL: https://github.com/apache/ozone/pull/9624

   ## What changes were proposed in this pull request?
   Avoid collecting keys in memory during parallel OM table processing.
   
   Please describe your PR in detail:
   - The new implementation keeps the iterator thread pool but removes the 
value-executor pool and in-memory batching. 
   - Each table iterator is now owned by a single worker thread and scans only 
its assigned key range.
   - Verified that running each table iterator on a single thread works 
unchanged with the existing ByteArrayCodec.
   - A separate PR will replace ByteArrayCodec with CodecBufferCodec under 
ParallelTableOperation:
       https://issues.apache.org/jira/browse/HDDS-14155
   - Added a unit test case to validate the new flow.
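   
   The idea in the list above (one worker per key range, with no intermediate 
key batch held in memory) can be sketched roughly as follows. This is only an 
illustration, not the code in this PR: `RangeScanSketch`, `scanInParallel`, 
and the `bounds` parameter are hypothetical names, and a `TreeMap` stands in 
for the RocksDB-backed OM table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

// Hypothetical sketch: a sorted in-memory map stands in for the real table.
class RangeScanSketch {

  /**
   * Splits the key space at the sorted, ascending split keys in `bounds`,
   * gives each resulting range to exactly one worker thread, and streams
   * every entry straight to `handler`. Returns the number of entries seen.
   * `handler` is called from several threads, so it must be thread-safe.
   */
  static long scanInParallel(NavigableMap<String, String> table,
      List<String> bounds, BiConsumer<String, String> handler) {
    int ranges = bounds.size() + 1;
    ExecutorService pool = Executors.newFixedThreadPool(ranges);
    try {
      List<Future<Long>> results = new ArrayList<>();
      for (int i = 0; i < ranges; i++) {
        // Range i covers [start, end); null means "unbounded on that side".
        final String start = i == 0 ? null : bounds.get(i - 1);
        final String end = i == bounds.size() ? null : bounds.get(i);
        results.add(pool.submit(() -> {
          NavigableMap<String, String> view = table;
          if (start != null) {
            view = view.tailMap(start, true);
          }
          if (end != null) {
            view = view.headMap(end, false);
          }
          long count = 0;
          // Entries are handled one at a time; nothing is buffered.
          for (Map.Entry<String, String> e : view.entrySet()) {
            handler.accept(e.getKey(), e.getValue());
            count++;
          }
          return count;
        }));
      }
      long total = 0;
      for (Future<Long> f : results) {
        total += f.get();
      }
      return total;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Scan interrupted", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Range scan failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

   Because each range has a single owner, the workers share no iterator 
state, and the only cross-thread data is the per-range count returned through 
the `Future`. The sketch assumes the map is not modified while the scan runs.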
   
   ## What is the link to the Apache JIRA
     https://issues.apache.org/jira/browse/HDDS-14400
   
   ## How was this patch tested?
   CI:
   https://github.com/navinko/ozone/actions/runs/20884674236
   Validated with JUnit tests. Also populated fileTable with data and 
verified parallel processing of an individual table in both debug and normal 
modes.
   bash-5.1$ ozone debug ldb --db=/data/metadata/om.db scan --column_family=fileTable --count
   23916
   
   <img width="1704" height="980" alt="Screenshot 2026-01-10 at 5 46 55 PM" src="https://github.com/user-attachments/assets/e0b25ab5-6034-4b62-8bb9-1db47017cbfc" />
   
   # Recon log
        Run with 21627 keys uploaded to fileTable, followed by reprocessing.
   > 2026-01-10T15:42:37.870472510Z 2026-01-10 15:42:37,870 [ReconTaskThread-0] 
INFO tasks.ReconTaskControllerImpl: Task OmTableInsightTask started execution 
on thread ReconTaskThread-0
   2026-01-10T15:42:37.870724094Z 2026-01-10 15:42:37,870 [ReconTaskThread-0] 
INFO tasks.OmTableInsightTask: OmTableInsightTask: Starting reprocess
   2026-01-10T15:42:37.878627094Z 2026-01-10 15:42:37,878 [ReconTaskThread-0] 
INFO tasks.OmTableInsightTask: OmTableInsightTask: Processing table dTokenTable 
sequentially (non-String keys)
   2026-01-10T15:42:37.888022094Z 2026-01-10 15:42:37,887 [ReconTaskThread-0] 
INFO util.ParallelTableIteratorOperation: OmTableInsightTask: Parallel 
iteration completed - Total keys processed: 2
   2026-01-10T15:42:37.888184677Z 2026-01-10 15:42:37,888 [ReconTaskThread-0] 
INFO tasks.OmTableInsightTask: OmTableInsightTask: Processing table 
s3SecretTable sequentially (non-String keys)
   2026-01-10T15:42:37.899993135Z 2026-01-10 15:42:37,899 [ReconTaskThread-0] 
INFO util.ParallelTableIteratorOperation: OmTableInsightTask: Parallel 
iteration completed - Total keys processed: 3
   2026-01-10T15:42:37.944590219Z 2026-01-10 15:42:37,944 [ReconTaskThread-0] 
INFO util.ParallelTableIteratorOperation: OmTableInsightTask: Parallel 
iteration completed - Total keys processed: 21627
   2026-01-10T15:42:37.947238802Z 2026-01-10 15:42:37,947 [ReconTaskThread-0] 
INFO tasks.OmTableInsightTask: OmTableInsightTask: Reprocess completed in 76 ms
   2026-01-10T15:42:37.947249094Z 2026-01-10 15:42:37,947 [ReconTaskThread-0] 
INFO tasks.ReconTaskControllerImpl: Task OmTableInsightTask completed execution



