ArafatKhan2198 opened a new pull request, #8798:
URL: https://github.com/apache/ozone/pull/8798

   ## What changes were proposed in this pull request?
   This pull request parallelizes the disk usage calculation by traversing 
directory subtrees concurrently with Java’s `ForkJoinPool` framework. Instead of 
walking each directory node sequentially while computing the total size, the 
traversal is split into parallel tasks that run on the available CPU cores. This 
reduces the time needed to compute disk usage for large, deeply nested directory 
trees.
   
   ### Approach and Implementation Details:
   Recon’s existing disk usage calculation walks the entire directory tree 
sequentially, which causes high latency for buckets with millions of 
directories. Because the walk runs on a single thread, the other cores of a 
multi-core machine stay idle.
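   For comparison, the single-threaded walk described above can be sketched as an iterative depth-first traversal. This is a minimal sketch, not Recon’s actual code. The hypothetical `DirSummary` record holds the bytes of the files directly under a directory plus the IDs of its child directories. An in-memory `Map` stands in for the per-directory summaries that Recon reads from RocksDB.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

class SequentialDiskUsage {
  // Hypothetical per-directory summary: bytes of the files directly under the
  // directory plus the IDs of its child directories.
  record DirSummary(long fileBytes, List<Long> childDirIds) { }

  // Visits every directory in the subtree rooted at rootId, one at a time,
  // on the calling thread only.
  static long totalSize(Map<Long, DirSummary> summaries, long rootId) {
    long total = 0;
    Deque<Long> pending = new ArrayDeque<>();
    pending.push(rootId);
    while (!pending.isEmpty()) {
      DirSummary summary = summaries.get(pending.pop());
      if (summary == null) {
        continue; // no summary recorded: contributes nothing
      }
      total += summary.fileBytes();
      for (long childId : summary.childDirIds()) {
        pending.push(childId);
      }
    }
    return total;
  }

  public static void main(String[] args) {
    // bucket 1 -> {dir 2 -> {dir 4}, dir 3}
    Map<Long, DirSummary> summaries = Map.of(
        1L, new DirSummary(10, List.of(2L, 3L)),
        2L, new DirSummary(20, List.of(4L)),
        3L, new DirSummary(5, List.of()),
        4L, new DirSummary(25, List.of()));
    System.out.println(totalSize(summaries, 1L)); // prints 60
  }
}
```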
   
   The proposed solution changes the `getTotalSize(objectId)` method to:
   - Fetch the first-level child directories of the bucket or directory.
   - Submit each child subtree as an independent task to a `ForkJoinPool`.
   - Have each task recursively fork new subtasks for its own child directories, 
enabling fine-grained parallelism.
   - Rely on the work-stealing scheduler of `ForkJoinPool` to balance load 
dynamically across CPU cores: idle worker threads take queued subtasks from 
busy ones instead of waiting.
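   The steps above can be sketched with a `RecursiveTask`, under the same assumptions as before. `DirSummary`, `SubtreeSizeTask` and `ParallelDiskUsage.totalSize` are hypothetical names, and a read-only `Map` stands in for the RocksDB lookups. This illustrates the fork/join pattern; it is not the code in this pull request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

class ParallelDiskUsage {
  // Hypothetical per-directory summary (bytes of direct files + child dir IDs).
  record DirSummary(long fileBytes, List<Long> childDirIds) { }

  // Computes the size of one directory subtree, forking a subtask per child.
  static final class SubtreeSizeTask extends RecursiveTask<Long> {
    private final Map<Long, DirSummary> summaries; // shared, only ever read
    private final long dirId;

    SubtreeSizeTask(Map<Long, DirSummary> summaries, long dirId) {
      this.summaries = summaries;
      this.dirId = dirId;
    }

    @Override
    protected Long compute() {
      DirSummary summary = summaries.get(dirId);
      if (summary == null) {
        return 0L;
      }
      List<SubtreeSizeTask> children = new ArrayList<>();
      for (long childId : summary.childDirIds()) {
        SubtreeSizeTask child = new SubtreeSizeTask(summaries, childId);
        child.fork(); // queued on this worker; idle workers can steal it
        children.add(child);
      }
      long total = summary.fileBytes();
      for (SubtreeSizeTask child : children) {
        total += child.join(); // wait for (or help run) each child subtree
      }
      return total;
    }
  }

  static long totalSize(ForkJoinPool pool, Map<Long, DirSummary> summaries, long rootId) {
    DirSummary root = summaries.get(rootId);
    if (root == null) {
      return 0L;
    }
    // Submit one independent task per first-level child directory.
    List<ForkJoinTask<Long>> subtrees = new ArrayList<>();
    for (long childId : root.childDirIds()) {
      subtrees.add(pool.submit(new SubtreeSizeTask(summaries, childId)));
    }
    long total = root.fileBytes();
    for (ForkJoinTask<Long> subtree : subtrees) {
      total += subtree.join();
    }
    return total;
  }

  public static void main(String[] args) {
    Map<Long, DirSummary> summaries = Map.of(
        1L, new DirSummary(10, List.of(2L, 3L)),
        2L, new DirSummary(20, List.of(4L)),
        3L, new DirSummary(5, List.of()),
        4L, new DirSummary(25, List.of()));
    ForkJoinPool pool = new ForkJoinPool(); // parallelism = available processors
    try {
      System.out.println(totalSize(pool, summaries, 1L)); // prints 60
    } finally {
      pool.shutdown();
    }
  }
}
```

Each task returns the size of its subtree instead of adding to a shared counter, so the only shared state is read-only and the sketch needs no locking.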
   
   Because RocksDB read operations are thread-safe, the tasks can read 
concurrently without additional synchronization.

   The benefits include:
   - Faster disk usage computation, because all available CPU cores are used.
   - Dynamic load balancing that adapts to unevenly shaped and sized directory 
trees.
   - No changes required to the RocksDB data model or on-disk formats.
   
   Trade-offs:
   - Increased memory usage due to recursive task overhead.
   - Added complexity in managing parallel tasks and error handling.
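   The error-handling part of this trade-off can be sketched under the same assumptions; the `SummaryReader` interface and `tryTotalSize` method are hypothetical. An unchecked exception thrown inside any subtask propagates through `join()` to its parent and is rethrown by `ForkJoinPool.invoke()`, so the caller can handle a failure anywhere in the tree in one place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class DiskUsageErrorHandling {
  record DirSummary(long fileBytes, List<Long> childDirIds) { }

  // Hypothetical lookup that may fail, e.g. when a database read throws.
  interface SummaryReader {
    DirSummary read(long dirId);
  }

  static final class SubtreeSizeTask extends RecursiveTask<Long> {
    private final SummaryReader reader;
    private final long dirId;

    SubtreeSizeTask(SummaryReader reader, long dirId) {
      this.reader = reader;
      this.dirId = dirId;
    }

    @Override
    protected Long compute() {
      DirSummary summary = reader.read(dirId); // may throw an unchecked exception
      if (summary == null) {
        return 0L;
      }
      List<SubtreeSizeTask> children = new ArrayList<>();
      for (long childId : summary.childDirIds()) {
        SubtreeSizeTask child = new SubtreeSizeTask(reader, childId);
        child.fork();
        children.add(child);
      }
      long total = summary.fileBytes();
      for (SubtreeSizeTask child : children) {
        total += child.join(); // rethrows an exception raised in the child
      }
      return total;
    }
  }

  // Returns the total, or an empty result if any subtree task failed.
  static OptionalLong tryTotalSize(ForkJoinPool pool, SummaryReader reader, long rootId) {
    try {
      return OptionalLong.of(pool.invoke(new SubtreeSizeTask(reader, rootId)));
    } catch (RuntimeException e) {
      // invoke() rethrows an unchecked exception from any task in the tree;
      // a real implementation would log it here.
      return OptionalLong.empty();
    }
  }

  public static void main(String[] args) {
    Map<Long, DirSummary> summaries = Map.of(
        1L, new DirSummary(10, List.of(2L, 3L)),
        2L, new DirSummary(20, List.of(4L)),
        3L, new DirSummary(5, List.of()),
        4L, new DirSummary(25, List.of()));
    SummaryReader healthy = summaries::get;
    SummaryReader failing = id -> {
      if (id == 4L) {
        throw new IllegalStateException("simulated read failure");
      }
      return summaries.get(id);
    };
    ForkJoinPool pool = new ForkJoinPool();
    try {
      System.out.println(tryTotalSize(pool, healthy, 1L)); // OptionalLong[60]
      System.out.println(tryTotalSize(pool, failing, 1L)); // OptionalLong.empty
    } finally {
      pool.shutdown();
    }
  }
}
```

Sibling tasks that were already forked are not cancelled automatically in this sketch; they may keep running after the caller has received the failure.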
   
   ## What is the link to the Apache JIRA

   https://issues.apache.org/jira/browse/HDDS-13432

   ## How was this patch tested?
   Verified the changes manually and added new unit tests.
   


