ArafatKhan2198 commented on PR #6318:
URL: https://github.com/apache/ozone/pull/6318#issuecomment-2019757417

   > > @devmadhuu @adoroszlai @smitajoshi12
   > > Could you please review the latest changes? Here's a quick summary:
   > > ```
   > > * Switched to Parallel Sorting: To improve performance, we're now using 
parallel sorting. More details are in the description.
   > > 
   > > * Added a Toggle for Sorting: There's a new boolean flag to turn sorting 
on or off.
   > > 
   > > * Set a Limit of 30 Records: We've added a constant to limit the 
response to the top 30 records in Disk Usage.
   > > ```
   > 
   > Thanks @ArafatKhan2198 for handling some points. However I am not sure if 
parallelStreaming always improves performance, in fact rather sometimes, it 
increases more overhead and may do bad than good. I would like you to have a 
look 
[here](https://blogs.oracle.com/javamagazine/post/java-parallel-streams-performance-benchmark).
   
   Thanks a lot, devmadhuu, for the comment and the article! I've read through 
it carefully and here's my analysis:
   
   **Parallel Streaming concern:**
   
   - Parallel streams introduce overhead for managing multiple threads.
   - This overhead can outweigh the benefits of parallel processing for small 
datasets or simple operations.
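
To make the overhead concern concrete, one common guard is to sort sequentially below a size cutoff and only go parallel above it. This is a minimal sketch, not the code in this PR: the class name `SortStrategy` and the `PARALLEL_THRESHOLD` value are hypothetical, and a real cutoff would have to come from benchmarking.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortStrategy {
  // Hypothetical cutoff, not taken from the PR; tune it with a benchmark.
  static final int PARALLEL_THRESHOLD = 10_000;

  // Sorts in place: sequentially for small inputs, in parallel for large ones.
  static <T> void sort(T[] items, Comparator<? super T> order) {
    if (items.length < PARALLEL_THRESHOLD) {
      Arrays.sort(items, order);          // no fork/join overhead
    } else {
      Arrays.parallelSort(items, order);  // uses the common ForkJoinPool
    }
  }

  public static void main(String[] args) {
    Long[] sizes = {42L, 7L, 19L};
    sort(sizes, Comparator.reverseOrder());
    System.out.println(Arrays.toString(sizes)); // [42, 19, 7]
  }
}
```

With a guard like this, small responses never pay for thread coordination, while large directories can still use the parallel path.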
   
   After going through the article, I can summarise the following:
   
   - **Factors affecting performance:**
       - **Data size:** Parallel streams benefit from large datasets where the 
overhead is justified.
           - This sort is applied to the response objects at a single level 
of the file system hierarchy, which in the worst case could contain several 
hundred thousand items.
       - **Computation intensity:** Operations involving complex calculations 
benefit more from parallelization.
           - Sorting is considered a **moderately complex** calculation in the 
context of parallelization.
       - **Stream source:** Easily splittable sources like arrays perform 
better in parallel streams.
           - We are using Lists as our source.
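
As a rough illustration of how the three changes in the summary could fit together (parallel sort, a sorting toggle, and a 30-record limit), here is a minimal sketch. The class `TopDiskUsage`, the `Entry` type, and the method `top` are hypothetical stand-ins, not the actual Recon classes, and applying the limit even when sorting is disabled is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TopDiskUsage {
  // Limit mentioned in the PR summary; the constant name is hypothetical.
  static final int MAX_RECORDS = 30;

  // Hypothetical stand-in for one disk usage response object.
  static final class Entry {
    final String path;
    final long size;

    Entry(String path, long size) {
      this.path = path;
      this.size = size;
    }
  }

  // Returns at most MAX_RECORDS entries, largest first when sorting is enabled.
  static List<Entry> top(List<Entry> entries, boolean sortEnabled) {
    Stream<Entry> stream = entries.stream();
    if (sortEnabled) {
      stream = entries.parallelStream()
          .sorted(Comparator.comparingLong((Entry e) -> e.size).reversed());
    }
    // limit() respects encounter order, so the first 30 sorted entries are kept.
    return stream.limit(MAX_RECORDS).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Entry> entries = new ArrayList<>();
    for (int i = 0; i < 100; i++) {
      entries.add(new Entry("/vol/bucket/dir" + i, i));
    }
    List<Entry> result = top(entries, true);
    System.out.println(result.size() + " " + result.get(0).size); // 30 99
  }
}
```

An `ArrayList` source is array-backed and splits evenly, which is the case where a parallel stream has the best chance of paying off.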


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

