Jackie-Jiang commented on PR #11943:
URL: https://github.com/apache/pinot/pull/11943#issuecomment-1795755694

   @mcvsubbu Thanks for taking the time to write this program!
   
   > According to this, it takes slightly more number of iterations to 
stabilize to the right segment size if we apply the algorithm for all 
partitions. I tried with numPartitions = 1 and numPartitions=32.
   
   Conceptually, with `numPartitions=32`, it should take far fewer iterations 
than with `numPartitions=1` to stabilize at the desired segment size. The 
experiment does not show this because the program assumes **all partitions 
report the same number of rows** within an iteration. In practice, within the 
same iteration, a segment that commits earlier contributes to a more accurate 
segment size ratio, which the next committing segment then picks up. The 
program does not capture this effect.
   The program also assumes all segments commit at roughly the same time, which 
is not always the case (in fact, we probably want to avoid this because it can 
cause a hotspot).
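
   To illustrate the first point, here is a minimal toy model, not Pinot's 
actual implementation and not the program discussed above. It assumes a single 
shared rows-per-size ratio estimate that is refreshed after every commit with a 
weighted average, and that partitions commit one after another within a round. 
The function name, the 0.1 weight, the starting values, and the 5% tolerance are 
all hypothetical choices made for illustration.

```python
def rounds_to_converge(num_partitions, true_ratio=1000.0, initial_ratio=100.0,
                       desired_size=500.0, new_weight=0.1, tolerance=0.05,
                       max_rounds=1000):
    """Count rounds until every segment in a round is within tolerance of desired_size.

    One round means each partition commits exactly one segment, one after another.
    All parameters are illustrative assumptions, not Pinot defaults.
    """
    estimate = initial_ratio  # shared estimate of rows per unit of segment size
    for round_no in range(1, max_rounds + 1):
        worst_error = 0.0
        for _ in range(num_partitions):
            rows = desired_size * estimate   # row threshold derived from the current estimate
            size = rows / true_ratio         # size the segment actually reaches
            worst_error = max(worst_error, abs(size - desired_size) / desired_size)
            observed = rows / size           # ratio reported by this commit
            # Later commits in the same round already see this refreshed estimate.
            estimate = (1 - new_weight) * estimate + new_weight * observed
        if worst_error <= tolerance:
            return round_no
    return max_rounds


if __name__ == "__main__":
    for partitions in (1, 32):
        print(partitions, rounds_to_converge(partitions))
```

   Under these assumptions the estimate is refreshed 32 times per round instead 
of once, so the 32-partition case settles in far fewer rounds than the 
single-partition case. If all partitions instead read the same stale estimate 
within a round, that advantage disappears.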
   
   Based on the above analysis, do you think we can simplify this handling to 
just count all committing segments?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

