Jackie-Jiang commented on PR #11943: URL: https://github.com/apache/pinot/pull/11943#issuecomment-1795755694
@mcvsubbu Thanks for taking the time to write this program!

> According to this, it takes slightly more number of iterations to stabilize to the right segment size if we apply the algorithm for all partitions. I tried with numPartitions = 1 and numPartitions=32.

Conceptually, with `numPartitions=32`, it should take far fewer iterations than with `numPartitions=1` to stabilize at the desired segment size. The experiment didn't show this because the program assumes **all partitions report the same number of rows** within an iteration. In practice, within the same iteration, a segment that commits earlier contributes to a more accurate segment size ratio, which the next committing segment then picks up. The program does not capture this effect. The program also assumes all segments commit at roughly the same time, which is not always the case (we probably want to avoid that anyway, because it can cause hotspots).

Based on the above analysis, do you think we can simplify this handling to just count all committing segments?
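
To make the staggered-commit argument concrete, here is a minimal toy sketch. It is not the program discussed above and not Pinot's implementation. Everything in it is an assumption made for illustration: the linear size model with a fixed per-segment overhead, the constants, the 0.1 weight given to each new observation, and the name `iterations_to_stabilize`. Partitions commit one after another, every committing segment updates one shared rows-to-size ratio, and each partition sizes its next segment from the freshest ratio.

```python
# Toy model only: all constants and names below are hypothetical.
DESIRED_SIZE = 200_000_000   # target segment size in bytes (assumed)
FIXED_OVERHEAD = 20_000_000  # per-segment bytes independent of row count (assumed)
BYTES_PER_ROW = 100          # marginal bytes per row (assumed)
INITIAL_ROWS = 100_000       # row threshold of the very first segments (assumed)
NEW_WEIGHT = 0.1             # weight of each new observation in the moving average (assumed)


def segment_size(rows):
    """Hypothetical size model: fixed overhead plus a linear per-row cost."""
    return FIXED_OVERHEAD + BYTES_PER_ROW * rows


def iterations_to_stabilize(num_partitions, tolerance=0.05, max_iterations=200):
    """Return the first iteration (1-based) in which every committed segment is
    within `tolerance` of DESIRED_SIZE, or None if that never happens."""
    ratio = None  # shared rows-per-byte estimate, updated by every commit
    thresholds = [INITIAL_ROWS] * num_partitions
    for iteration in range(1, max_iterations + 1):
        sizes = []
        # Staggered commits: partitions commit one at a time, so a later
        # partition sees the ratio already updated by the earlier ones.
        for p in range(num_partitions):
            rows = thresholds[p]
            size = segment_size(rows)
            sizes.append(size)
            observed = rows / size
            if ratio is None:
                ratio = observed
            else:
                ratio = (1 - NEW_WEIGHT) * ratio + NEW_WEIGHT * observed
            # The next segment of this partition uses the freshest ratio.
            thresholds[p] = DESIRED_SIZE * ratio
        if all(abs(s - DESIRED_SIZE) <= tolerance * DESIRED_SIZE for s in sizes):
            return iteration
    return None


if __name__ == "__main__":
    for n in (1, 32):
        print(f"numPartitions={n}: stabilized in iteration {iterations_to_stabilize(n)}")
```

Under these assumptions, the shared ratio receives 32 updates per iteration instead of one, so the 32-partition run reaches the tolerance band in fewer iterations than the single-partition run. Real tables will differ, since segment size is not a simple linear function of row count.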
