kfaraz commented on PR #16691: URL: https://github.com/apache/druid/pull/16691#issuecomment-2220328893
Thanks for the feedback, @AmatyaAvadhanula!

> Would a simpler approach such as [sum(segment size) / time] across successful loads in a coordinator cycle not be sufficient?

Segment loads are not tied to a coordinator cycle. "Coordinator cycle" or "coordinator run" simply refers to a single invocation of a duty like `RunRules` or `BalanceSegments`. After the duty has run and assigned a bunch of segments to the load queue, the segments may take any amount of time to finish loading.

While summing up the sizes of successfully loaded segments is trivial, the definition of _time elapsed_ is what complicates the whole logic.

Problems:
1. We want some kind of a moving average.
2. Segments assigned in one coordinator run may remain in the queue for several runs. So when is the start time and end time?
3. While there are already segments in queue, the next coordinator run may assign more segments. How would this affect start time and end time?

---

The simplest (and most intuitive) thing to do would be to track the load time of each segment individually. I actually started out doing this. Start time would be the time when the request to load that segment is first sent to the server. End time would be when the request succeeds. This design alternative has been alluded to in the PR description as well.

__But this would be incorrect,__ since while a segment is being loaded on the historical by one thread, another thread could be loading another segment. In other words, _the segment load durations are not mutually exclusive,_ so we can't simply sum them up. If we did, the computed loading rate would be lower than the actual (not the end of the world but still).

That said, if there is only one loading thread on the server (which is often the case), then the naive logic works just fine.

```
numLoadingThreads = Math.max(1, JvmUtils.getRuntimeInfo().getAvailableProcessors() / 6)
```

---

Let me know what you think.
If you feel this seems too complicated and we could get away with the naive logic for now, I can just do that and save this convoluted design for a rainy day 😂. Once we have seen the feature in action, we will know for sure.

In the future, if the Coordinator could know the number of loading threads on the server, we could just multiply the computed rate by num threads to offset the effect of summing up the times.

cc: @abhishekrb19
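
To make the overlap problem concrete, here is a minimal Java sketch. It is not the code in this PR: `LoadRateSketch`, `SegmentLoad`, and both method names are hypothetical, and the wall-clock variant (merging overlapping load intervals) is only one assumed way to define _time elapsed_. It compares the naive rate, which divides by the sum of per-segment durations, with a rate that divides by the time during which at least one load was in progress.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Druid.
class LoadRateSketch
{
  /** One completed segment load (assumed shape): request sent, request succeeded, size. */
  static final class SegmentLoad
  {
    final long startMillis;
    final long endMillis;
    final long sizeBytes;

    SegmentLoad(long startMillis, long endMillis, long sizeBytes)
    {
      this.startMillis = startMillis;
      this.endMillis = endMillis;
      this.sizeBytes = sizeBytes;
    }
  }

  /** Naive rate: total bytes divided by the SUM of per-segment load durations. */
  static double naiveRateBytesPerSec(List<SegmentLoad> loads)
  {
    long totalBytes = 0;
    long summedMillis = 0;
    for (SegmentLoad load : loads) {
      totalBytes += load.sizeBytes;
      summedMillis += load.endMillis - load.startMillis;
    }
    return summedMillis == 0 ? 0 : totalBytes * 1000.0 / summedMillis;
  }

  /** Wall-clock rate: total bytes divided by the length of the UNION of load intervals. */
  static double wallClockRateBytesPerSec(List<SegmentLoad> loads)
  {
    List<SegmentLoad> sorted = new ArrayList<>(loads);
    sorted.sort(Comparator.comparingLong((SegmentLoad load) -> load.startMillis));

    long totalBytes = 0;
    long busyMillis = 0;
    long windowStart = 0;
    long windowEnd = 0;
    boolean windowOpen = false;
    for (SegmentLoad load : sorted) {
      totalBytes += load.sizeBytes;
      if (windowOpen && load.startMillis <= windowEnd) {
        // Overlaps (or touches) the current busy window: extend it.
        windowEnd = Math.max(windowEnd, load.endMillis);
      } else {
        // Gap before this load: close the previous window and open a new one.
        if (windowOpen) {
          busyMillis += windowEnd - windowStart;
        }
        windowStart = load.startMillis;
        windowEnd = load.endMillis;
        windowOpen = true;
      }
    }
    if (windowOpen) {
      busyMillis += windowEnd - windowStart;
    }
    return busyMillis == 0 ? 0 : totalBytes * 1000.0 / busyMillis;
  }

  public static void main(String[] args)
  {
    // Two loading threads, each loading a 100 MB segment over the same 10 seconds.
    List<SegmentLoad> loads = List.of(
        new SegmentLoad(0, 10_000, 100_000_000L),
        new SegmentLoad(0, 10_000, 100_000_000L)
    );
    System.out.println("naive      (bytes/sec): " + naiveRateBytesPerSec(loads));
    System.out.println("wall-clock (bytes/sec): " + wallClockRateBytesPerSec(loads));
  }
}
```

With two fully overlapping loads, the naive rate comes out at 10 MB/s while the server actually moved 200 MB in 10 seconds, i.e. 20 MB/s. Multiplying the naive rate by the number of loading threads (2 here) recovers the right figure, but that correction is exact only when every thread is busy for the whole window. With a single loading thread the loads never overlap and both methods agree.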
