capistrant edited a comment on pull request #10287: URL: https://github.com/apache/druid/pull/10287#issuecomment-733080773
@himanshug so I added a timer in my smaller environment, and LogUsedSegments as written today is already quite fast for a cluster with ~300 datasources and ~150k used segments.

Timer readings with existing code (millis): 20, 20, 18, 17, 18, 17, 18, 25
Timer readings with updated nested for loop (millis): 17, 5, 12, 3, 2, 2, 2, 2

My prod cluster is quite a bit larger: over 1k datasources and over 1MM segments. But I am not going to be able to add the timer there or test the nested for loop any time soon, because we are in a change freeze until the new year.

There does seem to be evidence of a good speedup in the smaller-scale test I did. I'm not sure whether you think it is worth opening a separate issue/PR to address the usages of the existing stream approach. The question is how many clusters operate at a scale where the increased performance is worth getting rid of that nifty utility method.
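For anyone following along, here is a minimal sketch of the trade-off being timed. The `Segment` class and method names below are hypothetical stand-ins, not the actual Druid `DataSegment` or coordinator code; the point is just that a stream-collector pipeline and a plain nested loop produce the same per-datasource grouping, while the loop avoids the per-element lambda/collector overhead that can show up when a duty runs on every coordinator cycle:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SegmentGrouping {
    // Hypothetical stand-in for a used segment; not the real Druid DataSegment.
    public static class Segment {
        final String dataSource;
        final String id;

        public Segment(String dataSource, String id) {
            this.dataSource = dataSource;
            this.id = id;
        }

        public String getDataSource() {
            return dataSource;
        }
    }

    // Stream-based grouping, in the style of the existing utility method.
    public static Map<String, List<Segment>> withStream(List<Segment> segments) {
        return segments.stream().collect(Collectors.groupingBy(Segment::getDataSource));
    }

    // Plain-loop grouping: same result, fewer intermediate allocations per element.
    public static Map<String, List<Segment>> withLoop(List<Segment> segments) {
        Map<String, List<Segment>> grouped = new HashMap<>();
        for (Segment s : segments) {
            grouped.computeIfAbsent(s.dataSource, k -> new ArrayList<>()).add(s);
        }
        return grouped;
    }
}
```

Both variants return identical maps, so swapping one for the other is purely a performance change, which is why the decision comes down to whether the measured speedup justifies losing the shared utility.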
