capistrant edited a comment on pull request #10287:
URL: https://github.com/apache/druid/pull/10287#issuecomment-733080773


   @himanshug so I added a timer in my smaller environment, and LogUsedSegments as written today is very fast for a cluster with ~300 datasources and ~150k used segments.
   Timer readings with existing code (millis): 20, 20, 18, 17, 18, 17, 18, 25
   Timer readings with updated nested for loop (millis): 17, 5, 12, 3, 2, 2, 2, 2
   
   My prod cluster is quite a bit larger: over 1k datasources and over 1MM segments. But I won't be able to add the timer there or test the nested for loop any time soon, because we are in a change freeze until the new year.
   
   There does seem to be evidence of a good speedup in the smaller-scale test I did. I'm not sure whether you think it is worth opening a separate issue/PR to address the usages of the existing stream approach. The open question is how many clusters operate at a scale where the increased performance is worth getting rid of that nifty utility method.
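   
   For context, here is a minimal sketch of the two iteration styles being compared. The names (`segmentsPerDataSource`, the count methods) are hypothetical stand-ins, not the actual Druid classes; the point is only the shape of the two approaches, where the stream-based utility materializes an intermediate collection while the nested for loop walks each datasource's segments directly:
   
   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.stream.Collectors;
   
   // Hypothetical sketch: List<String> stands in for a datasource's segments.
   public class SegmentIterationSketch
   {
     // Stream-based style: flattens all segments into one intermediate list
     // before operating on them, similar in spirit to the existing utility.
     static int countViaStream(List<List<String>> segmentsPerDataSource)
     {
       return segmentsPerDataSource.stream()
                                   .flatMap(List::stream)
                                   .collect(Collectors.toList())
                                   .size();
     }
   
     // Nested-for-loop style: visits each segment in place, with no
     // intermediate collection allocated.
     static int countViaNestedLoop(List<List<String>> segmentsPerDataSource)
     {
       int count = 0;
       for (List<String> segments : segmentsPerDataSource) {
         for (String segment : segments) {
           count++;  // real code would log/process the segment here
         }
       }
       return count;
     }
   
     public static void main(String[] args)
     {
       List<List<String>> data = Arrays.asList(
           Arrays.asList("seg-a-1", "seg-a-2"),
           Arrays.asList("seg-b-1")
       );
       System.out.println(countViaStream(data));      // 3
       System.out.println(countViaNestedLoop(data));  // 3
     }
   }
   ```
   
   At small scale the two are indistinguishable; the intermediate allocation only starts to matter at the segment counts discussed above.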


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


