capistrant commented on pull request #10287: URL: https://github.com/apache/druid/pull/10287#issuecomment-733910809
> @capistrant thanks for the test. This is still surprising. I did a quick benchmark (see #10604) and the iteration looks very fast (relative to ~10 sec) with both streams and for-loops, even for 1000 dataSources with 2000 segments each, i.e. 2 million segments overall.
>
> Where did you get the ~10 sec number from originally?

Our estimates came from wall clock times in the logs. I admit that is pretty hand-wavy and glosses over some facts. EmitClusterStatsAndMetrics logs some output at the end of its run. We then wait out our configured 30 second backoff time. Then we execute the historical management duties runnable again; the first duty is LogUsedSegment, and it logs when it finishes. So if we have these two wall clock values:

2020-11-25T18:05:42,18
2020-11-25T18:06:33,42

you can say there was 11 seconds between the end of the backoff time and the completion of the first duty. But this neglects everything in DutiesRunnable#run() that happens before we start running duties, as well as any discrepancy in how long the backoff between the end of one run and the start of the next actually lasts.
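
For anyone who wants to repeat the estimate against their own coordinator logs, the arithmetic can be sketched roughly as below. This is only a sketch: the function name and the timestamp format string are assumptions made for illustration, and the fractional part after the comma is treated as a (possibly truncated) fraction of a second. Note that with the two timestamps exactly as quoted above, this arithmetic gives roughly 21 seconds rather than 11, so one of the quoted values may have been mistyped.

```python
from datetime import datetime, timedelta

# Assumed log timestamp layout: ISO date/time with a comma before the
# fractional seconds (the fraction in the quoted values may be truncated).
LOG_TS_FORMAT = "%Y-%m-%dT%H:%M:%S,%f"


def estimate_first_duty_seconds(prev_run_end, first_duty_end, backoff_seconds):
    """Hypothetical helper: wall clock gap between two log lines, minus the backoff.

    prev_run_end    -- timestamp of the last log line of the previous run
    first_duty_end  -- timestamp of the log line written when the first duty finishes
    backoff_seconds -- configured backoff between runs
    """
    start = datetime.strptime(prev_run_end, LOG_TS_FORMAT)
    end = datetime.strptime(first_duty_end, LOG_TS_FORMAT)
    gap = end - start
    # What remains after the backoff still includes any setup work done
    # before the first duty starts, so it is an upper bound, not a measurement.
    return (gap - timedelta(seconds=backoff_seconds)).total_seconds()


estimate = estimate_first_duty_seconds(
    "2020-11-25T18:05:42,18",
    "2020-11-25T18:06:33,42",
    30,
)
print(f"estimated upper bound for the first duty: {estimate:.2f} s")
```

The result is only an upper bound on the duty's own run time, for the reasons given above: it also absorbs the pre-duty work in the runnable and any drift in the actual backoff.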
