capistrant commented on pull request #10287:
URL: https://github.com/apache/druid/pull/10287#issuecomment-733910809


   > @capistrant thanks for the test. this is still surprising .. I did a quick 
benchmark (see #10604 ) and the iteration looks very fast (relative to ~10 sec) 
with streams and for-loops both even for 1000 dataSources and 2000 segments 
each i.e. 2mn segments overall .
   > where did you get the ~10 sec number from originally ?
   
   Our estimate came from wall-clock time, read off the logs. But I admit it 
is pretty hand-wavy and glosses over some facts.
   
   EmitClusterStatsAndMetrics logs a few lines at the end of its run. We then 
have our configured 30-second backoff. After that, the historical management 
duties runnable executes again; its first duty is LogUsedSegment, which logs 
when it finishes.
   
   So if we have these two wall-clock values:
   
   2020-11-25T18:05:42,18
   2020-11-25T18:06:33,42
   
   you can say there were 11 seconds between the end of the backoff and the 
completion of the first duty. But this neglects everything in 
DutiesRunnable#run() that happens before we start running duties, as well as 
any discrepancy between the configured backoff and the time that actually 
elapses between the end of one run and the start of the next.
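
   The estimate described above can be sketched roughly as follows. This is 
only an illustration: the function name and the timestamp format string are 
assumptions inferred from the two log values shown, and the 30-second default 
is the configured backoff mentioned earlier.

```python
from datetime import datetime

# Assumed log timestamp layout, inferred from values like "2020-11-25T18:05:42,18".
LOG_TS_FORMAT = "%Y-%m-%dT%H:%M:%S,%f"


def estimate_first_duty_seconds(prev_run_end, first_duty_done, backoff_seconds=30.0):
    """Hypothetical helper: wall-clock gap between two log lines minus the backoff.

    prev_run_end    -- timestamp of the last log line of the previous run
    first_duty_done -- timestamp logged when the first duty of the next run finishes
    """
    start = datetime.strptime(prev_run_end, LOG_TS_FORMAT)
    end = datetime.strptime(first_duty_done, LOG_TS_FORMAT)
    # Whatever remains after the backoff is attributed to the first duty.
    return (end - start).total_seconds() - backoff_seconds
```

   The result is at best an upper bound on the duty's own run time, since it 
also includes the pre-duty work in DutiesRunnable#run() and any drift in the 
actual backoff.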


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



