clintropolis commented on issue #8578: parallel broker merges on fork join pool
URL: https://github.com/apache/incubator-druid/pull/8578#issuecomment-549253205

### more realistic worst case

I reworked the JMH thread-based benchmark to use thread groups to examine what happens in a more realistic scenario, with the newly renamed `ParallelMergeCombiningSequenceThreadedBenchmark`. I find this benchmark a fair bit less scary than the previous 'worst case' benchmarks, which focused on an impossible scenario because I really wanted to dig in and see where and how the wheels fell off.

This benchmark models a more 'typical' heavy load, where the majority of the queries have smaller result sets with shorter blocking times and a smaller subset have larger result sets with longer initial blocking times. By using thread groups we can look at performance for these 'classes' of queries as load increases. This set was collected with a ratio of 1 'moderately large' query for every 8 'small' queries, where 'moderately large' is defined as input sequence row counts of 50k-75k rows with a 1-2.5 second block before yielding results, and 'small' is defined as input sequence row counts of 500-10k rows with a 50-200ms block.

Keep in mind while reviewing the results that I collected data at a significantly higher level of parallelism than I would expect a 16 core machine to realistically be configured to handle. I would probably configure an m5.8xl with ~64 http threads, but collected data points up to 128 concurrent sequences being processed.

The first plot shows merge time (y axis) growth as concurrency (x axis) increases, animated to show the differences for a given number of input sequences (analogous to cluster size).

![thread-groups-typical-distribution-1-8-small](https://user-images.githubusercontent.com/1577461/68105759-6125e880-fe94-11e9-86a4-cae8fb52b92b.gif)

Note that the x axis is the _total_ concurrency count, not the number of threads of this particular group.
Also worth pointing out: the degradation in performance happens at a significantly higher level of concurrency than in the previous (unrealistic) worst case benchmarks, but it does share some characteristics with the previous plots, such as 8 input sequences being a lot more performant than, say, 64, and the parallel approach crossing the line of the same-thread serial merge approach after a certain threshold.

The larger 'queries' tell a similar tale:

![thread-groups-typical-distribution-1-8-moderately-large](https://user-images.githubusercontent.com/1577461/68106055-4142f480-fe95-11e9-897b-57c7cf8b4ace.gif)

The point where the parallel merge sequence crosses the threshold looks a fair bit less dramatic here than for the 'small' sequences, but keep in mind the 'big jump' in the small sequences amounts to only a few hundred milliseconds, so it's not quite as dramatic as it appears.

The final plot shows the overall average across both groups:

![thread-groups-typical-distribution-1-8-average](https://user-images.githubusercontent.com/1577461/68105727-46ec0a80-fe94-11e9-9854-aaae9d8405c7.gif)

which I find a bit less useful than the other 2 plots, but included anyway for completeness.
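For concreteness, the workload distribution described above could be modeled roughly like this. This is a hypothetical sketch, not code from the PR or from `ParallelMergeCombiningSequenceThreadedBenchmark` itself; the class and method names (`SimulatedQueryWorkload`, `paramsFor`) are made up for illustration, and only the 1:8 ratio and the row-count/blocking-time ranges come from the description above:

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch: generates per-query parameters for the two workload
// classes, with 1 'moderately large' query for every 8 'small' queries.
public class SimulatedQueryWorkload {
  public static final class QueryParams {
    public final int rowCount;      // rows per input sequence
    public final long blockMillis;  // initial blocking time before yielding results

    QueryParams(int rowCount, long blockMillis) {
      this.rowCount = rowCount;
      this.blockMillis = blockMillis;
    }
  }

  /** Every 9th query is 'moderately large'; the other 8 are 'small'. */
  public static QueryParams paramsFor(int queryNumber) {
    ThreadLocalRandom r = ThreadLocalRandom.current();
    if (queryNumber % 9 == 0) {
      // 'moderately large': 50k-75k rows, blocking 1-2.5 seconds
      return new QueryParams(r.nextInt(50_000, 75_001), r.nextLong(1_000, 2_501));
    }
    // 'small': 500-10k rows, blocking 50-200ms
    return new QueryParams(r.nextInt(500, 10_001), r.nextLong(50, 201));
  }
}
```

In the actual JMH benchmark the two classes of queries run as separate thread groups, which is what lets the results be broken out per class in the plots that follow.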