@drcrallen basically, I want to check how large a performance benefit the new parallel merge algorithm can give us. You mentioned that it shows about an 85% performance improvement in some internal tests, but that number reflects the overall performance benefit, not that of the algorithm itself. Of course, the overall performance is more important, but the latter is also important, I think, because it gives us insight into exactly what we can expect from this feature.
Also, query performance is affected by many factors, such as dataSource size, query filter selectivity, the number of aggregators and their types, cluster size, and so on. JMH would be useful because we can easily replicate the benchmark with the same query and the same data, which makes it easy to maintain or improve in the future. I think the JMH benchmark should include the following:

- Simplifying the historical part so that only the performance of the broker merge algorithm is measured. For example, there would be no HTTP communication between CachingClusteredClient and the actual query runners of historicals. Also, the query runners of historicals could just do aggregation for a single segment (or even just return some pre-aggregated values).
- Benchmarking with varying `intermediateMergeBatchThreshold` and number of streams to be merged.
- Maybe testing against different query types (timeseries, topN, groupBy).

[ Full content available at: https://github.com/apache/incubator-druid/pull/5913 ]
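To make the parameter sweep above (batch threshold and number of streams) more concrete, here is a rough stand-alone sketch in plain Java. It is not Druid code and not a JMH benchmark: `MergeBenchSketch`, `makeStreams`, `mergeAll`, and `mergeInBatches` are all hypothetical names, the in-memory sorted lists only stand in for pre-aggregated historical results, and `batchThreshold` is assumed to mean "at most this many streams are merged in one intermediate step", which may not match the real semantics of `intermediateMergeBatchThreshold`. The merge here is also sequential, so it only illustrates the shape of the sweep, not the parallel algorithm in the PR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

// Stand-alone sketch, not Druid code: every name here is hypothetical.
final class MergeBenchSketch {

    // Builds numStreams sorted lists that stand in for already-aggregated
    // per-historical results, so no HTTP call or segment scan is involved.
    static List<List<Long>> makeStreams(int numStreams, int rowsPerStream, long seed) {
        Random random = new Random(seed);
        List<List<Long>> streams = new ArrayList<>();
        for (int i = 0; i < numStreams; i++) {
            List<Long> rows = new ArrayList<>(rowsPerStream);
            for (int j = 0; j < rowsPerStream; j++) {
                rows.add((long) random.nextInt(1_000_000));
            }
            rows.sort(Comparator.naturalOrder());
            streams.add(rows);
        }
        return streams;
    }

    // Plain k-way merge of sorted lists using a min-heap of cursors.
    static List<Long> mergeAll(List<List<Long>> streams) {
        // Each heap entry is {streamIndex, positionInStream}.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            Comparator.comparingLong((int[] c) -> streams.get(c[0]).get(c[1])));
        int total = 0;
        for (int i = 0; i < streams.size(); i++) {
            total += streams.get(i).size();
            if (!streams.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<Long> merged = new ArrayList<>(total);
        while (!heap.isEmpty()) {
            int[] cursor = heap.poll();
            List<Long> stream = streams.get(cursor[0]);
            merged.add(stream.get(cursor[1]));
            if (cursor[1] + 1 < stream.size()) {
                heap.add(new int[] {cursor[0], cursor[1] + 1});
            }
        }
        return merged;
    }

    // Assumed semantics: merge groups of at most batchThreshold streams,
    // then repeat on the intermediate results until one stream is left.
    static List<Long> mergeInBatches(List<List<Long>> streams, int batchThreshold) {
        if (batchThreshold < 2) {
            throw new IllegalArgumentException("batchThreshold must be at least 2");
        }
        List<List<Long>> current = streams;
        while (current.size() > 1) {
            List<List<Long>> next = new ArrayList<>();
            for (int from = 0; from < current.size(); from += batchThreshold) {
                int to = Math.min(from + batchThreshold, current.size());
                next.add(mergeAll(current.subList(from, to)));
            }
            current = next;
        }
        return current.isEmpty() ? new ArrayList<>() : current.get(0);
    }

    // Sweeps both parameters; the timings are indicative only because there
    // is no warm-up or forking, which is what JMH would add.
    public static void main(String[] args) {
        int[] streamCounts = {8, 64, 256};
        int[] thresholds = {2, 8, 64};
        for (int numStreams : streamCounts) {
            List<List<Long>> streams = makeStreams(numStreams, 10_000, 42L);
            for (int threshold : thresholds) {
                long start = System.nanoTime();
                List<Long> merged = mergeInBatches(streams, threshold);
                long micros = (System.nanoTime() - start) / 1_000;
                System.out.printf("streams=%d threshold=%d rows=%d time=%dus%n",
                    numStreams, threshold, merged.size(), micros);
            }
        }
    }
}
```

In an actual JMH benchmark, the two loops in `main` would presumably become `@Param` fields and the merge call the `@Benchmark` method, with the fixed seed keeping the input data identical across runs.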