@drcrallen basically I want to check how much of a performance benefit the new 
parallel merge algorithm gives us. You mentioned that it shows about an 85% 
performance improvement in some internal tests, but that number reflects the 
overall performance benefit, not that of the algorithm itself. Of course, the 
overall performance is more important, but the latter is also important because 
it tells us exactly what we can expect from this feature, I think.

Also, query performance is affected by many factors such as dataSource size, 
query filter selectivity, the number of aggregators and their types, cluster 
size, and so on. JMH would be useful because we can easily reproduce the 
benchmark with the same query and the same data, which makes it easy to 
maintain or improve in the future.

I think the JMH benchmark should include the following:

- Simplifying the historical part so that only the broker merge algorithm is 
measured. For example, there would be no HTTP communication between 
CachingClusteredClient and the actual query runners of historicals. Also, the 
query runners of historicals could just do aggregation for a single segment (or 
even just return some pre-aggregated values).
- Benchmarking with varying `intermediateMergeBatchThreshold` and number of 
streams to be merged.
- Possibly testing against different query types (timeseries, topN, groupBy).
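
To make the shape of such a benchmark concrete, here is a rough, plain-Java 
sketch of the parameter sweep I have in mind. Everything in it is an assumption 
for illustration: the class and method names are hypothetical, the "streams" 
are in-memory sorted arrays standing in for pre-aggregated per-segment results, 
and the merge is a simple sequential two-level merge, not the parallel 
algorithm in this PR. `batchSize` only loosely plays the role of 
`intermediateMergeBatchThreshold`.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical sketch, not Druid code: sweeps stream count and batch size.
class MergeBenchSketch {
  // Builds numStreams sorted streams that stand in for pre-aggregated
  // per-segment results (no HTTP, no real historical query runners).
  static long[][] makeStreams(int numStreams, int rowsPerStream) {
    long[][] streams = new long[numStreams][rowsPerStream];
    for (int s = 0; s < numStreams; s++) {
      for (int i = 0; i < rowsPerStream; i++) {
        streams[s][i] = (long) i * numStreams + s; // ascending within a stream
      }
    }
    return streams;
  }

  // k-way merge of sorted streams; queue entries are {value, stream, offset}.
  static long[] mergeAll(long[][] streams) {
    int total = 0;
    for (long[] s : streams) {
      total += s.length;
    }
    long[] out = new long[total];
    PriorityQueue<long[]> pq =
        new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
    for (int s = 0; s < streams.length; s++) {
      if (streams[s].length > 0) {
        pq.add(new long[]{streams[s][0], s, 0});
      }
    }
    int n = 0;
    while (!pq.isEmpty()) {
      long[] head = pq.poll();
      out[n++] = head[0];
      int s = (int) head[1];
      int next = (int) head[2] + 1;
      if (next < streams[s].length) {
        pq.add(new long[]{streams[s][next], s, next});
      }
    }
    return out;
  }

  // Two-level merge: merge groups of batchSize streams first, then merge
  // the intermediate results. batchSize must be at least 1.
  static long[] mergeInBatches(long[][] streams, int batchSize) {
    int numBatches = (streams.length + batchSize - 1) / batchSize;
    long[][] intermediate = new long[numBatches][];
    for (int b = 0; b < numBatches; b++) {
      int from = b * batchSize;
      int to = Math.min(from + batchSize, streams.length);
      intermediate[b] = mergeAll(Arrays.copyOfRange(streams, from, to));
    }
    return mergeAll(intermediate);
  }

  public static void main(String[] args) {
    // Naive timing loop; a real JMH benchmark would handle warmup and forks.
    for (int numStreams : new int[]{8, 64, 512}) {
      long[][] streams = makeStreams(numStreams, 10_000);
      for (int batchSize : new int[]{4, 16, 64}) {
        long start = System.nanoTime();
        long[] merged = mergeInBatches(streams, batchSize);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.printf("streams=%d batch=%d rows=%d time=%dus%n",
            numStreams, batchSize, merged.length, micros);
      }
    }
  }
}
```

In an actual JMH benchmark the two loop variables would presumably become 
`@Param` fields on a `@State` class, the stream setup would move into a 
`@Setup` method, and the merge call would be the `@Benchmark` method, so that 
JMH takes care of warmup and measurement instead of `System.nanoTime()`.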

[ Full content available at: 
https://github.com/apache/incubator-druid/pull/5913 ]