[GitHub] [incubator-druid] jihoonson commented on pull request #5913: Move Caching Cluster Client to java streams and allow parallel intermediate merges

GitHub Thu, 30 Aug 2018 17:13:02 -0700

Also, I've just noticed that this breaks an assumption of groupBy v2. In 
groupBy v2, the broker assumes that the intermediate aggregates are always 
sorted by the grouping keys, so that it can perform merge-sorted 
aggregation. However, calling `QueryRunnerFactory.mergeRunners()` internally 
performs hash aggregation (or array-based aggregation) and then sorts the 
result again, which is inefficient. For groupBy v2, merge-sorted aggregation 
should be performed in parallel. Maybe we need to add a new method to 
`QueryToolChest` that is different from both the merge in historicals and the 
final merge in brokers.
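
To illustrate what I mean by merge-sorted aggregation, here is a minimal, self-contained sketch. It is not Druid code: the class `SortedMergeAggregationSketch`, the `Cursor` helper, and the use of `String` keys with `Long` sums are all hypothetical stand-ins for grouping keys and aggregators. It assumes every input is already sorted by key, so equal keys arrive next to each other and can be combined without building a hash table or re-sorting the output.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch, not part of Druid: k-way merge of key-sorted inputs
// that sums the values of equal keys as they are encountered.
public class SortedMergeAggregationSketch {

    // Holds the current row of one input plus the remainder of that input.
    private static final class Cursor {
        Map.Entry<String, Long> head;
        final Iterator<Map.Entry<String, Long>> rest;

        Cursor(Iterator<Map.Entry<String, Long>> it) {
            this.rest = it;
            this.head = it.next();
        }
    }

    // Assumption: each inner list is already sorted by key in ascending order.
    public static List<Map.Entry<String, Long>> mergeSorted(
            List<List<Map.Entry<String, Long>>> sortedInputs) {
        // The queue always exposes the cursor with the smallest current key.
        PriorityQueue<Cursor> queue =
                new PriorityQueue<>((a, b) -> a.head.getKey().compareTo(b.head.getKey()));
        for (List<Map.Entry<String, Long>> input : sortedInputs) {
            if (!input.isEmpty()) {
                queue.add(new Cursor(input.iterator()));
            }
        }

        List<Map.Entry<String, Long>> merged = new ArrayList<>();
        String currentKey = null;
        long currentSum = 0L;

        while (!queue.isEmpty()) {
            Cursor cursor = queue.poll();
            String key = cursor.head.getKey();
            long value = cursor.head.getValue();

            if (currentKey != null && currentKey.equals(key)) {
                // Same group as the previous row: combine in place.
                currentSum += value;
            } else {
                // A new key starts, so the previous group is complete.
                if (currentKey != null) {
                    merged.add(new SimpleImmutableEntry<>(currentKey, currentSum));
                }
                currentKey = key;
                currentSum = value;
            }

            // Advance this input and put it back if it has more rows.
            if (cursor.rest.hasNext()) {
                cursor.head = cursor.rest.next();
                queue.add(cursor);
            }
        }

        if (currentKey != null) {
            merged.add(new SimpleImmutableEntry<>(currentKey, currentSum));
        }
        return merged;
    }
}
```

The output of this kind of merge is itself sorted by key, so it can feed another merge of the same kind without an intermediate sort.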


We've recently had a discussion about this on the dev mailing list. See 
https://lists.apache.org/thread.html/b4c1cbe0c97e52ae5a137f4315af6a202a24d3034f53ce92c0d30150@%3Cdev.druid.apache.org%3E
for more details.

[ Full content available at: 
https://github.com/apache/incubator-druid/pull/5913 ]
This message was relayed via gitbox.apache.org for [email protected]
