Hi Jisoo, sorry, the previous email was sent by accident.
The initial version of groupBy v2 wasn't capable of combining intermediates in parallel. Some of our customers met the similar issue to yours, and so I was working on improving groupBy v2 performance for a while. Parallel combining on brokers definitely makes sense. I was thinking to add a sort of ParallelMergeSequence which is a parallel version of MergeSequence, but it can be anything if it supports parallel combining on brokers. One thing I'm worrying about is, most query processing interfaces in brokers are using Sequence, and thus using another stuff for a specific query type might make the codes complicated. I think we need to avoid it if possible. Best, Jihoon On Thu, Jul 19, 2018 at 2:58 PM Jihoon Son <[email protected]> wrote: > Hi Jisoo, > > the initial version of groupBy v2 > > On Thu, Jul 19, 2018 at 2:42 PM Jisoo Kim <[email protected]> > wrote: > >> Hi all, >> >> I am currently working on a project that uses Druid's QueryRunner and >> other >> druid-processing classes. It uses Druid's own classes to calculate query >> results. I have been testing large GroupBy queries (using v2), and it >> seems >> like parallel combining threads for GroupBy queries are only enabled on >> the >> historical level. I think it is only getting called by >> GroupByStrategyV2.mergeRunners() >> < >> https://github.com/apache/incubator-druid/blob/druid-0.12.1/processing/src/main/java/io/druid/query/groupby/strategy/GroupByStrategyV2.java#L335 >> > >> which is only called by GroupByQueryRunnerFactory.mergeRunners() on >> historicals. >> >> Are GroupByMergingQueryRunnerV2 and parallel combining threads meant for >> computing and merging per-segment results only, or can they also be used >> on >> the broker level? I changed the logic of my project from calling >> queryToolChest.mergeResults() on MergeSequence (created by providing a >> list >> of per-segment/per-server sequences) to calling >> queryToolChest.mergeResults() on queryRunnerFactory.mergeRunners() (where >> each runner returns a deserialized result sequence), and that seemed to >> have reduced really heavy groupby query computation time or failures by >> quite a lot. Or is this just a coincidence and there shouldn't be a >> performance difference in merging groupby query results, and the only >> difference could've been by parallelizing the deserialization of result >> sequences from sub-queries? >> >> Thanks, >> Jisoo >> >
