Re: Question on GroupBy query results merging process

Jihoon Son Thu, 19 Jul 2018 15:07:27 -0700

Hi Jisoo,

Sorry, the previous email was sent by accident.

The initial version of groupBy v2 wasn't capable of combining intermediates
in parallel. Some of our customers ran into an issue similar to yours, so I
worked on improving groupBy v2 performance for a while.

Parallel combining on brokers definitely makes sense. I was thinking of
adding a kind of ParallelMergeSequence, a parallel version of
MergeSequence, but it could be anything as long as it supports parallel
combining on brokers.
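
To make the idea concrete, here is a minimal sketch of parallel combining.
It is not Druid code: it uses plain Java lists instead of Sequence, the
names ParallelCombineSketch, row, combine and combineAll are hypothetical,
and it assumes each input is sorted by key with at most one row per key and
that the aggregator is a simple sum.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch only; none of these names exist in Druid.
final class ParallelCombineSketch {

    // Convenience factory for one (groupKey, aggregatedValue) row.
    static Map.Entry<String, Long> row(String key, long value) {
        return new SimpleEntry<>(key, value);
    }

    // Merge two key-sorted inputs, summing the values of rows with equal keys.
    static List<Map.Entry<String, Long>> combine(
            List<Map.Entry<String, Long>> left,
            List<Map.Entry<String, Long>> right) {
        List<Map.Entry<String, Long>> out =
                new ArrayList<>(left.size() + right.size());
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            Map.Entry<String, Long> l = left.get(i);
            Map.Entry<String, Long> r = right.get(j);
            int cmp = l.getKey().compareTo(r.getKey());
            if (cmp < 0) {
                out.add(l);
                i++;
            } else if (cmp > 0) {
                out.add(r);
                j++;
            } else {
                out.add(row(l.getKey(), l.getValue() + r.getValue()));
                i++;
                j++;
            }
        }
        while (i < left.size()) {
            out.add(left.get(i++));
        }
        while (j < right.size()) {
            out.add(right.get(j++));
        }
        return out;
    }

    // Combine all inputs as a balanced tree. Independent pairs are combined
    // concurrently on the common ForkJoinPool.
    static CompletableFuture<List<Map.Entry<String, Long>>> combineAll(
            List<List<Map.Entry<String, Long>>> inputs) {
        return combineRange(inputs, 0, inputs.size());
    }

    private static CompletableFuture<List<Map.Entry<String, Long>>> combineRange(
            List<List<Map.Entry<String, Long>>> inputs, int from, int to) {
        if (to - from == 0) {
            return CompletableFuture.completedFuture(new ArrayList<>());
        }
        if (to - from == 1) {
            return CompletableFuture.completedFuture(inputs.get(from));
        }
        int mid = (from + to) >>> 1;
        CompletableFuture<List<Map.Entry<String, Long>>> left =
                combineRange(inputs, from, mid);
        CompletableFuture<List<Map.Entry<String, Long>>> right =
                combineRange(inputs, mid, to);
        // Runs combine() asynchronously once both halves are available.
        return left.thenCombineAsync(right, ParallelCombineSketch::combine);
    }
}
```

Calling combineAll(inputs).join() blocks until the final merged list is
ready. A real implementation would have to stream rows rather than
materialize whole lists, which is where a Sequence-based design would
differ from this sketch.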

One thing I'm worried about is that most query processing interfaces in
brokers use Sequence, so using something else for a specific query type
might complicate the code. I think we should avoid that if possible.

Best,
Jihoon

On Thu, Jul 19, 2018 at 2:58 PM Jihoon Son <[email protected]> wrote:

> Hi Jisoo,
>
> the initial version of groupBy v2
>
> On Thu, Jul 19, 2018 at 2:42 PM Jisoo Kim <[email protected]>
> wrote:
>
>> Hi all,
>>
>> I am currently working on a project that uses Druid's QueryRunner and
>> other druid-processing classes. It uses Druid's own classes to calculate
>> query results. I have been testing large GroupBy queries (using v2), and
>> it seems like parallel combining threads for GroupBy queries are only
>> enabled on the historical level. I think it is only getting called by
>> GroupByStrategyV2.mergeRunners()
>> <https://github.com/apache/incubator-druid/blob/druid-0.12.1/processing/src/main/java/io/druid/query/groupby/strategy/GroupByStrategyV2.java#L335>
>> which is only called by GroupByQueryRunnerFactory.mergeRunners() on
>> historicals.
>>
>> Are GroupByMergingQueryRunnerV2 and parallel combining threads meant for
>> computing and merging per-segment results only, or can they also be used
>> on the broker level? I changed the logic of my project from calling
>> queryToolChest.mergeResults() on MergeSequence (created by providing a
>> list of per-segment/per-server sequences) to calling
>> queryToolChest.mergeResults() on queryRunnerFactory.mergeRunners() (where
>> each runner returns a deserialized result sequence), and that seemed to
>> have reduced really heavy groupby query computation time or failures by
>> quite a lot. Or is this just a coincidence and there shouldn't be a
>> performance difference in merging groupby query results, and the only
>> difference could've been by parallelizing the deserialization of result
>> sequences from sub-queries?
>>
>> Thanks,
>> Jisoo
>>
>
