clintropolis commented on issue #11133: URL: https://github.com/apache/druid/issues/11133#issuecomment-822881809
Thanks for the report. I wondered if there might be a threshold where the expense of potentially having to compute the merge more times outweighs the advantage of being able to process in parallel. Unfortunately I didn't explore this very deeply in my investigations, but I think I did hit on something somewhat similar, where depending on the number of input sequences and the number of cores, more parallelism is not always better: https://github.com/apache/druid/pull/8578#issuecomment-548142920.

It may be that the complexity of the merge and compare functions needs to be factored into [this computation which chooses how parallel the merge should be](https://github.com/apache/druid/blob/master/core/src/main/java/org/apache/druid/java/util/common/guava/ParallelMergeCombiningSequence.java#L455). I don't know exactly how we would do that off the top of my head, or even if it is the correct solution, or if we would be better off focusing on making the sketch merges more efficient.

I think it should be possible to replicate what you are seeing here with a modified version of the benchmarks that were created, https://github.com/apache/druid/blob/4caa221d72473f8307489d814f58bcfabde4c57e/benchmarks/src/test/java/org/apache/druid/benchmark/sequences/BaseParallelMergeCombiningSequenceBenchmark.java, testing with a more complicated merge function (https://github.com/apache/druid/blob/4caa221d72473f8307489d814f58bcfabde4c57e/core/src/test/java/org/apache/druid/java/util/common/guava/ParallelMergeCombiningSequenceTest.java#L52) that does what is going on in the theta sketch aggregators. I'll try to have a look into this sometime soon to see if I can reproduce it with benchmarks.
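To make the "more complicated merge function" idea a bit more concrete, here is a rough sketch of a combine function whose cost grows with the size of its inputs, which is the property a benchmark would need in order to resemble a sketch union. Everything in it is an assumption for illustration: `SetRow`, `UNION_MERGE_FN` and `rowOf` are hypothetical names, a `TreeSet<Long>` stands in for a theta sketch, and none of it is taken from the Druid or DataSketches code.

```java
import java.util.TreeSet;
import java.util.function.BinaryOperator;

public class ExpensiveMergeSketch
{
  // Hypothetical row type: a grouping key plus a set of hashed values,
  // standing in for a sketch-like aggregate. Not a Druid class.
  static final class SetRow
  {
    final int key;
    final TreeSet<Long> values;

    SetRow(int key, TreeSet<Long> values)
    {
      this.key = key;
      this.values = values;
    }
  }

  // Combine function whose cost depends on the size of both inputs,
  // unlike a simple numeric sum. Null handling here is an assumption
  // about how a caller might seed the first combine.
  static final BinaryOperator<SetRow> UNION_MERGE_FN = (lhs, rhs) -> {
    if (lhs == null) {
      return rhs;
    }
    if (rhs == null) {
      return lhs;
    }
    TreeSet<Long> union = new TreeSet<>(lhs.values);
    union.addAll(rhs.values);
    return new SetRow(lhs.key, union);
  };

  // Builds a row holding the values from..to (to is exclusive).
  static SetRow rowOf(int key, long from, long to)
  {
    TreeSet<Long> values = new TreeSet<>();
    for (long v = from; v < to; v++) {
      values.add(v);
    }
    return new SetRow(key, values);
  }

  public static void main(String[] args)
  {
    SetRow a = rowOf(1, 0, 1000);
    SetRow b = rowOf(1, 500, 1500);
    SetRow merged = UNION_MERGE_FN.apply(a, b);
    // Overlapping values are counted once in the union.
    System.out.println(merged.values.size());
  }
}
```

Swapping something like this in for the cheap merge function would let the existing benchmarks show whether the preferred level of parallelism shifts once each combine is expensive. A real reproduction would presumably want to use the actual sketch union rather than a set.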