clintropolis commented on issue #11133:
URL: https://github.com/apache/druid/issues/11133#issuecomment-822881809


   Thanks for the report. I wondered whether there might be a threshold where 
the expense of potentially having to compute the merge more times outweighs 
the advantage of being able to process in parallel. Unfortunately, I didn't 
explore this very deeply in my investigations, but I think I did hit on 
something similar: depending on the number of input sequences and the number 
of cores, more parallelism is not always better, see 
https://github.com/apache/druid/pull/8578#issuecomment-548142920.
   
   It may be that the complexity of the merge and compare functions needs to 
be factored into 
[this computation, which chooses how parallel the merge should be](https://github.com/apache/druid/blob/master/core/src/main/java/org/apache/druid/java/util/common/guava/ParallelMergeCombiningSequence.java#L455).
 I don't know off the top of my head exactly how we would do that, or even 
whether it is the correct solution, or whether we would be better off focusing 
on making the sketch merges more efficient.
   
   I think it should be possible to replicate what you are seeing here with a 
modified version of the benchmarks that were created, 
https://github.com/apache/druid/blob/4caa221d72473f8307489d814f58bcfabde4c57e/benchmarks/src/test/java/org/apache/druid/benchmark/sequences/BaseParallelMergeCombiningSequenceBenchmark.java,
 testing with a more complicated merge function 
https://github.com/apache/druid/blob/4caa221d72473f8307489d814f58bcfabde4c57e/core/src/test/java/org/apache/druid/java/util/common/guava/ParallelMergeCombiningSequenceTest.java#L52
   that does what the theta sketch aggregators are doing.
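   
   As a rough illustration of what a "more complicated merge function" could 
look like in such a benchmark, here is a minimal sketch. It is not Druid or 
DataSketches code: `FakeSketch` and `UNION` are hypothetical names, and the 
sorted-array union is only a stand-in that makes each combine cost grow with 
the size of its inputs, loosely mimicking a sketch union.

```java
import java.util.Arrays;
import java.util.function.BinaryOperator;

final class ExpensiveMergeSketch
{
  // Hypothetical stand-in for a sketch: a sorted array of distinct hash values.
  static final class FakeSketch
  {
    final long[] hashes;

    FakeSketch(long[] hashes)
    {
      this.hashes = hashes;
    }
  }

  // Set union of two sorted arrays. Each call costs O(n + m), so every extra
  // merge layer repeats real work, unlike a constant-time combine such as a sum.
  static final BinaryOperator<FakeSketch> UNION = (a, b) -> {
    long[] out = new long[a.hashes.length + b.hashes.length];
    int i = 0;
    int j = 0;
    int k = 0;
    while (i < a.hashes.length && j < b.hashes.length) {
      if (a.hashes[i] < b.hashes[j]) {
        out[k++] = a.hashes[i++];
      } else if (a.hashes[i] > b.hashes[j]) {
        out[k++] = b.hashes[j++];
      } else {
        // Same hash on both sides: keep one copy, advance both cursors.
        out[k++] = a.hashes[i++];
        j++;
      }
    }
    while (i < a.hashes.length) {
      out[k++] = a.hashes[i++];
    }
    while (j < b.hashes.length) {
      out[k++] = b.hashes[j++];
    }
    return new FakeSketch(Arrays.copyOf(out, k));
  };

  public static void main(String[] args)
  {
    FakeSketch left = new FakeSketch(new long[]{1L, 3L, 5L});
    FakeSketch right = new FakeSketch(new long[]{3L, 4L});
    // prints [1, 3, 4, 5]
    System.out.println(Arrays.toString(UNION.apply(left, right).hashes));
  }
}
```

   How a function like this would be wired into the existing benchmark classes 
is left open here; the point is only that the combine step is no longer 
trivially cheap.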
   
   I'll try to have a look into this sometime soon to see if I can reproduce it 
with benchmarks.




