jihoonson commented on a change in pull request #6629: Add support parallel
combine in brokers
URL: https://github.com/apache/incubator-druid/pull/6629#discussion_r239932614
##########
File path: processing/src/main/java/org/apache/druid/query/QueryContexts.java
##########
@@ -174,6 +179,35 @@
}
}
+ private static int checkPositive(String propertyName, int val)
+ {
+ Preconditions.checkArgument(
+ val > 0,
+ "%s should be positive, but [%s]",
+ propertyName,
+ val
+ );
+ return val;
+ }
+
+ /**
+ * Return the configured number of combine threads if any. Others {@link
#NO_PARALLEL_COMBINE_THREADS}.
+ */
+ public static <T> int getNumBrokerParallelCombineThreads(Query<T> query)
+ {
+ return parseInt(query, NUM_BROKER_PARALLEL_COMBINE_THREADS,
NO_PARALLEL_COMBINE_THREADS);
Review comment:
I think there're at least two algorithms applicable with ForkJoinPool.
1. Same algorithm with the one implemented in this PR. Combining sequences
is distributed to multiple ForkJoinTasks, and these tasks are pipelined. So,
each ForkJoinTask combines input sequences and stores outputs in its blocking
queue.
2. Splitting each sequence into small chunks and combines corresponding
chunks from different sequences in parallel. Each ForkJoinTask combines a set
of chunks of sequences. I think this is more natural way to use ForkJoinPool
because we need to fork tasks to combines small chunks and then join to compute
the final aggregates. However, this requires to materialize intermediate
aggregates which should be avoided.
So, I'm not sure what the benefit of using ForkJoinPool is here. At least we
don't have to add a new configuration for ForkJoinPool by using the existing
processing threads.
> I think the advantage that we have and parallel streams don't is that we
can safely assume that all pieces of data to be merged take approximately the
same time.
What kind of algorithms are you thinking? Would you elaborate more on the
assumption?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]