jihoonson commented on a change in pull request #6629: Add support parallel
combine in brokers
URL: https://github.com/apache/incubator-druid/pull/6629#discussion_r239669860
##########
File path: processing/src/main/java/org/apache/druid/query/QueryContexts.java
##########
@@ -174,6 +179,35 @@
}
}
+ private static int checkPositive(String propertyName, int val)
+ {
+ Preconditions.checkArgument(
+ val > 0,
+ "%s should be positive, but [%s]",
+ propertyName,
+ val
+ );
+ return val;
+ }
+
+ /**
+ * Return the configured number of combine threads if set, otherwise {@link #NO_PARALLEL_COMBINE_THREADS}.
+ */
+ public static <T> int getNumBrokerParallelCombineThreads(Query<T> query)
+ {
+ return parseInt(query, NUM_BROKER_PARALLEL_COMBINE_THREADS,
NO_PARALLEL_COMBINE_THREADS);
Review comment:
> Is it possible to manage processing resources so that there is confidence
that even if parallel combine is enabled for all queries, it doesn't make
things much worse (but parallelization overhead is acceptable, e. g. within 10%
of regression in the most unfavourable cases)?
Here is how it currently works.
- Brokers use the blockingPool to manage processing threads.
- If `NUM_BROKER_PARALLEL_COMBINE_THREADS` = -1 for a query, the broker reserves all currently available resources for the query. Once the query computes the number of threads it actually needs from those reserved, it immediately releases the rest (https://github.com/apache/incubator-druid/pull/6629/files#diff-dc9ea12c892e41bbd957c77b43810ddeR103). If there are not enough threads available, the query simply runs without parallel combine.
- If `NUM_BROKER_PARALLEL_COMBINE_THREADS` is a positive number _n_, the broker reserves _n_ threads. If fewer than _n_ threads are available, the query waits until _n_ threads become free.
So, to answer your question: if `NUM_BROKER_PARALLEL_COMBINE_THREADS` is set to -1 for all queries, every query can at least run without waiting for threads, which I think is the bottom line.
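To make the two reservation strategies above concrete, here is a minimal sketch. This is not Druid's actual `BlockingPool` API; the class, method names, and the use of `Semaphore` are all illustrative assumptions.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch of the two reservation strategies described above;
// Druid's real implementation uses a BlockingPool, not a Semaphore.
public class CombineResourceSketch
{
  private final Semaphore pool;

  public CombineResourceSketch(int capacity)
  {
    this.pool = new Semaphore(capacity);
  }

  /**
   * requestedThreads == -1: grab everything currently free without blocking.
   * A return value of 0 means "run without parallel combine". The caller is
   * expected to release any surplus once it knows how many threads it needs.
   * requestedThreads == n > 0: block until exactly n resources are free.
   */
  public int reserve(int requestedThreads)
  {
    if (requestedThreads < 0) {
      // Non-blocking: take whatever is available right now (possibly 0).
      return pool.drainPermits();
    }
    // Blocking: wait until the requested number of resources is free.
    pool.acquireUninterruptibly(requestedThreads);
    return requestedThreads;
  }

  public void release(int numThreads)
  {
    pool.release(numThreads);
  }
}
```

With -1, a query on a busy broker may get fewer threads than ideal (or zero, falling back to sequential combine) but never blocks; with a positive _n_, it gets exactly _n_ but may wait.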
> Is it possible to estimate the size of intermediate aggregates and enable
parallel combine automatically when it begins to make sense?
Well, we could do some rough optimization based on the estimated input size and the cardinality of the grouping keys, but I'm not sure how good it would be.
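The rough optimization mentioned above might look like the following sketch. The thresholds and the decision rule are purely illustrative assumptions, not anything Druid implements.

```java
// Hypothetical heuristic: enable parallel combine only when the estimated
// work is large enough to amortize the parallelization overhead.
// The threshold values below are made-up tuning knobs.
public class ParallelCombineHeuristic
{
  private static final long MIN_ESTIMATED_INPUT_ROWS = 1_000_000L;
  private static final long MIN_GROUPING_KEY_CARDINALITY = 10_000L;

  public static boolean shouldParallelCombine(
      long estimatedInputRows,
      long estimatedKeyCardinality
  )
  {
    // Small inputs or few distinct keys: a single combine thread is likely
    // faster, so stay sequential.
    return estimatedInputRows >= MIN_ESTIMATED_INPUT_ROWS
           && estimatedKeyCardinality >= MIN_GROUPING_KEY_CARDINALITY;
  }
}
```

The hard part, as noted above, is that both estimates can be badly wrong before the query runs, which is why a metadata-driven approach is more appealing.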
In the future, I think it's possible to do more advanced optimizations for queries that need blocking aggregation in historicals, like groupBy. Since, with blocking aggregation, historicals know the exact size and number of intermediate aggregates they will send to the broker, they could send that metadata before the actual aggregates, so that the broker can make run-time optimization decisions such as parallel merge. Does this make sense?
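The metadata-first idea could be sketched like this. All names, fields, and the threshold are hypothetical; this is just to illustrate the shape of the decision, not a proposed API.

```java
import java.util.List;

// Hypothetical sketch: each historical ships a small summary ahead of its
// intermediate aggregates, and the broker decides whether parallel merge
// is worthwhile once all summaries have arrived.
public class ResultMetadataSketch
{
  /** Per-historical summary, known exactly for blocking aggregations. */
  public static final class ResultMetadata
  {
    final long numIntermediateAggregates;
    final long sizeBytes;

    public ResultMetadata(long numIntermediateAggregates, long sizeBytes)
    {
      this.numIntermediateAggregates = numIntermediateAggregates;
      this.sizeBytes = sizeBytes;
    }
  }

  /** Broker-side decision: parallel merge only if the total row count is large. */
  public static boolean enableParallelMerge(List<ResultMetadata> reports, long rowThreshold)
  {
    long totalRows = 0;
    for (ResultMetadata report : reports) {
      totalRows += report.numIntermediateAggregates;
    }
    return totalRows >= rowThreshold;
  }
}
```

Because the metadata is exact rather than estimated, the broker never pays parallelization overhead for a query whose result set turns out to be small.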
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]