jihoonson commented on a change in pull request #6629: Add support parallel
combine in brokers
URL: https://github.com/apache/incubator-druid/pull/6629#discussion_r242275704
##########
File path: processing/src/main/java/org/apache/druid/query/QueryContexts.java
##########
@@ -174,6 +179,35 @@
}
}
+ private static int checkPositive(String propertyName, int val)
+ {
+ Preconditions.checkArgument(
+ val > 0,
+ "%s should be positive, but [%s]",
+ propertyName,
+ val
+ );
+ return val;
+ }
+
+ /**
+ * Return the configured number of combine threads if any. Others {@link
#NO_PARALLEL_COMBINE_THREADS}.
+ */
+ public static <T> int getNumBrokerParallelCombineThreads(Query<T> query)
+ {
+ return parseInt(query, NUM_BROKER_PARALLEL_COMBINE_THREADS,
NO_PARALLEL_COMBINE_THREADS);
Review comment:
@leventov sorry, maybe I wasn't clear. What I don't understand is how your
idea makes the combine faster than at least the current implementation in
master.
I talked about this with @jon-wei and maybe the following is what you're
thinking.
You may think of the below architecture for parallel combine. `H{n}` and
`F{n}` represent a HTTP thread and a Fork Join Task, respectively. `S{n}` is a
sequence.
```
H1 H2 H3 H4 H5 (HTTP thread pool)
\ \ | / /
+----------+
| F1 F2 F3 | (Fork Join Pool)
+----------+
/ / / \ \ \
S1 S2 S3 S4 S5 S6
```
The fork join tasks can combine any sequences for any queries, but the
sequences to combine together should be for the same query. Also, each fork
join task computes for a limited time like 10ms as you said. Once they complete
their task, they return the result to the proper HTTP thread. The HTTP threads
combine the results from fork join tasks again to get the final result. For a
query, since the combine is hierarchically computed in an HTTP thread and fork
join tasks, it can be done in parallel and be faster than the current
implementation in master. If this is what you think, I think it's a good idea.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]