jihoonson commented on a change in pull request #6629: Add support parallel combine in brokers
URL: https://github.com/apache/incubator-druid/pull/6629#discussion_r239669860
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/QueryContexts.java
 ##########
 @@ -174,6 +179,35 @@
     }
   }
 
+  private static int checkPositive(String propertyName, int val)
+  {
+    Preconditions.checkArgument(
+        val > 0,
+        "%s should be positive, but [%s]",
+        propertyName,
+        val
+    );
+    return val;
+  }
+
+  /**
 +   * Returns the configured number of combine threads if set. Otherwise, returns {@link #NO_PARALLEL_COMBINE_THREADS}.
 +   */
+  public static <T> int getNumBrokerParallelCombineThreads(Query<T> query)
+  {
 +    return parseInt(query, NUM_BROKER_PARALLEL_COMBINE_THREADS, NO_PARALLEL_COMBINE_THREADS);
 +  }
 
 Review comment:
   > Is it possible to manage processing resources so that there is confidence 
that even if parallel combine is enabled for all queries, it doesn't make 
things much worse (but parallelization overhead is acceptable, e. g. within 10% 
of regression in the most unfavourable cases)?
   
   Here is how it currently works.
   
   - Brokers use the blockingPool to manage processing threads.
  - If `NUM_BROKER_PARALLEL_COMBINE_THREADS` is -1 for a query, the broker reserves all currently available resources for that query. Once the query computes the actual number of threads it needs from the reserved ones, it immediately releases the surplus (https://github.com/apache/incubator-druid/pull/6629/files#diff-dc9ea12c892e41bbd957c77b43810ddeR103). If not enough threads are available, the query simply runs without parallel combine.
  - If `NUM_BROKER_PARALLEL_COMBINE_THREADS` is a positive number _n_, the broker reserves _n_ threads. If fewer than _n_ threads are available, the query waits until _n_ threads become free.
   
  So, to answer your question: if `NUM_BROKER_PARALLEL_COMBINE_THREADS` is set to -1 for all queries, every query can at least run without waiting for threads, which I think is the bottom line.
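  The reservation behavior described above can be sketched with a plain `Semaphore` standing in for the broker's blocking pool (the method name and fallback logic here are illustrative only, not the actual Druid code):

```java
import java.util.concurrent.Semaphore;

// Illustrative sketch of the two reservation modes: -1 means "grab whatever
// is free right now and return the surplus", while a positive n blocks until
// n permits are available. The real broker uses a BlockingPool, not a Semaphore.
public class CombineThreadReservation
{
  static final int NO_PARALLEL_COMBINE_THREADS = -1;

  // Returns how many combine threads were actually reserved for the query.
  static int reserveCombineThreads(Semaphore pool, int requested, int actuallyNeeded)
  {
    if (requested == NO_PARALLEL_COMBINE_THREADS) {
      final int grabbed = pool.drainPermits(); // take everything currently free
      if (grabbed == 0) {
        return 0; // nothing available: fall back to non-parallel combine
      }
      final int used = Math.min(grabbed, actuallyNeeded);
      pool.release(grabbed - used); // immediately release the surplus
      return used;
    } else {
      pool.acquireUninterruptibly(requested); // wait for exactly n threads
      return requested;
    }
  }

  public static void main(String[] args)
  {
    final Semaphore pool = new Semaphore(8);
    System.out.println(reserveCombineThreads(pool, NO_PARALLEL_COMBINE_THREADS, 3)); // 3
    System.out.println(pool.availablePermits()); // 5
  }
}
```

  Note how the -1 mode can never block: it either gets some threads or degrades to a sequential combine, which is why enabling it for all queries keeps the worst case bounded.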
   
   > Is it possible to estimate the size of intermediate aggregates and enable 
parallel combine automatically when it begins to make sense?
   
  Well, we could do some rough optimization based on the estimated input size and the cardinality of the grouping keys, but I'm not sure how good it would be.
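  For illustration only, such a rough heuristic might look like the following (the threshold and names are made up, not part of this patch):

```java
// Hypothetical heuristic: enable parallel combine only when the estimated
// per-thread workload is large enough to amortize the parallelization overhead.
public class ParallelCombineHeuristic
{
  // Made-up cutoff; a real value would need benchmarking.
  static final long MIN_ROWS_PER_THREAD = 100_000L;

  static boolean shouldParallelCombine(long estimatedInputRows, long groupingKeyCardinality, int numThreads)
  {
    if (numThreads <= 1) {
      return false; // nothing to parallelize
    }
    // Combine threads partition work by grouping key, so the key space must
    // be at least as large as the thread count for the split to help.
    if (groupingKeyCardinality < numThreads) {
      return false;
    }
    return estimatedInputRows / numThreads >= MIN_ROWS_PER_THREAD;
  }

  public static void main(String[] args)
  {
    System.out.println(shouldParallelCombine(1_000_000L, 10_000L, 4)); // true
    System.out.println(shouldParallelCombine(50_000L, 10_000L, 4));    // false
  }
}
```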
   
  In the future, I think it's possible to do more advanced optimizations for queries that need blocking aggregation in historicals, like groupBy. With blocking aggregation, historicals know the exact size and number of intermediate aggregates they will send to the broker, so they could send that metadata before the actual aggregates, and the broker could then make run-time optimization decisions such as parallel merge. Does this make sense?
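  As a sketch of that idea, the metadata message might carry just two numbers per historical, and the broker's decision becomes trivial once all of them arrive (everything here is hypothetical; no such protocol exists yet):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical metadata a historical could send ahead of its intermediate
// aggregates; blocking aggregation means both numbers are exact, not estimates.
public class IntermediateResultMetadata
{
  final long numRows;
  final long sizeBytes;

  IntermediateResultMetadata(long numRows, long sizeBytes)
  {
    this.numRows = numRows;
    this.sizeBytes = sizeBytes;
  }

  // Once all per-historical metadata has arrived, the broker can decide
  // up front whether a parallel merge is worth its overhead.
  static boolean brokerShouldParallelMerge(List<IntermediateResultMetadata> perHistorical, long minTotalRows)
  {
    final long totalRows = perHistorical.stream().mapToLong(m -> m.numRows).sum();
    return totalRows >= minTotalRows;
  }

  public static void main(String[] args)
  {
    final List<IntermediateResultMetadata> metadata = Arrays.asList(
        new IntermediateResultMetadata(400_000L, 8_000_000L),
        new IntermediateResultMetadata(700_000L, 14_000_000L)
    );
    System.out.println(brokerShouldParallelMerge(metadata, 1_000_000L)); // true
  }
}
```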

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
