[GitHub] jihoonson commented on a change in pull request #6629: Add support parallel combine in brokers

GitBox Fri, 07 Dec 2018 12:10:57 -0800

jihoonson commented on a change in pull request #6629: Add support parallel 
combine in brokers
URL: https://github.com/apache/incubator-druid/pull/6629#discussion_r239932614


 ##########
 File path: processing/src/main/java/org/apache/druid/query/QueryContexts.java
 ##########
 @@ -174,6 +179,35 @@
     }
   }
 
+  private static int checkPositive(String propertyName, int val)
+  {
+    Preconditions.checkArgument(
+        val > 0,
+        "%s should be positive, but [%s]",
+        propertyName,
+        val
+    );
+    return val;
+  }
+
+  /**
 +   * Return the configured number of combine threads if any. Others {@link #NO_PARALLEL_COMBINE_THREADS}.
+   */
+  public static <T> int getNumBrokerParallelCombineThreads(Query<T> query)
+  {
 +    return parseInt(query, NUM_BROKER_PARALLEL_COMBINE_THREADS, NO_PARALLEL_COMBINE_THREADS);
 
 Review comment:
   I think there are at least two algorithms applicable with ForkJoinPool.
   
   1. The same algorithm as the one implemented in this PR. Combining sequences 
is distributed across multiple ForkJoinTasks, and these tasks are pipelined: 
each ForkJoinTask combines its input sequences and stores the outputs in its 
own blocking queue. 
   2. Splitting each sequence into small chunks and combining corresponding 
chunks from different sequences in parallel. Each ForkJoinTask combines a set 
of chunks of sequences. I think this is a more natural way to use ForkJoinPool 
because we fork tasks to combine small chunks and then join them to compute 
the final aggregates. However, this requires materializing intermediate 
aggregates, which should be avoided.
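   
   To make the second option concrete, here is a minimal sketch, not code from 
this PR. `ChunkedCombineSketch`, `Row`, and `CombineTask` are hypothetical 
names, a "sequence" is modeled as a list of (key, partial sum) rows, and a 
"chunk" is modeled as a key range. Each leaf task builds a `TreeMap`, which is 
exactly the kind of materialized intermediate aggregate mentioned above.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical illustration only; none of these names exist in Druid.
class ChunkedCombineSketch {
  // Stand-in for one row of a sequence: a grouping key and a partial aggregate.
  static final class Row {
    final int key;
    final long sum;
    Row(int key, long sum) { this.key = key; this.sum = sum; }
  }

  // Combines the rows of all sequences whose keys fall in [lo, hi).
  static final class CombineTask extends RecursiveTask<TreeMap<Integer, Long>> {
    final List<List<Row>> sequences;
    final int lo, hi, chunk;

    CombineTask(List<List<Row>> sequences, int lo, int hi, int chunk) {
      this.sequences = sequences;
      this.lo = lo;
      this.hi = hi;
      this.chunk = chunk;
    }

    @Override
    protected TreeMap<Integer, Long> compute() {
      if (hi - lo <= chunk) {
        // Leaf: materialize the aggregates of this chunk.
        // (A real implementation would seek into each sequence instead of scanning it.)
        TreeMap<Integer, Long> out = new TreeMap<>();
        for (List<Row> seq : sequences) {
          for (Row r : seq) {
            if (r.key >= lo && r.key < hi) {
              out.merge(r.key, r.sum, Long::sum);
            }
          }
        }
        return out;
      }
      int mid = lo + (hi - lo) / 2;
      CombineTask left = new CombineTask(sequences, lo, mid, chunk);
      CombineTask right = new CombineTask(sequences, mid, hi, chunk);
      left.fork();                                       // fork one half
      TreeMap<Integer, Long> result = right.compute();   // compute the other half here
      result.putAll(left.join());                        // join; key ranges are disjoint
      return result;
    }
  }

  static TreeMap<Integer, Long> combine(List<List<Row>> sequences, int keyLo, int keyHi, int chunk) {
    return ForkJoinPool.commonPool().invoke(new CombineTask(sequences, keyLo, keyHi, chunk));
  }
}
```

   The final result only becomes available after the last join, so nothing can 
be streamed to the caller while the chunks are still being combined.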
   
   So, I'm not sure what the benefit of using ForkJoinPool would be here. By 
using the existing processing threads, at least we don't have to add a new 
configuration for a ForkJoinPool.
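   
   For comparison, the pipelined approach can be sketched on a plain 
`ExecutorService`, with no ForkJoinPool involved. This is a simplified 
illustration under assumptions, not the PR's implementation: 
`PipelinedCombineSketch` and its methods are hypothetical, rows are 
`{key, sum}` arrays sorted by key, and the tree is fixed at two leaf tasks 
feeding one root task through bounded blocking queues.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical illustration only; none of these names exist in Druid.
class PipelinedCombineSketch {
  static final long[] END = new long[0]; // sentinel marking the end of a stream

  // K-way merges sorted {key, sum} inputs, sums rows sharing a key, and streams them to out.
  static void combine(List<Iterator<long[]>> inputs, BlockingQueue<long[]> out)
      throws InterruptedException {
    // Heap entries are {key, sum, inputIndex}, ordered by key.
    PriorityQueue<long[]> heap = new PriorityQueue<>(Comparator.comparingLong((long[] e) -> e[0]));
    for (int i = 0; i < inputs.size(); i++) {
      if (inputs.get(i).hasNext()) {
        long[] r = inputs.get(i).next();
        heap.add(new long[]{r[0], r[1], i});
      }
    }
    long[] current = null;
    while (!heap.isEmpty()) {
      long[] e = heap.poll();
      Iterator<long[]> src = inputs.get((int) e[2]);
      if (src.hasNext()) {
        long[] r = src.next();
        heap.add(new long[]{r[0], r[1], e[2]});
      }
      if (current != null && current[0] == e[0]) {
        current[1] += e[1];          // same key: fold into the running aggregate
      } else {
        if (current != null) {
          out.put(current);          // key changed: emit; blocks when the queue is full
        }
        current = new long[]{e[0], e[1]};
      }
    }
    if (current != null) {
      out.put(current);
    }
    out.put(END);
  }

  // Exposes a blocking queue as an iterator that ends at the END sentinel.
  static Iterator<long[]> drain(BlockingQueue<long[]> q) {
    return new Iterator<long[]>() {
      private long[] next;
      private boolean done;

      @Override
      public boolean hasNext() {
        if (next == null && !done) {
          try {
            next = q.take();
          } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
          }
          if (next == END) {
            done = true;
            next = null;
          }
        }
        return next != null;
      }

      @Override
      public long[] next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        long[] r = next;
        next = null;
        return r;
      }
    };
  }

  static List<Iterator<long[]>> iterators(List<List<long[]>> seqs) {
    List<Iterator<long[]>> its = new ArrayList<>();
    for (List<long[]> s : seqs) {
      its.add(s.iterator());
    }
    return its;
  }

  // Needs at least 3 free threads in exec; with fewer, the bounded queues can deadlock.
  static List<long[]> run(ExecutorService exec, List<List<long[]>> sequences) throws Exception {
    int mid = sequences.size() / 2;
    BlockingQueue<long[]> left = new ArrayBlockingQueue<>(16);
    BlockingQueue<long[]> right = new ArrayBlockingQueue<>(16);
    BlockingQueue<long[]> root = new ArrayBlockingQueue<>(16);
    Future<?> f1 = exec.submit(() -> { combine(iterators(sequences.subList(0, mid)), left); return null; });
    Future<?> f2 = exec.submit(() -> { combine(iterators(sequences.subList(mid, sequences.size())), right); return null; });
    Future<?> f3 = exec.submit(() -> { combine(Arrays.asList(drain(left), drain(right)), root); return null; });
    List<long[]> result = new ArrayList<>();
    for (Iterator<long[]> it = drain(root); it.hasNext(); ) {
      result.add(it.next());
    }
    f1.get();
    f2.get();
    f3.get();
    return result;
  }
}
```

   The sketch omits error handling: if a leaf task fails before writing `END`, 
the root task would block forever, so a real implementation needs cancellation 
or timeouts.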
   
   > I think the advantage that we have and parallel streams don't is that we 
can safely assume that all pieces of data to be merged take approximately the 
same time. 
   
   What kind of algorithm are you thinking of? Could you elaborate on that 
assumption?
