[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #11809: Make subquery IDs more comprehensive

GitBox Tue, 02 Nov 2021 21:30:31 -0700


abhishekagarwal87 commented on a change in pull request #11809:
URL: https://github.com/apache/druid/pull/11809#discussion_r741613829




##########
File path: server/src/main/java/org/apache/druid/server/ClientQuerySegmentWalker.java
##########
@@ -431,6 +448,101 @@ private DataSource inlineIfNecessary(
         );
   }
 
+  /**
+   * This method returns the datasource by populating all the {@link QueryDataSource} with correct nesting level and
+   * sibling order of all the subqueries that are present.
+   * It also plumbs parent query's id and sql id in case the subqueries don't have it set by default
+   *
+   * @param dataSource       Datasource whose subqueries need to be populated
+   * @param parentQueryId    Parent Query's ID, can be null if do not need to update this in the subqueries
+   * @param parentSqlQueryId Parent Query's SQL Query ID, can be null if do not need to update this in the subqueries
+   * @return DataSource populated with the subqueries
+   */
+  private DataSource generateSubqueryIds(
+      DataSource dataSource,
+      @Nullable final String parentQueryId,
+      @Nullable final String parentSqlQueryId
+  )
+  {
+    Queue<DataSource> queue = new LinkedList<>();
+    queue.add(dataSource);
+
+    /*
+    Performs BFS on the datasource tree to find the nesting level, and the sibling order of the query datasource
+     */
+    Map<DataSource, Pair<Integer, Integer>> queryDataSourceToSubqueryIds = new HashMap<>();
+    int level = 1;
+    while (!queue.isEmpty()) {
+      int size = queue.size();
+      int siblingOrder = 1;
+      for (int i = 0; i < size; ++i) {
+        DataSource currentDataSource = queue.poll();
+        if (currentDataSource instanceof QueryDataSource) {
+          queryDataSourceToSubqueryIds.put(currentDataSource, new Pair<>(level, siblingOrder));

Review comment:
       It is not clear to me why we are not calling `insertSubQueryId` right here. We already have the level information at this point, so we could populate the IDs for the query corresponding to this QueryDataSource directly instead of saving them in a map and applying them later. Am I missing something?
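
       To make the suggestion concrete, here is a minimal, self-contained sketch of a level-order traversal that produces the label for each query node at the moment it is polled, without an intermediate map. `Node`, `labelInline`, and the `name:level.siblingOrder` label format are hypothetical stand-ins for illustration; they are not Druid's `DataSource` API, and the sketch assumes the sibling counter advances only for query nodes, which the truncated hunk above does not confirm.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class SubqueryIdSketch
{
  // Hypothetical stand-in for a datasource tree node; not Druid's DataSource.
  static final class Node
  {
    final String name;
    final boolean isQuery;
    final List<Node> children = new ArrayList<>();

    Node(String name, boolean isQuery)
    {
      this.name = name;
      this.isQuery = isQuery;
    }

    Node add(Node child)
    {
      children.add(child);
      return this;
    }
  }

  // Level-order (BFS) traversal. The level and sibling order are both known
  // when a node is polled, so the label is emitted in the same loop.
  static List<String> labelInline(Node root)
  {
    List<String> labels = new ArrayList<>();
    Queue<Node> queue = new ArrayDeque<>();
    queue.add(root);
    int level = 1;
    while (!queue.isEmpty()) {
      int size = queue.size();
      int siblingOrder = 1;
      for (int i = 0; i < size; ++i) {
        Node current = queue.poll();
        if (current.isQuery) {
          // Assumption: only query nodes consume a sibling slot.
          labels.add(current.name + ":" + level + "." + siblingOrder);
          siblingOrder++;
        }
        queue.addAll(current.children);
      }
      level++;
    }
    return labels;
  }

  public static void main(String[] args)
  {
    Node root = new Node("outer", true)
        .add(new Node("a", true).add(new Node("c", true)))
        .add(new Node("table", false))
        .add(new Node("b", true));
    System.out.println(labelInline(root));
  }
}
```

       Whether this carries over to the real code depends on how the updated query is written back into its parent datasource, which this sketch does not model.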

##########
File path: processing/src/main/java/org/apache/druid/query/UnionQueryRunner.java
##########
@@ -71,19 +74,23 @@ public UnionQueryRunner(
         return new MergeSequence<>(
             query.getResultOrdering(),
             Sequences.simple(
-                Lists.transform(
-                    unionDataSource.getDataSources(),
-                    (Function<DataSource, Sequence<T>>) singleSource ->
-                        baseRunner.run(
-                            queryPlus.withQuery(
-                                Queries.withBaseDataSource(query, singleSource)
-                                       // assign the subqueryId. this will be used to validate that every query servers
-                                       // have responded per subquery in RetryQueryRunner
-                                       .withDefaultSubQueryId()
-                            ),
-                            responseContext
-                        )
-                )
+                IntStream.range(0, unionDataSource.getDataSources().size())
+                         .mapToObj(i -> new Pair<>(i + 1, unionDataSource.getDataSources().get(i)))
+                         .map(indexBaseDataSourcePair ->
+                                  baseRunner.run(
+                                      queryPlus.withQuery(Queries.withBaseDataSource(
+                                          query,
+                                          indexBaseDataSourcePair.rhs
+                                      ).withSubQueryId(
+                                          generateSubqueryId(
+                                              query.getSubQueryId(),
+                                              // toString() works since the datasource will be a TableDataSource
+                                              indexBaseDataSourcePair.rhs.toString(),

Review comment:
       Maybe you can call `getName()` instead of `toString()`?
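
       A small sketch of why an explicit accessor can be safer than `toString()`: the accessor is a stable contract, while a `toString()` format may change or include extra detail. `Source`, `Table`, and `nameOf` below are hypothetical illustration types, not Druid classes. The sketch also assumes the element would need a type check before `getName()` can be called, since it is only known as a generic datasource at that point.

```java
public class GetNameSketch
{
  // Hypothetical marker type standing in for a generic datasource.
  interface Source {}

  // Hypothetical table-like datasource with an explicit name accessor.
  static final class Table implements Source
  {
    private final String name;

    Table(String name)
    {
      this.name = name;
    }

    String getName()
    {
      return name;
    }

    // A debug-oriented format: fine for logs, unsuitable as an identifier.
    @Override
    public String toString()
    {
      return "Table{name='" + name + "'}";
    }
  }

  // Reads the name through the accessor and fails loudly for any other type,
  // instead of silently embedding whatever toString() happens to return.
  static String nameOf(Source source)
  {
    if (source instanceof Table) {
      return ((Table) source).getName();
    }
    throw new IllegalArgumentException("Expected a table, got: " + source);
  }

  public static void main(String[] args)
  {
    Table table = new Table("wikipedia");
    System.out.println(nameOf(table));
    System.out.println(table);
  }
}
```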




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.