adarshsanjeev commented on code in PR #18235:
URL: https://github.com/apache/druid/pull/18235#discussion_r2210146766


##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/DataServerQueryHandlerUtils.java:
##########
@@ -48,17 +56,65 @@ private DataServerQueryHandlerUtils()
   * Performs necessary transforms to a query destined for data servers. Does not update the list of segments; callers
   * should do this themselves using {@link Queries#withSpecificSegments(Query, List)}.
   *
-   * @param query      the query
-   * @param dataSource datasource name
+   * @param query          the query
+   * @param dataSourceName datasource name
   */
-  public static <R, T extends Query<R>> Query<R> prepareQuery(final T query, final String dataSource)
+  public static <R, T extends Query<R>> Query<R> prepareQuery(
+      final T query,
+      final int inputNumber,
+      final String dataSourceName
+  )
  {
    // MSQ changes the datasource to an inputNumber datasource. This needs to be changed back for data servers
    // to understand.
+    return query.withDataSource(transformDatasource(query.getDataSource(), inputNumber, dataSourceName));
+  }

-    // BUG: This transformation is incorrect; see https://github.com/apache/druid/issues/18198. It loses decorations
-    // such as join, unnest, etc.
-    return query.withDataSource(new TableDataSource(dataSource));
+  /**
+   * Transforms {@link InputNumberDataSource} and {@link RestrictedInputNumberDataSource}, which are only understood
+   * by MSQ tasks, back into {@link TableDataSource} and {@link RestrictedDataSource} recursively.
+   */
+  static DataSource transformDatasource(
+      final DataSource dataSource,
+      final int inputNumber,
+      final String dataSourceName
+  )

Review Comment:
   Yes, this is not the ideal way to handle it. However, the check needs to be present as a sanity check: it stops the query from returning incorrect results.
   
   > identify the shape which should be rejected
   
   This is harder to do accurately. Strictly, the error only needs to be thrown if the datasource being queried actually has realtime segments, and that information is not available at compile time. The alternative is to fail any query that has a broadcast join or union on its datasources, but that risks failing queries that would otherwise pass.
   
   Since this is a Druid 34 blocker, I wanted to prevent incorrect results without causing regressions by failing more eagerly. Is there a way to fail accurately at compile time?
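   For context, the recursive transform this patch adds can be sketched with simplified stand-in types. These minimal `DataSource` classes are hypothetical, not Druid's actual API; the point is the shape of the recursion: replace matching input-number nodes with named table nodes, and rebuild every other node from its transformed children so decorations like joins survive.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-ins for Druid's datasource tree.
interface DataSource {
  List<DataSource> getChildren();
  DataSource withChildren(List<DataSource> children);
}

record InputNumberDataSource(int inputNumber) implements DataSource {
  public List<DataSource> getChildren() { return List.of(); }
  public DataSource withChildren(List<DataSource> c) { return this; }
}

record TableDataSource(String name) implements DataSource {
  public List<DataSource> getChildren() { return List.of(); }
  public DataSource withChildren(List<DataSource> c) { return this; }
}

record JoinDataSource(DataSource left, DataSource right) implements DataSource {
  public List<DataSource> getChildren() { return List.of(left, right); }
  public DataSource withChildren(List<DataSource> c) {
    return new JoinDataSource(c.get(0), c.get(1));
  }
}

class Transform {
  // Recursively replace InputNumberDataSource(inputNumber) with
  // TableDataSource(name), leaving all other nodes (joins, etc.) intact.
  static DataSource transformDatasource(DataSource ds, int inputNumber, String name) {
    if (ds instanceof InputNumberDataSource in && in.inputNumber() == inputNumber) {
      return new TableDataSource(name);
    }
    List<DataSource> children = ds.getChildren().stream()
        .map(c -> transformDatasource(c, inputNumber, name))
        .collect(Collectors.toList());
    return ds.withChildren(children);
  }
}
```

   The earlier code lost this structure by replacing the whole datasource with a single `TableDataSource`; recursing through `withChildren` is what preserves joins and unnests.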



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
