Re: [PR] Set explain attributes after the query is prepared (druid)

via GitHub Wed, 05 Jul 2023 16:50:58 -0700


abhishekrb19 commented on code in PR #14490:
URL: https://github.com/apache/druid/pull/14490#discussion_r1253763320



##########
sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlParserUtils.java:
##########
@@ -327,84 +325,54 @@ public static SqlOrderBy 
convertClusterByToOrderBy(SqlNode query, SqlNodeList cl
    * </pre>
    *
    * <p>
-   * The function will return the following clusteredBy columns for the above 
SQL: ["__time", "page_alias", "FLOOR(\"cost\")", cityName"]
+   * The function will return the following clusteredBy columns for the above 
SQL: ["__time", "page_alias", "FLOOR(\"cost\")", cityName"].
    * Any ordinal and expression specified in the CLUSTERED BY clause will 
resolve to the final output column name.
    * </p>
+   * <p>
+   * This function must be called after the query is prepared when all the 
validations are complete, including {@link #validateClusteredByColumns},
+   * so we can safely access the arguments.
+   * </p>
    * @param clusteredByNodes  List of {@link SqlNode}s representing columns to 
be clustered by.
-   * @param sourceNode The select or order by source node.
+   * @param sourceFieldMappings The source field output mappings extracted 
from the validated root query rel node post prepare phase.
+   *
    */
   @Nullable
-  public static List<String> 
resolveClusteredByColumnsToOutputColumns(SqlNodeList clusteredByNodes, SqlNode 
sourceNode)
+  public static List<String> resolveClusteredByColumnsToOutputColumns(
+      final SqlNodeList clusteredByNodes,
+      final ImmutableList<Pair<Integer, String>> sourceFieldMappings
+  )
   {
     // CLUSTERED BY is an optional clause
     if (clusteredByNodes == null) {
       return null;
     }
 
-    Preconditions.checkArgument(
-        sourceNode instanceof SqlSelect || sourceNode instanceof SqlOrderBy,
-        "Source node must be either SqlSelect or SqlOrderBy, but found [%s]",
-        sourceNode == null ? null : sourceNode.getKind()
-    );
-
-    final SqlSelect selectNode = (sourceNode instanceof SqlSelect) ? 
(SqlSelect) sourceNode
-                                                                   : 
(SqlSelect) ((SqlOrderBy) sourceNode).query;
-    final List<SqlNode> selectList = selectNode.getSelectList().getList();
     final List<String> retClusteredByNames = new ArrayList<>();
 
     for (SqlNode clusteredByNode : clusteredByNodes) {
 
-      if (SqlUtil.isLiteral(clusteredByNode)) {
-        // The node is a literal number -- an ordinal is specified in the 
CLUSTERED BY clause. Validate and lookup the
-        // ordinal in the select list.
-        int ordinal = ((SqlNumericLiteral) 
clusteredByNode).getValueAs(Integer.class);
-        if (ordinal < 1 || ordinal > selectList.size()) {
-          throw InvalidSqlInput.exception(
-              "Ordinal[%d] specified in the CLUSTERED BY clause is invalid. It 
must be between 1 and %d.",
-              ordinal,
-              selectList.size()
-          );
-        }
-        SqlNode node = selectList.get(ordinal - 1);
-
-        if (node instanceof SqlBasicCall) {
-          retClusteredByNames.add(getColumnNameFromSqlCall(node));
-        } else {
-          Preconditions.checkArgument(
-              node instanceof SqlIdentifier,
-              "Node must be a SqlIdentifier, but found [%s]",
-              node.getKind()
-          );
-          SqlIdentifier n = ((SqlIdentifier) node);
-          retClusteredByNames.add(n.isSimple() ? n.getSimple() : 
n.names.get(1));
-        }
+      if (clusteredByNode instanceof SqlNumericLiteral) {
+        // The node is a literal number -- an ordinal in the CLUSTERED BY 
clause. Lookup the ordinal in field mappings.
+        final int ordinal = ((SqlNumericLiteral) 
clusteredByNode).getValueAs(Integer.class);
+        retClusteredByNames.add(sourceFieldMappings.get(ordinal - 1).right);

Review Comment:
   yes, Calcite throws an ordinal out of range exception - I also added a 
negative ordinal check, which later Calcite versions seems to fix. As described 
in the PR summary, with the attribute setting done _after_ the query is 
prepared, we leverage all the validations that Calcite does in addition to 
semantic checking we do in the SQL planner. Please see the unit tests 
`testReplaceWithNonExistentOrdinalInClusteredBy` and 
`testReplaceWithNegativeOrdinalInClusteredBy` that demonstrate that invalid 
ordinals are handled gracefully



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Set explain attributes after the query is prepared (druid)

Reply via email to