Re: [PR] INSERT/REPLACE can omit clustering when catalog has default (druid)

via GitHub Wed, 24 Apr 2024 01:16:00 -0700


kgyrtkirk commented on code in PR #16260:
URL: https://github.com/apache/druid/pull/16260#discussion_r1577475455



##########
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java:
##########
@@ -348,6 +398,42 @@ private DatasourceTable validateInsertTarget(
     }
   }
 
+  /**
+   * Clustering is a kind of ordering. We push the CLUSTERED BY clause into 
the source query as
+   * an ORDER BY clause. In an ideal world, clustering would be outside of 
SELECT: it is an operation
+   * applied after the data is selected. For now, we push it into the SELECT, 
then MSQ pulls it back
+   * out. This is unfortunate as errors will be attributed to ORDER BY, which 
the user did not
+   * actually specify (it is an error to do so.) However, with the current 
hybrid structure, it is
+   * not possible to add the ORDER by later: doing so requires access to the 
order by namespace
+   * which is not visible to subclasses.
+   */
+  private void rewriteClusteringToOrderBy(SqlNode source, DruidSqlIngest 
ingestNode, @Nullable SqlNodeList catalogClustering)

Review Comment:
   much better - thank you for the updates!



##########
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java:
##########
@@ -281,6 +286,37 @@ public void validateInsert(final SqlInsert insert)
     }
   }
 
+  @Override
+  protected SelectNamespace createSelectNamespace(
+      SqlSelect select,
+      SqlNode enclosingNode)
+  {
+    if (enclosingNode instanceof DruidSqlIngest) {
+      // The target is a new or existing datasource.
+      // The target namespace is both the target table ID and the row type for 
that table.
+      final SqlValidatorNamespace targetNamespace = Objects.requireNonNull(
+          getNamespace(enclosingNode),
+          () -> "namespace for " + enclosingNode
+      );
+      final IdentifierNamespace insertNs = (IdentifierNamespace) 
targetNamespace;
+      SqlIdentifier identifier = insertNs.getId();
+      SqlValidatorTable catalogTable = 
getCatalogReader().getTable(identifier.names);
+      if (catalogTable != null) {

Review Comment:
   wouldn't the fall-thru from this condtional will cause that the `CLUSTER BY` 
on the `ingestNode` will not be applied (line399 right now); even if its there 
- is that okay?
   
   



##########
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java:
##########
@@ -397,6 +459,34 @@ private Granularity getEffectiveGranularity(
     return effectiveGranularity;
   }
 
+  @Nullable
+  private SqlNodeList convertCatalogClustering(final DatasourceFacade 
tableMetadata)
+  {
+    if (tableMetadata == null) {
+      return null;
+    }
+    final List<ClusterKeySpec> keyCols = tableMetadata.clusterKeys();
+    if (CollectionUtils.isNullOrEmpty(keyCols)) {
+      return null;
+    }
+    final SqlNodeList keyNodes = new SqlNodeList(SqlParserPos.ZERO);
+    for (ClusterKeySpec keyCol : keyCols) {
+      final SqlIdentifier colIdent = new SqlIdentifier(
+          Collections.singletonList(keyCol.expr()),
+          null, SqlParserPos.ZERO,
+          Collections.singletonList(SqlParserPos.ZERO)
+      );

Review Comment:
   I was wondering what will happen in the following case:
   * say colunmn `c` is a `clusterKey` 
   * we are selecting from a join which has column `c` on both sides
   
   but it seems like the column in the select list will take precedence.
   
   one more thing I was wondering about: do we have a check that all `keyCols` 
are present in the selected column list?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] INSERT/REPLACE can omit clustering when catalog has default (druid)

Reply via email to