gianm commented on a change in pull request #12163:
URL: https://github.com/apache/druid/pull/12163#discussion_r793121059
##########
File path: processing/src/main/java/org/apache/druid/query/QueryContexts.java
##########
@@ -68,6 +68,7 @@
public static final String ENABLE_DEBUG = "debug";
public static final String BY_SEGMENT_KEY = "bySegment";
public static final String BROKER_SERVICE_NAME = "brokerService";
+ public static final String INGESTION_GRANULARITY = "ingestionGranularity";
Review comment:
It's weird for this to be in QueryContexts, since it's not something
that the native query engines handle. I think it'd make more sense in the sql
module. Maybe put it in the DruidSqlInsert file?
Also, imo, a better name would be `sqlInsertSegmentGranularity`. It's more
explicit, and has "sql" in the name, which is conventional for context
parameters that only apply to SQL.
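A sketch of what the relocated constant might look like under the suggested name (class body abbreviated; the placement and name follow the suggestion above, not the code actually in the PR):

```java
// Hypothetical sketch: the context key moved out of QueryContexts into the
// sql module, renamed per the review suggestion.
public class DruidSqlInsert
{
  public static final String SQL_INSERT_SEGMENT_GRANULARITY = "sqlInsertSegmentGranularity";
}
```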
##########
File path: pom.xml
##########
@@ -82,6 +82,8 @@
<apache.ranger.gson.version>2.2.4</apache.ranger.gson.version>
<avatica.version>1.17.0</avatica.version>
<avro.version>1.9.2</avro.version>
+    <!-- sql/src/main/codegen/config.fmpp is version dependent. Read the top level comments
+         mentioned there when upgrading the Calcite's version (specifically calcite-core's version) -->
Review comment:
This comment should be more explicit, so people really don't miss it.
Something like:
```
<!-- sql/src/main/codegen/config.fmpp is based on a file from calcite-core, and needs to be
     updated when upgrading Calcite. Refer to the top-level comments in that file for details. -->
```
##########
File path: sql/pom.xml
##########
@@ -180,6 +180,11 @@
<artifactId>validation-api</artifactId>
<scope>provided</scope>
</dependency>
+ <dependency>
+ <groupId>org.slf4j</groupId>
+ <artifactId>slf4j-api</artifactId>
+ <version>1.7.25</version>
Review comment:
We shouldn't use different versions of slf4j-api in the sql module vs.
the rest of the modules: it should be `<scope>provided</scope>` with no version
specified. That way, we'll only ship one copy of the jar.
If you make that change, does it work out OK, or does something bad happen?
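i.e., something like the following (a sketch; it assumes the parent pom already pins the slf4j-api version for all modules):

```xml
<!-- Hypothetical sketch: no <version>, provided scope, so the module
     inherits the version managed by the main pom and the jar is not
     shipped twice. -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <scope>provided</scope>
</dependency>
```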
##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlInsert.java
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.sql.calcite.parser;
+
+import org.apache.calcite.sql.SqlInsert;
+import org.apache.calcite.sql.SqlLiteral;
+import org.apache.calcite.sql.SqlNode;
+import org.apache.calcite.sql.SqlNodeList;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.SqlWriter;
+
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+
+/**
+ * Extends the Insert call to hold custom paramaters specific to druid i.e. PARTITION BY and CLUSTER BY
+ * This class extends the {@link SqlInsert} so that this SqlNode can be used in
+ * {@link org.apache.calcite.sql2rel.SqlToRelConverter} for getting converted into RelNode, and further processing
+ */
+public class DruidSqlInsert extends SqlInsert
+{
+ // Unsure if this should be kept as is, but this allows reusing super.unparse
Review comment:
It's just `new SqlSpecialOperator("INSERT", SqlKind.INSERT)` so I think
it's fine. Could you please update the comment to be less scary & uncertain
sounding? Like:
```java
// Allows reusing super.unparse.
```
##########
File path: sql/src/main/codegen/config.fmpp
##########
@@ -0,0 +1,433 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to you under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# This file is an FMPP (http://fmpp.sourceforge.net/) configuration file to
+# allow clients to extend Calcite's SQL parser to support application specific
+# SQL statements, literals or data types.
+#
+# Calcite's parser grammar file (Parser.jj) is written in javacc
+# (http://javacc.java.net/) with Freemarker (http://freemarker.org/) variables
+# to allow clients to:
+# 1. have custom parser implementation class and package name.
+# 2. insert new parser method implementations written in javacc to parse
+# custom:
+# a) SQL statements.
+# b) literals.
+# c) data types.
+# 3. add new keywords to support custom SQL constructs added as part of (2).
+# 4. add import statements needed by inserted custom parser implementations.
+#
+# Parser template file (Parser.jj) along with this file are packaged as
+# part of the calcite-core-<version>.jar under "codegen" directory.
+
+data: {
+ parser: {
+ # Generated parser implementation package and class name.
+ package: "org.apache.druid.sql.calcite.parser",
+ class: "DruidSqlParserImpl",
+
+ # List of additional classes and packages to import.
+ # Example. "org.apache.calcite.sql.*", "java.util.List".
+ imports: [
+ "org.apache.calcite.sql.SqlNode"
+ "org.apache.calcite.sql.SqlInsert"
+ "org.apache.druid.sql.calcite.parser.DruidSqlInsert"
+ ]
+
+    # List of new keywords. Example: "DATABASES", "TABLES". If the keyword is not a reserved
+    # keyword add it to 'nonReservedKeywords' section.
+ keywords: [
Review comment:
Yeah, probably makes sense for this to be a non-reserved keyword.
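The change might look something like this in config.fmpp ("CLUSTER" here is just an illustrative token; whether it needs declaring, and the exact section names, should be checked against the template that ships in calcite-core):

```
# Hypothetical sketch: declare the new token, then also list it in the
# non-reserved section so it can still be used as a plain identifier.
keywords: [
  "CLUSTER"
]

nonReservedKeywords: [
  "CLUSTER"
]
```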
##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java
##########
@@ -765,13 +786,65 @@ static ParsedNodes create(final SqlNode node) throws ValidationException
if (query.getKind() == SqlKind.INSERT) {
insert = (SqlInsert) query;
query = insert.getSource();
+
+      // Processing to be done when the original query has either of the PARTITION BY or CLUSTER BY clause
+ if (insert instanceof DruidSqlInsert) {
Review comment:
When will it not be a DruidSqlInsert?
##########
File path: sql/pom.xml
##########
@@ -255,6 +260,140 @@
</execution>
</executions>
</plugin>
+
+ <plugin>
+ <groupId>org.apache.maven.plugins</groupId>
+ <artifactId>maven-dependency-plugin</artifactId>
+ <executions>
+ <execution>
+            <!-- Extract parser grammar template from Apache Calcite and put it under
+                 ${project.build.directory} where all freemarker templates are. -->
+ <id>unpack-parser-template</id>
+ <phase>initialize</phase>
+ <goals>
+ <goal>unpack</goal>
+ </goals>
+ <configuration>
+ <artifactItems>
+ <artifactItem>
+ <groupId>org.apache.calcite</groupId>
+ <artifactId>calcite-core</artifactId>
+ <version>${calcite.version}</version>
+ <type>jar</type>
+ <overWrite>true</overWrite>
+                  <outputDirectory>${project.build.directory}/</outputDirectory>
+ <includes>**/Parser.jj</includes>
+ </artifactItem>
+ <artifactItem>
+ <groupId>org.apache.calcite</groupId>
+ <artifactId>calcite-core</artifactId>
+ <version>${calcite.version}</version>
+ <type>jar</type>
+ <overWrite>true</overWrite>
+                  <outputDirectory>${project.build.directory}/</outputDirectory>
+ <includes>**/config.fmpp</includes>
+ </artifactItem>
+ </artifactItems>
+ </configuration>
+ </execution>
+ </executions>
+ </plugin>
+
+ <plugin>
+ <groupId>com.googlecode.fmpp-maven-plugin</groupId>
+ <artifactId>fmpp-maven-plugin</artifactId>
+ <version>1.0</version>
Review comment:
For hygiene I think it's good to define all the plugin version numbers
in pluginManagement of the main pom. The actual configuration should stay in
this module, since it's specific to this particular module.
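For example, the version could move to the main pom along these lines (a sketch; the sql module's `<plugin>` entry would then drop its `<version>` element and keep only the executions/configuration):

```xml
<!-- Hypothetical sketch: in the main pom.xml, under <build> -->
<pluginManagement>
  <plugins>
    <plugin>
      <groupId>com.googlecode.fmpp-maven-plugin</groupId>
      <artifactId>fmpp-maven-plugin</artifactId>
      <version>1.0</version>
    </plugin>
  </plugins>
</pluginManagement>
```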
##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java
##########
@@ -744,18 +755,28 @@ public T next()
private SqlNode query;
-    private ParsedNodes(@Nullable SqlExplain explain, @Nullable SqlInsert insert, SqlNode query)
+ @Nullable
+ private String ingestionGranularity;
+
+ private ParsedNodes(
+ @Nullable SqlExplain explain,
+ @Nullable SqlInsert insert,
Review comment:
Would it make sense to have this be a DruidSqlInsert, so the granularity
doesn't need to be provided separately? I'm asking since my understanding is
we'll _always_ get a DruidSqlInsert, even if the user doesn't provide PARTITION
BY or CLUSTER BY.
##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlInsert.java
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.sql.calcite.parser;
+
+import org.apache.calcite.sql.SqlInsert;
+import org.apache.calcite.sql.SqlLiteral;
+import org.apache.calcite.sql.SqlNode;
+import org.apache.calcite.sql.SqlNodeList;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.SqlWriter;
+
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+
+/**
+ * Extends the Insert call to hold custom paramaters specific to druid i.e. PARTITION BY and CLUSTER BY
Review comment:
- parameters (spelling)
- Druid (capitalization)
##########
File path: sql/src/main/codegen/includes/insert.ftl
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+SqlNode DruidSqlInsert() :
+{
+ SqlNode insertNode;
+ SqlNode partitionBy = null;
+ SqlNodeList clusterBy = null;
+}
+{
+ insertNode = SqlInsert()
+ [
+ <PARTITION> <BY>
+ partitionBy = StringLiteral()
Review comment:
IMO it'd be better for this to be a TimeUnit, like we use for time floor
expressions such as `FLOOR(__time TO DAY)`. That way the phrase would be
`PARTITION BY DAY` instead of `PARTITION BY 'day'`. The syntax is a little
cleaner, and we have a `TimeUnits.toPeriod` function to help convert those to
PeriodGranularities.
The only note is we'd need to be able to say `PARTITION BY ALL` too, so the
grammar would need to accept a TimeUnit or the ALL keyword.
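A rough sketch of what that alternative grammar might look like in insert.ftl (the production and token names here are assumptions, not tested against the actual Parser.jj):

```
// Hypothetical: accept a time unit (PARTITION BY DAY) or the ALL keyword
// (PARTITION BY ALL) instead of a string literal.
[
  <PARTITION> <BY>
  (
    partitionBy = TimeUnit()
  |
    <ALL> { partitionBy = null; /* stands for "all" granularity */ }
  )
]
```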
##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java
##########
@@ -765,13 +785,53 @@ static ParsedNodes create(final SqlNode node) throws ValidationException
if (query.getKind() == SqlKind.INSERT) {
insert = (SqlInsert) query;
query = insert.getSource();
+
+      // Processing to be done when the original query has either of the PARTITION BY or CLUSTER BY clause
+ if (insert instanceof DruidSqlInsert) {
+ DruidSqlInsert druidSqlInsert = (DruidSqlInsert) insert;
+
+ ingestionGranularity = druidSqlInsert.getPartitionBy();
+
+ if (druidSqlInsert.getClusterBy() != null) {
+        // If we have a CLUSTER BY clause, extract the information in that CLUSTER BY
+        // and create a new SqlOrderBy node
+ SqlNode offset = null;
+ SqlNode fetch = null;
+ SqlNodeList orderByList = null;
+
+ if (query instanceof SqlOrderBy) {
+ SqlOrderBy sqlOrderBy = (SqlOrderBy) query;
+          // Extract the query present inside the SqlOrderBy (which is free of ORDER BY, OFFSET and FETCH clauses)
+ query = sqlOrderBy.query;
+
+ offset = sqlOrderBy.offset;
+ fetch = sqlOrderBy.fetch;
+ orderByList = sqlOrderBy.orderList;
+          // If the orderList is non-empty (i.e. there existed an ORDER BY clause in the query)
+          // and CLUSTER BY clause is also non-empty, throw an error
+          if (!(orderByList == null || orderByList.equals(SqlNodeList.EMPTY))
+ && druidSqlInsert.getClusterBy() != null) {
+ throw new ValidationException(
+ "Cannot have both ORDER BY and CLUSTER BY clauses in the
same INSERT query");
+ }
+ }
+        // Creates a new SqlOrderBy query, which may have our CLUSTER BY overwritten
+ query = new SqlOrderBy(
Review comment:
I think it's OK for now to continue using the native ordering specs to
represent cluster by, and to disallow having ORDER BY and CLUSTER BY at the
same time. Also, IMO, we should _not_ allow ORDER BY on the SELECT for an
INSERT. The error could be something like "Cannot have ORDER BY on an INSERT
query; use CLUSTER BY instead."
That way, we can keep discussing if we should allow ORDER BY and CLUSTER BY
to be used together, and if so, what should happen. Without blocking this PR.
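A self-contained sketch of the suggested rule (hypothetical: a boolean stands in for the real check on Calcite's `SqlOrderBy` node, and `IllegalStateException` stands in for `ValidationException`, so the snippet compiles on its own):

```java
// Hypothetical sketch: reject ORDER BY on the SELECT of an INSERT query,
// steering users toward CLUSTER BY instead.
public class InsertOrderByRule
{
  public static void validate(boolean selectHasOrderBy)
  {
    if (selectHasOrderBy) {
      throw new IllegalStateException(
          "Cannot have ORDER BY on an INSERT query, use CLUSTER BY instead.");
    }
  }

  public static void main(String[] args)
  {
    validate(false); // a plain INSERT ... SELECT passes

    boolean rejected = false;
    try {
      validate(true); // INSERT ... SELECT ... ORDER BY is rejected
    }
    catch (IllegalStateException e) {
      rejected = true;
    }
    System.out.println("ORDER BY rejected: " + rejected); // prints "ORDER BY rejected: true"
  }
}
```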
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]