LakshSingla commented on a change in pull request #12163:
URL: https://github.com/apache/druid/pull/12163#discussion_r792854195
##########
File path: sql/pom.xml
##########
@@ -255,6 +260,140 @@
</execution>
</executions>
</plugin>
+
+ <plugin>
+ <groupId>org.apache.maven.plugins</groupId>
+ <artifactId>maven-dependency-plugin</artifactId>
+ <executions>
+ <execution>
+ <!-- Extract the parser grammar template from Apache Calcite and put it under ${project.build.directory} where all freemarker templates are. -->
+ <id>unpack-parser-template</id>
+ <phase>initialize</phase>
+ <goals>
+ <goal>unpack</goal>
+ </goals>
+ <configuration>
+ <artifactItems>
+ <artifactItem>
+ <groupId>org.apache.calcite</groupId>
+ <artifactId>calcite-core</artifactId>
+ <version>${calcite.version}</version>
+ <type>jar</type>
+ <overWrite>true</overWrite>
+ <outputDirectory>${project.build.directory}/</outputDirectory>
+ <includes>**/Parser.jj</includes>
+ </artifactItem>
+ <artifactItem>
+ <groupId>org.apache.calcite</groupId>
+ <artifactId>calcite-core</artifactId>
+ <version>${calcite.version}</version>
+ <type>jar</type>
+ <overWrite>true</overWrite>
+ <outputDirectory>${project.build.directory}/</outputDirectory>
+ <includes>**/config.fmpp</includes>
+ </artifactItem>
+ </artifactItems>
+ </configuration>
+ </execution>
+ </executions>
+ </plugin>
+
+ <plugin>
+ <groupId>com.googlecode.fmpp-maven-plugin</groupId>
+ <artifactId>fmpp-maven-plugin</artifactId>
+ <version>1.0</version>
Review comment:
Hey, I searched the usages and it's a one-off plugin, only used here.
That's why I'm refraining from pushing it to the parent pom. WDYT?
##########
File path:
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java
##########
@@ -765,13 +785,53 @@ static ParsedNodes create(final SqlNode node) throws ValidationException
if (query.getKind() == SqlKind.INSERT) {
insert = (SqlInsert) query;
query = insert.getSource();
+
+ // Processing to be done when the original query has either the PARTITION BY or the CLUSTER BY clause
+ if (insert instanceof DruidSqlInsert) {
+ DruidSqlInsert druidSqlInsert = (DruidSqlInsert) insert;
+
+ ingestionGranularity = druidSqlInsert.getPartitionBy();
+
+ if (druidSqlInsert.getClusterBy() != null) {
+ // If we have a CLUSTER BY clause, extract the information in that CLUSTER BY and create a new SqlOrderBy node
+ SqlNode offset = null;
+ SqlNode fetch = null;
+ SqlNodeList orderByList = null;
+
+ if (query instanceof SqlOrderBy) {
+ SqlOrderBy sqlOrderBy = (SqlOrderBy) query;
+ // Extract the query present inside the SqlOrderBy (which is free of ORDER BY, OFFSET and FETCH clauses)
Review comment:
Let me know if it is clearer after the update
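For anyone following the diff above, the unwrap-and-rewrap control flow can be sketched with simplified stand-in types. These are illustrative stand-ins only, not Calcite's actual `SqlNode`/`SqlOrderBy` API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ClusterByRewriteSketch {
  // Simplified stand-in for Calcite's SqlOrderBy: an inner query plus
  // an order list, OFFSET, and FETCH. Shape only, not the real class.
  static class OrderByNode {
    final String innerQuery;
    final List<String> orderList;
    final String offset;
    final String fetch;

    OrderByNode(String innerQuery, List<String> orderList, String offset, String fetch) {
      this.innerQuery = innerQuery;
      this.orderList = orderList;
      this.offset = offset;
      this.fetch = fetch;
    }
  }

  // Mirrors the diff: if the source query is already wrapped in an ORDER BY,
  // unwrap it (keeping OFFSET/FETCH), reject ORDER BY + CLUSTER BY together,
  // then re-wrap the inner query with the CLUSTER BY columns as the order list.
  static OrderByNode rewriteWithClusterBy(Object query, List<String> clusterBy) {
    String inner;
    String offset = null;
    String fetch = null;
    List<String> orderList = Collections.emptyList();

    if (query instanceof OrderByNode) {
      OrderByNode ob = (OrderByNode) query;
      inner = ob.innerQuery;  // now free of ORDER BY, OFFSET and FETCH clauses
      offset = ob.offset;
      fetch = ob.fetch;
      orderList = ob.orderList;
      if (!orderList.isEmpty()) {
        throw new IllegalArgumentException(
            "Cannot have both ORDER BY and CLUSTER BY clauses in the same INSERT query");
      }
    } else {
      inner = String.valueOf(query);
    }
    return new OrderByNode(inner, new ArrayList<>(clusterBy), offset, fetch);
  }

  public static void main(String[] args) {
    OrderByNode rewritten = rewriteWithClusterBy("SELECT dim1 FROM tbl", List.of("__time"));
    System.out.println(rewritten.orderList); // prints [__time]
  }
}
```

The real code operates on `SqlNode`/`SqlNodeList` and builds a new `SqlOrderBy`; the sketch only mirrors the control flow and the mutual-exclusion check.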
##########
File path:
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java
##########
@@ -765,13 +785,53 @@ static ParsedNodes create(final SqlNode node) throws ValidationException
if (query.getKind() == SqlKind.INSERT) {
insert = (SqlInsert) query;
query = insert.getSource();
+
+ // Processing to be done when the original query has either the PARTITION BY or the CLUSTER BY clause
+ if (insert instanceof DruidSqlInsert) {
+ DruidSqlInsert druidSqlInsert = (DruidSqlInsert) insert;
+
+ ingestionGranularity = druidSqlInsert.getPartitionBy();
+
+ if (druidSqlInsert.getClusterBy() != null) {
+ // If we have a CLUSTER BY clause, extract the information in that CLUSTER BY and create a new SqlOrderBy node
+ SqlNode offset = null;
+ SqlNode fetch = null;
+ SqlNodeList orderByList = null;
+
+ if (query instanceof SqlOrderBy) {
+ SqlOrderBy sqlOrderBy = (SqlOrderBy) query;
+ // Extract the query present inside the SqlOrderBy (which is free of ORDER BY, OFFSET and FETCH clauses)
+ query = sqlOrderBy.query;
+
+ offset = sqlOrderBy.offset;
+ fetch = sqlOrderBy.fetch;
+ orderByList = sqlOrderBy.orderList;
+ // If the orderList is non-empty (i.e. there existed an ORDER BY clause in the query) and the CLUSTER BY clause is also non-empty, throw an error
+ if (!(orderByList == null || orderByList.equals(SqlNodeList.EMPTY))
+ && druidSqlInsert.getClusterBy() != null) {
+ throw new ValidationException(
+ "Cannot have both ORDER BY and CLUSTER BY clauses in the same INSERT query");
+ }
+ }
+ // Creates a new SqlOrderBy query, which may have our CLUSTER BY overwritten
+ query = new SqlOrderBy(
Review comment:
Currently, whatever you enter in the ORDER BY is getting converted into an
ordering spec. I think the aim of the clause is to add syntactic sugar
rather than introduce different semantics.
(I am unfamiliar with Hive.) I looked up Hive, which implements both of these
clauses
[stackoverflow](https://stackoverflow.com/questions/13715044/hive-cluster-by-vs-order-by-vs-sort-by),
and it seems that both of them try to achieve similar functionality, albeit
with different results and different semantics.
But yeah, then it makes sense to implement both the CLUSTER BY and ORDER BY
clauses and punt the actual logic to QueryMaker or a DruidRel.
The trouble I see in implementing that is:
a) We would need to convert it to a RelNode in SqlToRelConverter, and then
pass it down the query stack. (This is because `CLUSTER BY` can be a reference
to a column in the statement, and we probably need to use it.) I am unsure whether
this is possible without making changes to the SqlToRelConverter itself.
b) A new Query of type Query.INSERT would need to be created to accommodate a
"ClusteringSpec" in addition to the "OrderingSpec".
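A rough sketch of what option (b) might look like, with purely illustrative names (this is not Druid's actual Query API, just the shape of a query object carrying both specs for downstream code to interpret):

```java
import java.util.Collections;
import java.util.List;

// Hypothetical shape for option (b): an INSERT-type query that carries a
// clustering spec alongside the ordering spec, leaving QueryMaker / a DruidRel
// to decide how each is honored. All names here are illustrative.
public class InsertQuerySketch {
  final List<String> orderingColumns;   // from ORDER BY: result ordering
  final List<String> clusteringColumns; // from CLUSTER BY: segment layout hint

  InsertQuerySketch(List<String> orderingColumns, List<String> clusteringColumns) {
    this.orderingColumns = orderingColumns;
    this.clusteringColumns = clusteringColumns;
  }

  public static void main(String[] args) {
    InsertQuerySketch q = new InsertQuerySketch(
        Collections.emptyList(),
        List.of("__time", "dim1"));
    System.out.println(q.clusteringColumns); // prints [__time, dim1]
  }
}
```

The point of keeping the two lists separate is that an ordering spec and a clustering spec could then diverge in semantics (result order vs. data layout) without the planner conflating them.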
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]