gianm commented on a change in pull request #12163:
URL: https://github.com/apache/druid/pull/12163#discussion_r793121059
##########
File path: processing/src/main/java/org/apache/druid/query/QueryContexts.java
##########
@@ -68,6 +68,7 @@
public static final String ENABLE_DEBUG = "debug";
public static final String BY_SEGMENT_KEY = "bySegment";
public static final String BROKER_SERVICE_NAME = "brokerService";
+ public static final String INGESTION_GRANULARITY = "ingestionGranularity";
Review comment:
It's weird for this to be in QueryContexts, since it's not something
that the native query engines handle. I think it'd make more sense in the sql
module. Maybe put it in the DruidSqlInsert file?
Also, imo, a better name would be `sqlInsertSegmentGranularity`. It's more
explicit, and has "sql" in the name, which is conventional for context
parameters that only apply to SQL.
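A sketch of what the relocated constant might look like under the suggested name (class body abbreviated; the placement and name follow the suggestion above, not the code actually in the PR):

```java
// Hypothetical sketch: the context key moved out of QueryContexts into the
// sql module, renamed per the review suggestion.
public class DruidSqlInsert
{
  public static final String SQL_INSERT_SEGMENT_GRANULARITY = "sqlInsertSegmentGranularity";
}
```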
##########
File path: pom.xml
##########
@@ -82,6 +82,8 @@
<apache.ranger.gson.version>2.2.4</apache.ranger.gson.version>
<avatica.version>1.17.0</avatica.version>
<avro.version>1.9.2</avro.version>
+    <!-- sql/src/main/codegen/config.fmpp is version dependent. Read the top level comments
+         mentioned there when upgrading the Calcite's version (specifically calcite-core's version) -->
Review comment:
This comment should be more explicit, so people really don't miss it.
Something like:
```
<!-- sql/src/main/codegen/config.fmpp is based on a file from calcite-core, and needs to be
     updated when upgrading Calcite. Refer to the top-level comments in that file for details. -->
```
##########
File path: sql/pom.xml
##########
@@ -180,6 +180,11 @@
<artifactId>validation-api</artifactId>
<scope>provided</scope>
</dependency>
+ <dependency>
+ <groupId>org.slf4j</groupId>
+ <artifactId>slf4j-api</artifactId>
+ <version>1.7.25</version>
Review comment:
We shouldn't use different versions of slf4j-api in the sql module vs.
the rest of the modules: it should be `<scope>provided</scope>` with no version
specified. That way, we'll only ship one copy of the jar.
If you make that change, does it work out OK, or does something bad happen?
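i.e., something like the following (a sketch; it assumes the parent pom already pins the slf4j-api version for all modules):

```xml
<!-- Hypothetical sketch: no <version>, provided scope, so the module
     inherits the version managed by the main pom and the jar is not
     shipped twice. -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <scope>provided</scope>
</dependency>
```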
##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlInsert.java
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.sql.calcite.parser;
+
+import org.apache.calcite.sql.SqlInsert;
+import org.apache.calcite.sql.SqlLiteral;
+import org.apache.calcite.sql.SqlNode;
+import org.apache.calcite.sql.SqlNodeList;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.SqlWriter;
+
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+
+/**
+ * Extends the Insert call to hold custom paramaters specific to druid i.e. PARTITION BY and CLUSTER BY
+ * This class extends the {@link SqlInsert} so that this SqlNode can be used in
+ * {@link org.apache.calcite.sql2rel.SqlToRelConverter} for getting converted into RelNode, and further processing
+ */
+public class DruidSqlInsert extends SqlInsert
+{
+ // Unsure if this should be kept as is, but this allows reusing super.unparse
Review comment:
It's just `new SqlSpecialOperator("INSERT", SqlKind.INSERT)` so I think
it's fine. Could you please update the comment to be less scary & uncertain
sounding? Like:
```java
// Allows reusing super.unparse.
```
##########
File path: sql/src/main/codegen/config.fmpp
##########
@@ -0,0 +1,433 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to you under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# This file is an FMPP (http://fmpp.sourceforge.net/) configuration file to
+# allow clients to extend Calcite's SQL parser to support application specific
+# SQL statements, literals or data types.
+#
+# Calcite's parser grammar file (Parser.jj) is written in javacc
+# (http://javacc.java.net/) with Freemarker (http://freemarker.org/) variables
+# to allow clients to:
+# 1. have custom parser implementation class and package name.
+# 2. insert new parser method implementations written in javacc to parse
+# custom:
+# a) SQL statements.
+# b) literals.
+# c) data types.
+# 3. add new keywords to support custom SQL constructs added as part of (2).
+# 4. add import statements needed by inserted custom parser implementations.
+#
+# Parser template file (Parser.jj) along with this file are packaged as
+# part of the calcite-core-<version>.jar under "codegen" directory.
+
+data: {
+ parser: {
+ # Generated parser implementation package and class name.
+ package: "org.apache.druid.sql.calcite.parser",
+ class: "DruidSqlParserImpl",
+
+ # List of additional classes and packages to import.
+ # Example. "org.apache.calcite.sql.*", "java.util.List".
+ imports: [
+ "org.apache.calcite.sql.SqlNode"
+ "org.apache.calcite.sql.SqlInsert"
+ "org.apache.druid.sql.calcite.parser.DruidSqlInsert"
+ ]
+
+    # List of new keywords. Example: "DATABASES", "TABLES". If the keyword is not a reserved
+    # keyword add it to 'nonReservedKeywords' section.
+ keywords: [
Review comment:
Yeah, probably makes sense for this to be a non-reserved keyword.
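The change might look something like this in config.fmpp ("CLUSTER" here is just an illustrative token; whether it needs declaring, and the exact section names, should be checked against the template that ships in calcite-core):

```
# Hypothetical sketch: declare the new token, then also list it in the
# non-reserved section so it can still be used as a plain identifier.
keywords: [
  "CLUSTER"
]

nonReservedKeywords: [
  "CLUSTER"
]
```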
##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java
##########
@@ -765,13 +786,65 @@ static ParsedNodes create(final SqlNode node) throws ValidationException
if (query.getKind() == SqlKind.INSERT) {
insert = (SqlInsert) query;
query = insert.getSource();
+
+      // Processing to be done when the original query has either of the PARTITION BY or CLUSTER BY clause
+ if (insert instanceof DruidSqlInsert) {
Review comment:
When will it not be a DruidSqlInsert?
##########
File path: sql/pom.xml
##########
@@ -255,6 +260,140 @@
</execution>
</executions>
</plugin>
+
+ <plugin>
+ <groupId>org.apache.maven.plugins</groupId>
+ <artifactId>maven-dependency-plugin</artifactId>
+ <executions>
+ <execution>
+            <!-- Extract parser grammar template from Apache Calcite and put it under
+                 ${project.build.directory} where all freemarker templates are. -->
+ <id>unpack-parser-template</id>
+ <phase>initialize</phase>
+ <goals>
+ <goal>unpack</goal>
+ </goals>
+ <configuration>
+ <artifactItems>
+ <artifactItem>
+ <groupId>org.apache.calcite</groupId>
+ <artifactId>calcite-core</artifactId>
+ <version>${calcite.version}</version>
+ <type>jar</type>
+ <overWrite>true</overWrite>
+                  <outputDirectory>${project.build.directory}/</outputDirectory>
+ <includes>**/Parser.jj</includes>
+ </artifactItem>
+ <artifactItem>
+ <groupId>org.apache.calcite</groupId>
+ <artifactId>calcite-core</artifactId>
+ <version>${calcite.version}</version>
+ <type>jar</type>
+ <overWrite>true</overWrite>
+                  <outputDirectory>${project.build.directory}/</outputDirectory>
+ <includes>**/config.fmpp</includes>
+ </artifactItem>
+ </artifactItems>
+ </configuration>
+ </execution>
+ </executions>
+ </plugin>
+
+ <plugin>
+ <groupId>com.googlecode.fmpp-maven-plugin</groupId>
+ <artifactId>fmpp-maven-plugin</artifactId>
+ <version>1.0</version>
Review comment:
For hygiene I think it's good to define all the plugin version numbers
in pluginManagement of the main pom. The actual configuration should stay in
this module, since it's specific to this particular module.
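For example, the version could move to the main pom along these lines (a sketch; the sql module's `<plugin>` entry would then drop its `<version>` element and keep only the executions/configuration):

```xml
<!-- Hypothetical sketch: in the main pom.xml, under <build> -->
<pluginManagement>
  <plugins>
    <plugin>
      <groupId>com.googlecode.fmpp-maven-plugin</groupId>
      <artifactId>fmpp-maven-plugin</artifactId>
      <version>1.0</version>
    </plugin>
  </plugins>
</pluginManagement>
```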
##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java
##########
@@ -744,18 +755,28 @@ public T next()
private SqlNode query;
-    private ParsedNodes(@Nullable SqlExplain explain, @Nullable SqlInsert insert, SqlNode query)
+ @Nullable
+ private String ingestionGranularity;
+
+ private ParsedNodes(
+ @Nullable SqlExplain explain,
+ @Nullable SqlInsert insert,
Review comment:
Would it make sense to have this be a DruidSqlInsert, so the granularity
doesn't need to be provided separately? I'm asking since my understanding is
we'll _always_ get a DruidSqlInsert, even if the user doesn't provide PARTITION
BY or CLUSTER BY.
##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlInsert.java
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.sql.calcite.parser;
+
+import org.apache.calcite.sql.SqlInsert;
+import org.apache.calcite.sql.SqlLiteral;
+import org.apache.calcite.sql.SqlNode;
+import org.apache.calcite.sql.SqlNodeList;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.SqlWriter;
+
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+
+/**
+ * Extends the Insert call to hold custom paramaters specific to druid i.e. PARTITION BY and CLUSTER BY
Review comment:
- parameters (spelling)
- Druid (capitalization)
##########
File path: sql/src/main/codegen/includes/insert.ftl
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+SqlNode DruidSqlInsert() :
+{
+ SqlNode insertNode;
+ SqlNode partitionBy = null;
+ SqlNodeList clusterBy = null;
+}
+{
+ insertNode = SqlInsert()
+ [
+ <PARTITION> <BY>
+ partitionBy = StringLiteral()
Review comment:
IMO it'd be better for this to be a TimeUnit, like we use for time floor
expressions such as `FLOOR(__time TO DAY)`. That way the phrase would be
`PARTITION BY DAY` instead of `PARTITION BY 'day'`. The syntax is a little
cleaner, and we have a `TimeUnits.toPeriod` function to help convert those to
PeriodGranularities.
The only note is we'd need to be able to say `PARTITION BY ALL` too, so the
grammar would need to accept a TimeUnit or the ALL keyword.
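A rough sketch of what that alternative grammar might look like in insert.ftl (the production and token names here are assumptions, not tested against the actual Parser.jj):

```
// Hypothetical: accept a time unit (PARTITION BY DAY) or the ALL keyword
// (PARTITION BY ALL) instead of a string literal.
[
  <PARTITION> <BY>
  (
    partitionBy = TimeUnit()
  |
    <ALL> { partitionBy = null; /* stands for "all" granularity */ }
  )
]
```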
##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java
##########
@@ -765,13 +785,53 @@ static ParsedNodes create(final SqlNode node) throws ValidationException
if (query.getKind() == SqlKind.INSERT) {
insert = (SqlInsert) query;
query = insert.getSource();
+
+      // Processing to be done when the original query has either of the PARTITION BY or CLUSTER BY clause
+ if (insert instanceof DruidSqlInsert) {
+ DruidSqlInsert druidSqlInsert = (DruidSqlInsert) insert;
+
+ ingestionGranularity = druidSqlInsert.getPartitionBy();
+
+ if (druidSqlInsert.getClusterBy() != null) {
+        // If we have a CLUSTER BY clause, extract the information in that CLUSTER BY
+        // and create a new SqlOrderBy node
+ SqlNode offset = null;
+ SqlNode fetch = null;
+ SqlNodeList orderByList = null;
+
+ if (query instanceof SqlOrderBy) {
+ SqlOrderBy sqlOrderBy = (SqlOrderBy) query;
+          // Extract the query present inside the SqlOrderBy (which is free of ORDER BY, OFFSET and FETCH clauses)
+ query = sqlOrderBy.query;
+
+ offset = sqlOrderBy.offset;
+ fetch = sqlOrderBy.fetch;
+ orderByList = sqlOrderBy.orderList;
+          // If the orderList is non-empty (i.e. there existed an ORDER BY clause in the query)
+          // and CLUSTER BY clause is also non-empty, throw an error
+          if (!(orderByList == null || orderByList.equals(SqlNodeList.EMPTY))
+ && druidSqlInsert.getClusterBy() != null) {
+ throw new ValidationException(
+ "Cannot have both ORDER BY and CLUSTER BY clauses in the
same INSERT query");
+ }
+ }
+        // Creates a new SqlOrderBy query, which may have our CLUSTER BY overwritten
+ query = new SqlOrderBy(
Review comment:
I think it's OK for now to continue using the native ordering specs to
represent cluster by, and to disallow having ORDER BY and CLUSTER BY at the
same time. Also, IMO, we should _not_ allow ORDER BY on the SELECT for an
INSERT. The error could be something like "Cannot have ORDER BY on an INSERT
query; use CLUSTER BY instead."
That way, we can keep discussing if we should allow ORDER BY and CLUSTER BY
to be used together, and if so, what should happen. Without blocking this PR.
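A self-contained sketch of the suggested rule (hypothetical: a boolean stands in for the real check on Calcite's `SqlOrderBy` node, and `IllegalStateException` stands in for `ValidationException`, so the snippet compiles on its own):

```java
// Hypothetical sketch: reject ORDER BY on the SELECT of an INSERT query,
// steering users toward CLUSTER BY instead.
public class InsertOrderByRule
{
  public static void validate(boolean selectHasOrderBy)
  {
    if (selectHasOrderBy) {
      throw new IllegalStateException(
          "Cannot have ORDER BY on an INSERT query, use CLUSTER BY instead.");
    }
  }

  public static void main(String[] args)
  {
    validate(false); // a plain INSERT ... SELECT passes

    boolean rejected = false;
    try {
      validate(true); // INSERT ... SELECT ... ORDER BY is rejected
    }
    catch (IllegalStateException e) {
      rejected = true;
    }
    System.out.println("ORDER BY rejected: " + rejected); // prints "ORDER BY rejected: true"
  }
}
```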
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]