Re: [PR] Extend the PARTITION BY clause to accept string literals for the time partitioning (druid)

via GitHub Mon, 05 Feb 2024 18:26:30 -0800


abhishekrb19 commented on code in PR #15836:
URL: https://github.com/apache/druid/pull/15836#discussion_r1479123192



##########
sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlInsert.java:
##########
@@ -78,7 +91,7 @@ public void unparse(SqlWriter writer, int leftPrec, int 
rightPrec)
   {
     super.unparse(writer, leftPrec, rightPrec);
     writer.keyword("PARTITIONED BY");
-    writer.keyword(partitionedByStringForUnparse);
+    writer.keyword(partitionedBy.toString());

Review Comment:
   Calling `toString()` will cause a NPE if `partitionedBy` is nullable



##########
sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlParserUtils.java:
##########
@@ -70,13 +75,76 @@
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;
+import java.util.Map;
 import java.util.stream.Collectors;
 
 public class DruidSqlParserUtils
 {
   private static final Logger log = new Logger(DruidSqlParserUtils.class);
   public static final String ALL = "all";
 
+  @VisibleForTesting
+  public static final String PARTITION_ERROR_MESSAGE =
+      "Invalid granularity [%s] after PARTITIONED BY.  "
+      + "Expected HOUR, DAY, MONTH, YEAR, ALL TIME, FLOOR() or TIME_FLOOR()";
+
+  public enum GranularityGrain
+  {
+    SECOND_GRAIN("SECOND", Granularities.SECOND),
+    MINUTE_GRAIN("MINUTE", Granularities.MINUTE),
+    HOUR_GRAIN("HOUR", Granularities.HOUR),
+    DAY_GRAIN("DAY", Granularities.DAY),
+    WEEK_GRAIN("WEEK", Granularities.WEEK),
+    MONTH_GRAIN("MONTH", Granularities.MONTH),

Review Comment:
   Why is this only a subset of the supported granularities? For example, 
`FIFTEEN_MINUTE`, `THIRTY_MINUTE`, etc are not defined here
   
   Also, I think it'd be best if we move this to 
https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/java/util/common/granularity/Granularities.java#L28,
 so we have one place to track them all.



##########
sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlParserUtils.java:
##########
@@ -166,7 +270,9 @@ public static Granularity 
convertSqlNodeToGranularity(SqlNode sqlNode) throws Pa
         period = new Period(granularityString);
       }
       catch (IllegalArgumentException e) {
-        throw new ParseException(StringUtils.format("%s is an invalid period 
string", granularitySqlNode.toString()));
+        throw InvalidSqlInput.exception(
+            StringUtils.format("%s is an invalid period string", 
granularitySqlNode.toString()),

Review Comment:
   ```suggestion
               StringUtils.format("granularity[%s] is an invalid period 
string", granularitySqlNode.toString()),
   ```



##########
sql/src/test/java/org/apache/druid/sql/calcite/parser/DruidSqlParserUtilsTest.java:
##########
@@ -133,6 +133,26 @@ public void testGetGranularityFromFloor() throws 
ParseException
       Granularity actualGranularity = 
DruidSqlParserUtils.convertSqlNodeToGranularityThrowingParseExceptions(floorCall);
       Assert.assertEquals(expectedGranularity, actualGranularity);
     }
+
+    /**
+     * Tests clause like "PARTITIONED BY 'day'"
+     * @throws ParseException
+     */
+    @Test
+    public void testConvertSqlNodeToGranularityAsLiteral() throws 
ParseException
+    {
+      SqlNode sqlNode = SqlLiteral.createCharString(timeUnit.name(), 
SqlParserPos.ZERO);
+      Granularity actualGranularity = 
DruidSqlParserUtils.convertSqlNodeToGranularityThrowingParseExceptions(sqlNode);
+      Assert.assertEquals(expectedGranularity, actualGranularity);
+    }
+
+    @Test

Review Comment:
   could you add a javadoc on what this test is testing?



##########
sql/src/main/codegen/includes/common.ftl:
##########
@@ -18,58 +18,52 @@
  */
 
 // Using fully qualified name for Pair class, since Calcite also has a same 
class name being used in the Parser.jj

Review Comment:
   Stale comment:
   ```suggestion
   ```



##########
sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlParserUtils.java:
##########
@@ -70,13 +75,76 @@
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;
+import java.util.Map;
 import java.util.stream.Collectors;
 
 public class DruidSqlParserUtils
 {
   private static final Logger log = new Logger(DruidSqlParserUtils.class);
   public static final String ALL = "all";
 
+  @VisibleForTesting
+  public static final String PARTITION_ERROR_MESSAGE =
+      "Invalid granularity [%s] after PARTITIONED BY.  "
+      + "Expected HOUR, DAY, MONTH, YEAR, ALL TIME, FLOOR() or TIME_FLOOR()";
+
+  public enum GranularityGrain
+  {
+    SECOND_GRAIN("SECOND", Granularities.SECOND),
+    MINUTE_GRAIN("MINUTE", Granularities.MINUTE),
+    HOUR_GRAIN("HOUR", Granularities.HOUR),
+    DAY_GRAIN("DAY", Granularities.DAY),
+    WEEK_GRAIN("WEEK", Granularities.WEEK),
+    MONTH_GRAIN("MONTH", Granularities.MONTH),
+    QUARTER_GRAIN("QUARTER", Granularities.QUARTER),
+    YEAR_GRAIN("YEAR", Granularities.YEAR),
+    ALL_GRAIN("ALL", Granularities.ALL),
+    ALL_TIME_GRAIN("ALL TIME", Granularities.ALL);
+
+    static final Map<String, GranularityGrain> VALUE_TO_GRANULARITY_GRAIN = 
ImmutableMap.of(

Review Comment:
   Can we not use `Granularity.fromString()` with the supplied literal and not 
have this reverse-mapping/enum at all?



##########
sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlParserUtils.java:
##########
@@ -70,13 +75,76 @@
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;
+import java.util.Map;
 import java.util.stream.Collectors;
 
 public class DruidSqlParserUtils
 {
   private static final Logger log = new Logger(DruidSqlParserUtils.class);
   public static final String ALL = "all";
 
+  @VisibleForTesting
+  public static final String PARTITION_ERROR_MESSAGE =
+      "Invalid granularity [%s] after PARTITIONED BY.  "

Review Comment:
   ```suggestion
         "Invalid granularity[%s] specified after PARTITIONED BY clause.  "
   ```



##########
sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlParserUtils.java:
##########
@@ -70,13 +75,76 @@
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;
+import java.util.Map;
 import java.util.stream.Collectors;
 
 public class DruidSqlParserUtils
 {
   private static final Logger log = new Logger(DruidSqlParserUtils.class);
   public static final String ALL = "all";
 
+  @VisibleForTesting

Review Comment:
   Is this used for testing? If not, could you make this private and remove the 
annotation?



##########
sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlParserUtils.java:
##########
@@ -96,28 +164,64 @@ public static Granularity 
convertSqlNodeToGranularityThrowingParseExceptions(Sql
   }
 
   /**
-   * This method is used to extract the granularity from a SqlNode 
representing following function calls:
-   * 1. FLOOR(__time TO TimeUnit)
-   * 2. TIME_FLOOR(__time, 'PT1H')
+   * This method is used to extract the granularity from a SqlNode which 
represents
+   * the argument to the {@code PARTITIONED BY} clause. The node can be any of 
the following:
+   * <ul>
+   * <li>A literal with a string that matches the SQL keywords
+   * {@code HOUR, DAY, MONTH, YEAR, ALL [TIME]}</li>
+   * <li>A literal string with a period in ISO 8601 format.</li>
+   * <li>Function call: {@code FLOOR(__time TO TimeUnit)}</li>
+   * <li>Function call: TIME_FLOOR(__time, 'PT1H')}</li>
+   * </ul>
    * <p>
-   * Validation on the sqlNode is contingent to following conditions:
-   * 1. sqlNode is an instance of SqlCall
-   * 2. Operator is either one of TIME_FLOOR or FLOOR
-   * 3. Number of operands in the call are 2
-   * 4. First operand is a SimpleIdentifier representing __time
-   * 5. If operator is TIME_FLOOR, the second argument is a literal, and can 
be converted to the Granularity class
-   * 6. If operator is FLOOR, the second argument is a TimeUnit, and can be 
mapped using {@link TimeUnits}
+   * Validation of the function sqlNode is contingent to following conditions:
+   * <ol>
+   * <li>sqlNode is an instance of SqlCall</li>
+   * <li>Operator is either one of TIME_FLOOR or FLOOR</li>
+   * <li>Number of operands in the call are 2</li>
+   * <li>First operand is a SimpleIdentifier representing __time</li>
+   * <li>If operator is TIME_FLOOR, the second argument is a literal, and can 
be converted to the Granularity class</li>
+   * <li>If operator is FLOOR, the second argument is a TimeUnit, and can be 
mapped using {@link TimeUnits}</li>
+   * </ol>
    * <p>
-   * Since it is to be used primarily while parsing the SqlNode, it is wrapped 
in {@code convertSqlNodeToGranularityThrowingParseExceptions}
+   * This method is called during validation, which will catch any errors. It 
is then called again
+   * during conversion, at which time we assume the node is valid.
    *
    * @param sqlNode SqlNode representing a call to a function
    *
    * @return Granularity as intended by the function call
    *
-   * @throws ParseException SqlNode cannot be converted a granularity
+   * @throws IAE SqlNode cannot be converted a granularity
    */
-  public static Granularity convertSqlNodeToGranularity(SqlNode sqlNode) 
throws ParseException
+  public static Granularity convertSqlNodeToGranularity(SqlNode sqlNode)
   {
+    if (sqlNode == null) {
+      return null;
+    }
+
+    if (sqlNode instanceof SqlLiteral) {
+      SqlLiteral literal = (SqlLiteral) sqlNode;
+      if (SqlLiteral.valueMatchesType(literal.getValue(), SqlTypeName.SYMBOL)) 
{
+        GranularityGrain value = literal.getValueAs(GranularityGrain.class);
+        if (value == null) {
+          throw new IAE(PARTITION_ERROR_MESSAGE, "NULL");

Review Comment:
   Consider `DruidException` instead - `throw InvalidInput.exception()`



##########
sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlParserUtils.java:
##########
@@ -70,13 +75,76 @@
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;
+import java.util.Map;
 import java.util.stream.Collectors;
 
 public class DruidSqlParserUtils
 {
   private static final Logger log = new Logger(DruidSqlParserUtils.class);
   public static final String ALL = "all";
 
+  @VisibleForTesting
+  public static final String PARTITION_ERROR_MESSAGE =
+      "Invalid granularity [%s] after PARTITIONED BY.  "
+      + "Expected HOUR, DAY, MONTH, YEAR, ALL TIME, FLOOR() or TIME_FLOOR()";

Review Comment:
   I see this is existing code, but the "expected set" seems smaller than what 
can be supported - for example, `QUARTER` is missing.



##########
sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlParserUtils.java:
##########
@@ -96,28 +164,64 @@ public static Granularity 
convertSqlNodeToGranularityThrowingParseExceptions(Sql
   }
 
   /**
-   * This method is used to extract the granularity from a SqlNode 
representing following function calls:
-   * 1. FLOOR(__time TO TimeUnit)
-   * 2. TIME_FLOOR(__time, 'PT1H')
+   * This method is used to extract the granularity from a SqlNode which 
represents
+   * the argument to the {@code PARTITIONED BY} clause. The node can be any of 
the following:
+   * <ul>
+   * <li>A literal with a string that matches the SQL keywords
+   * {@code HOUR, DAY, MONTH, YEAR, ALL [TIME]}</li>
+   * <li>A literal string with a period in ISO 8601 format.</li>
+   * <li>Function call: {@code FLOOR(__time TO TimeUnit)}</li>
+   * <li>Function call: TIME_FLOOR(__time, 'PT1H')}</li>
+   * </ul>
    * <p>
-   * Validation on the sqlNode is contingent to following conditions:
-   * 1. sqlNode is an instance of SqlCall
-   * 2. Operator is either one of TIME_FLOOR or FLOOR
-   * 3. Number of operands in the call are 2
-   * 4. First operand is a SimpleIdentifier representing __time
-   * 5. If operator is TIME_FLOOR, the second argument is a literal, and can 
be converted to the Granularity class
-   * 6. If operator is FLOOR, the second argument is a TimeUnit, and can be 
mapped using {@link TimeUnits}
+   * Validation of the function sqlNode is contingent to following conditions:
+   * <ol>
+   * <li>sqlNode is an instance of SqlCall</li>
+   * <li>Operator is either one of TIME_FLOOR or FLOOR</li>
+   * <li>Number of operands in the call are 2</li>
+   * <li>First operand is a SimpleIdentifier representing __time</li>
+   * <li>If operator is TIME_FLOOR, the second argument is a literal, and can 
be converted to the Granularity class</li>
+   * <li>If operator is FLOOR, the second argument is a TimeUnit, and can be 
mapped using {@link TimeUnits}</li>
+   * </ol>
    * <p>
-   * Since it is to be used primarily while parsing the SqlNode, it is wrapped 
in {@code convertSqlNodeToGranularityThrowingParseExceptions}
+   * This method is called during validation, which will catch any errors. It 
is then called again
+   * during conversion, at which time we assume the node is valid.
    *
    * @param sqlNode SqlNode representing a call to a function
    *
    * @return Granularity as intended by the function call
    *
-   * @throws ParseException SqlNode cannot be converted a granularity
+   * @throws IAE SqlNode cannot be converted a granularity
    */
-  public static Granularity convertSqlNodeToGranularity(SqlNode sqlNode) 
throws ParseException
+  public static Granularity convertSqlNodeToGranularity(SqlNode sqlNode)
   {
+    if (sqlNode == null) {
+      return null;
+    }
+
+    if (sqlNode instanceof SqlLiteral) {
+      SqlLiteral literal = (SqlLiteral) sqlNode;
+      if (SqlLiteral.valueMatchesType(literal.getValue(), SqlTypeName.SYMBOL)) 
{
+        GranularityGrain value = literal.getValueAs(GranularityGrain.class);
+        if (value == null) {
+          throw new IAE(PARTITION_ERROR_MESSAGE, "NULL");
+        }
+        return value.getGranularity();
+      } else if (SqlLiteral.valueMatchesType(literal.getValue(), 
SqlTypeName.CHAR)) {
+        String value = literal.getValueAs(String.class);
+        if (Strings.isNullOrEmpty(value)) {
+          throw new IAE(PARTITION_ERROR_MESSAGE, value == null ? "NULL" : "'" 
+ value + "'");

Review Comment:
   Ditto here and below for invalid user inputs, please throw 
`InvalidInput.exception()`



##########
sql/src/test/java/org/apache/druid/sql/calcite/parser/DruidSqlParserUtilsTest.java:
##########
@@ -133,6 +133,26 @@ public void testGetGranularityFromFloor() throws 
ParseException
       Granularity actualGranularity = 
DruidSqlParserUtils.convertSqlNodeToGranularityThrowingParseExceptions(floorCall);
       Assert.assertEquals(expectedGranularity, actualGranularity);
     }
+
+    /**
+     * Tests clause like "PARTITIONED BY 'day'"
+     * @throws ParseException

Review Comment:
   ```suggestion
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Extend the PARTITION BY clause to accept string literals for the time partitioning (druid)

Reply via email to