clintropolis commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r780203919



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/Calcites.java
##########
@@ -128,27 +128,10 @@ public static String escapeStringLiteral(final String s)
   }
 
   /**
-   * Convert {@link RelDataType} to the most appropriate {@link ValueType}, 
coercing all ARRAY types to STRING (until
-   * the time is right and we are more comfortable handling Druid ARRAY types 
in all parts of the engine).
-   *
-   * Callers who are not scared of ARRAY types should isntead call {@link 
#getValueTypeForRelDataTypeFull(RelDataType)},
-   * which returns the most accurate conversion of {@link RelDataType} to 
{@link ValueType}.
+   * Convert {@link RelDataType} to the most appropriate {@link ValueType}}.
    */
   @Nullable
   public static ColumnType getColumnTypeForRelDataType(final RelDataType type)

Review comment:
       the engine has been waiting on array grouping for this coercion to go 
away, though there is a missing piece I think where we need to ensure that when 
grouping on a virtual column we never plan to a topn query in `DruidQuery`.
   
   I think most of the `MV_` prefixed functions have their return type as 
`VARCHAR` rather than `ARRAY`, so in theory the "documented" multi-value string 
functions shouldn't actually be affected, and any that will be is a bug that we 
should fix. The undocumented `ARRAY_` prefixed functions will now behave as 
intended and validate/operate as true SQL arrays (and so group correctly as 
arrays). If anyone was using the undocumented `ARRAY_` prefix functions, then I 
guess that behavior will change, since grouping on those expressions would 
previously have been grouping as `STRING` due to the implicit unnest.
   
   The only reason I can think of to add a feature flag is if need a way to 
hide bugs that this change causes, and that _should_ only happen if any of the 
`MV_` functions have bugs in their SQL output types, or if people were using 
the undocumented `ARRAY_` functions.

##########
File path: 
sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java
##########
@@ -143,54 +144,6 @@ public void 
testSelectNonConstantArrayExpressionFromTable() throws Exception
     );
   }
 
-  @Test
-  public void testSelectNonConstantArrayExpressionFromTableForMultival() 
throws Exception

Review comment:
       heh, i wrote this test originally to document moderately strange current 
behavior. This PR changes the behavior of `array` constructor function to no 
longer "map" multi-value string inputs and instead treat them as arrays, so it 
should probably be repurposed to test whatever the new behavior should be 
probably
   
   The map behavior was not very intuitive, since it would make a nested array 
where the elements are single element arrays from the multi-value string, so 
idk that it needs preserved.
   
   We might want to consider making use of a multi-value string inside of the 
array constructor an error, since i'm not sure there is an intuitive behavior 
because it is ambiguous whether the user is acknowledging the column as a 
multi-value, or doesn't know and thinks it single valued, which most of the 
other functions don't have this ambiguity, though it probably doesn't need to 
be changed in this PR.

##########
File path: core/src/main/java/org/apache/druid/math/expr/Function.java
##########
@@ -2994,8 +2994,14 @@ public ExprEval apply(List<Expr> args, 
Expr.ObjectBinding bindings)
       Object[] out = new Object[length];
 
       ExpressionType arrayType = null;
+
       for (int i = 0; i < length; i++) {
         ExprEval<?> evaluated = args.get(i).eval(bindings);
+        // short circuit the case where you have one input argument whose 
input type is array. So array function has

Review comment:
       i have a concern with this change, best illustrated by the implication 
that now an expression like this
   
   ```
   array(array('x','y','z'))
   ```
   will now produce `['x','y','z']` instead of `[['x','y','z']]` as one would 
expect, or would happen if the expression were this instead:
   
   ```
   array(array('x','y','z'),array('a','b','c'))
   ```
   which will still produce `[['x','y','z'],['a','b','c']]`
   
   Instead of hijacking this function to accommodate multi-value string 
behavior, I think we should instead add a `MV_TO_ARRAY` function in SQL that 
only accepts a string, and then in the native expression layer either just use 
`cast(x, 'ARRAY<STRING>')` or also add a `mv_to_array` function that basically 
does the same thing as cast. This way we are explicit on when we intend to 
treat a multi-value column directly as an array 

##########
File path: core/src/main/java/org/apache/druid/math/expr/Function.java
##########
@@ -2994,8 +2994,14 @@ public ExprEval apply(List<Expr> args, 
Expr.ObjectBinding bindings)
       Object[] out = new Object[length];
 
       ExpressionType arrayType = null;
+
       for (int i = 0; i < length; i++) {
         ExprEval<?> evaluated = args.get(i).eval(bindings);
+        // short circuit the case where you have one input argument whose 
input type is array. So array function has

Review comment:
       i have a concern with this change, best illustrated by the implication 
that now an expression like this
   
   ```
   array(array('x','y','z'))
   ```
   will now produce `['x','y','z']` instead of `[['x','y','z']]` as one would 
expect, or would happen if the expression were this instead:
   
   ```
   array(array('x','y','z'),array('a','b','c'))
   ```
   which will still produce `[['x','y','z'],['a','b','c']]`
   
   Instead of hijacking this function to accommodate multi-value string 
behavior, I think we should instead add a `MV_TO_ARRAY` function in SQL that 
only accepts a string, and then in the native expression layer either just use 
`cast(x, 'ARRAY<STRING>')` or also add a `mv_to_array` function that basically 
does the same thing as cast. This way we are explicit on when we intend to 
treat a multi-value column directly as an array.
   
   I think the changes to prevent the array constructor from being mapped for 
multi-value strings are fine to leave as is though, since the mapping is sort 
of strange




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to