clintropolis commented on a change in pull request #12078:
URL: https://github.com/apache/druid/pull/12078#discussion_r780203919
##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/Calcites.java
##########
@@ -128,27 +128,10 @@ public static String escapeStringLiteral(final String s)
}
/**
- * Convert {@link RelDataType} to the most appropriate {@link ValueType},
coercing all ARRAY types to STRING (until
- * the time is right and we are more comfortable handling Druid ARRAY types
in all parts of the engine).
- *
- * Callers who are not scared of ARRAY types should isntead call {@link
#getValueTypeForRelDataTypeFull(RelDataType)},
- * which returns the most accurate conversion of {@link RelDataType} to
{@link ValueType}.
+ * Convert {@link RelDataType} to the most appropriate {@link ValueType}}.
*/
@Nullable
public static ColumnType getColumnTypeForRelDataType(final RelDataType type)
Review comment:
the engine has been waiting on array grouping for this coercion to go
away, though there is a missing piece I think where we need to ensure that when
grouping on a virtual column we never plan to a topn query in `DruidQuery`.
I think most of the `MV_` prefixed functions have their return type as
`VARCHAR` rather than `ARRAY`, so in theory the "documented" multi-value string
functions shouldn't actually be affected, and any that will be is a bug that we
should fix. The undocumented `ARRAY_` prefixed functions will now behave as
intended and validate/operate as true SQL arrays (and so group correctly as
arrays). If anyone was using the undocumented `ARRAY_` prefix functions, then I
guess that behavior will change, since grouping on those expressions would
previously have been grouping as `STRING` due to the implicit unnest.
The only reason I can think of to add a feature flag is if need a way to
hide bugs that this change causes, and that _should_ only happen if any of the
`MV_` functions have bugs in their SQL output types, or if people were using
the undocumented `ARRAY_` functions.
##########
File path:
sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java
##########
@@ -143,54 +144,6 @@ public void
testSelectNonConstantArrayExpressionFromTable() throws Exception
);
}
- @Test
- public void testSelectNonConstantArrayExpressionFromTableForMultival()
throws Exception
Review comment:
heh, i wrote this test originally to document moderately strange current
behavior. This PR changes the behavior of `array` constructor function to no
longer "map" multi-value string inputs and instead treat them as arrays, so it
should probably be repurposed to test whatever the new behavior should be
probably
The map behavior was not very intuitive, since it would make a nested array
where the elements are single element arrays from the multi-value string, so
idk that it needs preserved.
We might want to consider making use of a multi-value string inside of the
array constructor an error, since i'm not sure there is an intuitive behavior
because it is ambiguous whether the user is acknowledging the column as a
multi-value, or doesn't know and thinks it single valued, which most of the
other functions don't have this ambiguity, though it probably doesn't need to
be changed in this PR.
##########
File path: core/src/main/java/org/apache/druid/math/expr/Function.java
##########
@@ -2994,8 +2994,14 @@ public ExprEval apply(List<Expr> args,
Expr.ObjectBinding bindings)
Object[] out = new Object[length];
ExpressionType arrayType = null;
+
for (int i = 0; i < length; i++) {
ExprEval<?> evaluated = args.get(i).eval(bindings);
+ // short circuit the case where you have one input argument whose
input type is array. So array function has
Review comment:
i have a concern with this change, best illustrated by the implication
that now an expression like this
```
array(array('x','y','z'))
```
will now produce `['x','y','z']` instead of `[['x','y','z']]` as one would
expect, or would happen if the expression were this instead:
```
array(array('x','y','z'),array('a','b','c'))
```
which will still produce `[['x','y','z'],['a','b','c']]`
Instead of hijacking this function to accommodate multi-value string
behavior, I think we should instead add a `MV_TO_ARRAY` function in SQL that
only accepts a string, and then in the native expression layer either just use
`cast(x, 'ARRAY<STRING>')` or also add a `mv_to_array` function that basically
does the same thing as cast. This way we are explicit on when we intend to
treat a multi-value column directly as an array
##########
File path: core/src/main/java/org/apache/druid/math/expr/Function.java
##########
@@ -2994,8 +2994,14 @@ public ExprEval apply(List<Expr> args,
Expr.ObjectBinding bindings)
Object[] out = new Object[length];
ExpressionType arrayType = null;
+
for (int i = 0; i < length; i++) {
ExprEval<?> evaluated = args.get(i).eval(bindings);
+ // short circuit the case where you have one input argument whose
input type is array. So array function has
Review comment:
i have a concern with this change, best illustrated by the implication
that now an expression like this
```
array(array('x','y','z'))
```
will now produce `['x','y','z']` instead of `[['x','y','z']]` as one would
expect, or would happen if the expression were this instead:
```
array(array('x','y','z'),array('a','b','c'))
```
which will still produce `[['x','y','z'],['a','b','c']]`
Instead of hijacking this function to accommodate multi-value string
behavior, I think we should instead add a `MV_TO_ARRAY` function in SQL that
only accepts a string, and then in the native expression layer either just use
`cast(x, 'ARRAY<STRING>')` or also add a `mv_to_array` function that basically
does the same thing as cast. This way we are explicit on when we intend to
treat a multi-value column directly as an array.
I think the changes to prevent the array constructor from being mapped for
multi-value strings are fine to leave as is though, since the mapping is sort
of strange
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]