clintropolis commented on code in PR #15974:
URL: https://github.com/apache/druid/pull/15974#discussion_r1517188405
##########
sql/src/main/java/org/apache/druid/sql/calcite/expression/builtin/MultiValueStringOperatorConversions.java:
##########
@@ -458,6 +578,28 @@ boolean isAllowList()
}
}
+
+ private static List<DruidExpression>
harmonizeNullsMvdArg0OperandList(List<DruidExpression> druidExpressions)
+ {
+ final List<DruidExpression> newArgs;
+ if (druidExpressions.get(0).isDirectColumnAccess()) {
Review Comment:
>Do you have an example in mind of the weird stuff?
Well, so today `MV_CONCAT(mvd, ARRAY['foo'])` plans to `array_concat(mvd,
array('foo'))` which returns `null` instead of `[null, 'foo']` if the `mvd` row
is `null` because we don't coerce those values into `[null]`.
Functions like this are why we turned off the blanket coercion we were doing
in the first place (`druid.expressions.homogenizeNullMultiValueStringArrays`)
because (at the time at least) it felt too strange for the latter answer to be
the behavior, and part of the reasoning for this is because when you do a scan
query, which is the only way you can see MVDs as their entire (barring grouping
on them with MV_TO_ARRAY) you never see the value as `[null]`, rather it is
`null` because it also uses `getObject`.
So if this was the argument and then rewritten into
`MV_CONTAINS(MV_HAROMONIZE_NULLS(MV_CONCAT(mvd, ['foo'])), 'foo')` at the
native expression layer it would be like `array_contains([null], 'foo')` which
is false, instead of `array_contains(null, 'foo')` which is `null`.
>How do you define "correctly" in this sentence? I would think that if you
apply ARRAY_TO_MV to a legit array, it's more correct to harmonize the arrays
than to not harmonize them. That makes them behave more like they would if they
had been ingested as MVDs, which I think is the point of ARRAY_TO_MV?
I guess there is no such thing as "correctly" because mvds themselves are
completely inconsistent. I imagined `ARRAY_TO_MV` to primarily be used to get
stuff like automatic unnest and other native query mvd behaviors (outside of
expressions), and didn't really imagine reducing the fidelity of the underlying
arrays to have the same deficiencies as mvds, but I suppose that is another way
of thinking about it and be used with native stuff. It gets really weird doing
`ARRAY_TO_MV` and then using `MV_` functions that plan to native expressions
(which themselves _actually_ turn the mvds into ARRAY<STRING> so that the
expressions can handle them), so it didn't occur to me that we would want to do
coercion because at the native layer `ARRAY_TO_MV` is basically `cast(array as
ARRAY<STRING>)`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]