clintropolis commented on code in PR #15974:
URL: https://github.com/apache/druid/pull/15974#discussion_r1517188405


##########
sql/src/main/java/org/apache/druid/sql/calcite/expression/builtin/MultiValueStringOperatorConversions.java:
##########
@@ -458,6 +578,28 @@ boolean isAllowList()
     }
   }
 
+
+  private static List<DruidExpression> 
harmonizeNullsMvdArg0OperandList(List<DruidExpression> druidExpressions)
+  {
+    final List<DruidExpression> newArgs;
+    if (druidExpressions.get(0).isDirectColumnAccess()) {

Review Comment:
   >Do you have an example in mind of the weird stuff?
   
   Well, so today `MV_CONCAT(mvd, ARRAY['foo'])` plans to `array_concat(mvd, 
array('foo'))` which returns `null` instead of `[null, 'foo']` if the `mvd` row 
is `null` because we don't coerce those values into `[null]`.
   
   Functions like this are why we turned off the blanket coercion we were doing 
in the first place (`druid.expressions.homogenizeNullMultiValueStringArrays`) 
because (at the time at least) it felt too strange for the latter answer to be 
the behavior, and part of the reasoning for this is because when you do a scan 
query, which is the only way you can see MVDs as their entire (barring grouping 
on them with MV_TO_ARRAY) you never see the value as `[null]`, rather it is 
`null` because it also uses `getObject`.
   
   So if this was the argument and then rewritten into 
`MV_CONTAINS(MV_HAROMONIZE_NULLS(MV_CONCAT(mvd, ['foo'])), 'foo')` at the 
native expression layer it would be like `array_contains([null], 'foo')` which 
is false, instead of `array_contains(null, 'foo')` which is `null`.
   
   >How do you define "correctly" in this sentence? I would think that if you 
apply ARRAY_TO_MV to a legit array, it's more correct to harmonize the arrays 
than to not harmonize them. That makes them behave more like they would if they 
had been ingested as MVDs, which I think is the point of ARRAY_TO_MV?
   
   I guess there is no such thing as "correctly" because mvds themselves are 
completely inconsistent. I imagined `ARRAY_TO_MV` to primarily be used to get 
stuff like automatic unnest, and didn't really imagine reducing the fidelity of 
the underlying arrays to have the same deficiencies as mvds, but I suppose that 
is another way of thinking about it and be used with native stuff. It gets 
really weird doing `ARRAY_TO_MV` and then using `MV_` functions that plan to 
native expressions (which themselves _actually_ turn the mvds into 
ARRAY<STRING> so that the expressions can handle them), so it didn't occur to 
me that we would want to do coercion because at the native layer `ARRAY_TO_MV` 
is basically `cast(array as ARRAY<STRING>)`.
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to