Re: [PR] fix issues with array_contains and array_overlap with null left side arguments (druid)

via GitHub Fri, 01 Mar 2024 12:48:59 -0800


gianm commented on code in PR #15974:
URL: https://github.com/apache/druid/pull/15974#discussion_r1509511061



##########
processing/src/main/java/org/apache/druid/math/expr/Function.java:
##########
@@ -3779,15 +3781,64 @@ public ExpressionType getOutputType(Expr.InputBindingInspector inspector, List<E
     }
 
     @Override
-    ExprEval doApply(ExprEval lhsExpr, ExprEval rhsExpr)
+    public ExprEval apply(List<Expr> args, Expr.ObjectBinding bindings)
     {
+      final ExprEval lhsExpr = args.get(0).eval(bindings);
+      final ExprEval rhsExpr = args.get(1).eval(bindings);
+
       final Object[] array1 = lhsExpr.asArray();
-      final Object[] array2 = rhsExpr.asArray();
-      return ExprEval.ofLongBoolean(Arrays.asList(array1).containsAll(Arrays.asList(array2)));
+      if (array1 == null) {
+        return ExprEval.ofLong(null);
+      }
+      ExpressionType array1Type = lhsExpr.asArrayType();
+
+      if (rhsExpr.isArray()) {
+        final Object[] array2 = rhsExpr.asArray();
+
+        if (array2 == null) {
+          return ExprEval.ofLongBoolean(false);
+        }
+        final Set<Object> set = array1Type.isPrimitiveArray()

Review Comment:
   ArrayContains should be an ExprMacro, since the rhs expr is often a literal, 
and we don't want to rebuild a set for each row (building sets isn't 
super-cheap).
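
A minimal sketch of the idea, not Druid code: every name below is hypothetical, and the real `ExprMacro` interface is not used. It assumes the rhs is a literal known up front, does the hashing work for it once, and leaves only a cheap lookup loop on the per-row path. Null elements are compared like ordinary values here, which is an assumption of the sketch.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; none of these names are Druid APIs.
final class LiteralRhsContains
{
  // Runs once, when the rhs is known to be a literal array.
  static Function<Object[], Boolean> forLiteral(Object[] literalRhs)
  {
    // Give each distinct rhs element a slot number. HashMap permits a null key.
    final Map<Object, Integer> slots = new HashMap<>();
    for (Object o : literalRhs) {
      slots.putIfAbsent(o, slots.size());
    }
    final int needed = slots.size();

    // Runs per row: no set is rebuilt, only a small BitSet is allocated.
    return lhs -> {
      if (lhs == null) {
        return null; // null lhs yields null, mirroring the diff above
      }
      final BitSet seen = new BitSet(needed);
      for (Object o : lhs) {
        final Integer slot = slots.get(o);
        if (slot != null) {
          seen.set(slot);
        }
      }
      // The lhs contains the rhs when every distinct rhs element was seen.
      return seen.cardinality() == needed;
    };
  }
}
```

The function returned by `forLiteral` would be built once and then applied to each row's lhs array.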



##########
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidRexExecutor.java:
##########
@@ -83,7 +83,11 @@ public void reduce(
         final RexNode literal;
 
         if (sqlTypeName == SqlTypeName.BOOLEAN) {
-          literal = rexBuilder.makeLiteral(exprResult.asBoolean(), constExp.getType(), true);
+          if (exprResult.valueOrDefault() == null) {

Review Comment:
   Does this need to check whether we're in 2VL mode and make a FALSE literal 
instead of a NULL literal in that mode? Or is 2VL mode checked somewhere else?
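
To make the question concrete, here is a hedged sketch of the behavior being asked about. The class, the method, and the `threeValued` flag are all hypothetical; how the planner actually exposes its logic mode is not shown in this diff.

```java
// Hypothetical sketch; not the DruidRexExecutor implementation.
final class BooleanReduceSketch
{
  // 'threeValued' stands in for however the planner reports 2VL vs 3VL mode.
  static Boolean reduce(Boolean evaluated, boolean threeValued)
  {
    if (evaluated == null && !threeValued) {
      // In 2VL mode there is no UNKNOWN, so a null result collapses to FALSE.
      return Boolean.FALSE;
    }
    // In 3VL mode a null result stays null (a NULL literal).
    return evaluated;
  }
}
```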



##########
processing/src/main/java/org/apache/druid/math/expr/Function.java:
##########
@@ -3803,16 +3854,56 @@ public ExpressionType getOutputType(Expr.InputBindingInspector inspector, List<E
     }
 
     @Override
-    ExprEval doApply(ExprEval lhsExpr, ExprEval rhsExpr)
+    public ExprEval apply(List<Expr> args, Expr.ObjectBinding bindings)
     {
-      final Object[] array1 = lhsExpr.asArray();
-      final List<Object> array2 = Arrays.asList(rhsExpr.asArray());
+      final ExprEval arrayExpr1 = args.get(0).eval(bindings);
+      final ExprEval arrayExpr2 = args.get(1).eval(bindings);
+
+      final Object[] array1 = arrayExpr1.asArray();
+      final Object[] array2 = arrayExpr2.asArray();
+      if (array1 == null) {
+        return ExprEval.ofLong(null);
+      }
+      if (array2 == null) {
+        return ExprEval.ofLong(null);
+      }
       boolean any = false;
-      for (Object check : array1) {
-        any |= array2.contains(check);
+      ExpressionType array1Type = arrayExpr1.asArrayType();
+      final Set<Object> set = array1Type.isPrimitiveArray()

Review Comment:
   Same comment about making this an ExprMacro.



##########
processing/src/main/java/org/apache/druid/math/expr/Function.java:
##########
@@ -3803,16 +3854,56 @@ public ExpressionType getOutputType(Expr.InputBindingInspector inspector, List<E
     }
 
     @Override
-    ExprEval doApply(ExprEval lhsExpr, ExprEval rhsExpr)
+    public ExprEval apply(List<Expr> args, Expr.ObjectBinding bindings)
     {
-      final Object[] array1 = lhsExpr.asArray();
-      final List<Object> array2 = Arrays.asList(rhsExpr.asArray());
+      final ExprEval arrayExpr1 = args.get(0).eval(bindings);
+      final ExprEval arrayExpr2 = args.get(1).eval(bindings);
+
+      final Object[] array1 = arrayExpr1.asArray();
+      final Object[] array2 = arrayExpr2.asArray();
+      if (array1 == null) {
+        return ExprEval.ofLong(null);

Review Comment:
   This makes me wonder what happens when the rhs array includes null.
   
   Consider `MV_CONTAINS(x, ARRAY[NULL, 'abc', 'def'])`, when `x` is an MVD 
where one row contains a single `NULL`. Is that equivalent to `x IS NULL OR 
MV_CONTAINS(x, ARRAY['abc', 'def'])` (which returns `TRUE`) or is it equivalent 
to `MV_CONTAINS(x, ARRAY['abc', 'def'])`?
   
   For that matter, consider `MV_CONTAINS(x, ARRAY['abc', 'def'])`. Does it 
return `FALSE` as if `x` was treated like `ARRAY[NULL]` (a nonnull array 
containing null)? Or does it return `UNKNOWN` as if `x` was `NULL` itself?
   
   Do the answers to these questions depend on how the `MV_CONTAINS` is 
planned: whether it ends up as an `array_contains` or a filter or something 
else?
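
For reference, a small sketch of how the two readings of the row differ at the Java level. The helper is hypothetical and only mirrors the null-lhs check from the diff; it does not say which reading Druid takes, which is the open question here. Note that `List.containsAll` compares null elements as equal to each other, unlike SQL `=`.

```java
import java.util.Arrays;

// Hypothetical sketch; a null Boolean stands in for UNKNOWN.
final class NullRowReadings
{
  static Boolean contains(Object[] x, Object[] rhs)
  {
    if (x == null) {
      return null; // x is NULL itself: the result is UNKNOWN
    }
    // x is a non-null array; null elements match null elements here.
    return Arrays.asList(x).containsAll(Arrays.asList(rhs));
  }
}
```

With `x` read as `ARRAY[NULL]`, `contains(new Object[]{null}, new Object[]{"abc", "def"})` is `false`, while with `x` read as `NULL`, `contains(null, ...)` is `null`. A rhs of `new Object[]{null}` against `ARRAY[NULL]` gives `true` in this sketch.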



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

