alamb commented on code in PR #17088:
URL: https://github.com/apache/datafusion/pull/17088#discussion_r2265213324


##########
datafusion/sqllogictest/test_files/errors.slt:
##########
@@ -168,8 +168,9 @@ CREATE TABLE tab0(col0 INTEGER, col1 INTEGER, col2 INTEGER);
 statement ok
 INSERT INTO tab0 VALUES(83,0,38);
 
-query error DataFusion error: Arrow error: Divide by zero error
+query I

Review Comment:
   This is an interesting change -- we short circuited the evaluation and now 
the error doesn't happen
   
   THis happens in other areas and so I think this change is consistent with 
other parts of DataFusion as well



##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -1947,6 +1858,53 @@ fn has_common_conjunction(lhs: &Expr, rhs: &Expr) -> 
bool {
     iter_conjunction(rhs).any(|e| lhs_set.contains(&e) && !e.is_volatile())
 }
 
+fn binary_op_null_on_null(op: Operator) -> bool {

Review Comment:
   I recommend making this a function on `Operator` so it is more discoverable 
-- similar to 
https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Operator.html#method.is_numerical_operators



##########
datafusion/sqllogictest/test_files/predicates.slt:
##########
@@ -777,6 +777,52 @@ physical_plan
 16)--------------------RepartitionExec: partitioning=RoundRobinBatch(4), 
input_partitions=1
 17)----------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/core/tests/tpch-csv/part.csv]]}, 
projection=[p_partkey, p_brand], file_type=csv, has_header=true
 
+# Simplification of a binary operator with a NULL value
+
+statement ok
+create table t(x int) as values (1), (2), (3);
+
+query TT
+EXPLAIN FORMAT INDENT SELECT x > NULL FROM t;
+----
+logical_plan
+01)Projection: Boolean(NULL) AS t.x > NULL
+02)--TableScan: t projection=[]
+physical_plan
+01)ProjectionExec: expr=[NULL as t.x > NULL]
+02)--DataSourceExec: partitions=1, partition_sizes=[1]
+
+query TT
+EXPLAIN FORMAT INDENT SELECT * FROM t WHERE x > NULL;
+----
+logical_plan EmptyRelation
+physical_plan EmptyExec
+
+query TT
+EXPLAIN FORMAT INDENT SELECT * FROM t WHERE x < 5 AND (10 * NULL < x);
+----
+logical_plan
+01)Filter: t.x < Int32(5) AND Boolean(NULL)
+02)--TableScan: t projection=[x]
+physical_plan
+01)CoalesceBatchesExec: target_batch_size=8192
+02)--FilterExec: x@0 < 5 AND NULL

Review Comment:
   Isn't `<expr> AND NULL`  always `NULL` too?  -- maybe a potential future 
optimization or I forget the boolean tristate logic rules in this case



##########
datafusion/sqllogictest/test_files/predicates.slt:
##########
@@ -777,6 +777,52 @@ physical_plan
 16)--------------------RepartitionExec: partitioning=RoundRobinBatch(4), 
input_partitions=1
 17)----------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/core/tests/tpch-csv/part.csv]]}, 
projection=[p_partkey, p_brand], file_type=csv, has_header=true
 
+# Simplification of a binary operator with a NULL value
+
+statement ok
+create table t(x int) as values (1), (2), (3);
+
+query TT
+EXPLAIN FORMAT INDENT SELECT x > NULL FROM t;
+----
+logical_plan
+01)Projection: Boolean(NULL) AS t.x > NULL
+02)--TableScan: t projection=[]
+physical_plan
+01)ProjectionExec: expr=[NULL as t.x > NULL]
+02)--DataSourceExec: partitions=1, partition_sizes=[1]
+
+query TT
+EXPLAIN FORMAT INDENT SELECT * FROM t WHERE x > NULL;
+----
+logical_plan EmptyRelation
+physical_plan EmptyExec
+
+query TT
+EXPLAIN FORMAT INDENT SELECT * FROM t WHERE x < 5 AND (10 * NULL < x);
+----
+logical_plan
+01)Filter: t.x < Int32(5) AND Boolean(NULL)
+02)--TableScan: t projection=[x]
+physical_plan
+01)CoalesceBatchesExec: target_batch_size=8192
+02)--FilterExec: x@0 < 5 AND NULL
+03)----DataSourceExec: partitions=1, partition_sizes=[1]
+
+query TT
+EXPLAIN FORMAT INDENT SELECT * FROM t WHERE x < 5 OR (10 * NULL < x);
+----
+logical_plan
+01)Filter: t.x < Int32(5) OR Boolean(NULL)
+02)--TableScan: t projection=[x]
+physical_plan
+01)CoalesceBatchesExec: target_batch_size=8192
+02)--FilterExec: x@0 < 5 OR NULL

Review Comment:
   Yes, the difference between Filters and Projections for null semantics has 
come up before. I think currently there is no way to differentiate
   
   There is a ticket that tracks this idea too:
   - https://github.com/apache/datafusion/issues/6179



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to