alamb commented on code in PR #17088:
URL: https://github.com/apache/datafusion/pull/17088#discussion_r2265213324
##########
datafusion/sqllogictest/test_files/errors.slt:
##########
@@ -168,8 +168,9 @@ CREATE TABLE tab0(col0 INTEGER, col1 INTEGER, col2 INTEGER);
statement ok
INSERT INTO tab0 VALUES(83,0,38);
-query error DataFusion error: Arrow error: Divide by zero error
+query I
Review Comment:
This is an interesting change -- we short circuited the evaluation and now
the error doesn't happen
THis happens in other areas and so I think this change is consistent with
other parts of DataFusion as well
##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -1947,6 +1858,53 @@ fn has_common_conjunction(lhs: &Expr, rhs: &Expr) ->
bool {
iter_conjunction(rhs).any(|e| lhs_set.contains(&e) && !e.is_volatile())
}
+fn binary_op_null_on_null(op: Operator) -> bool {
Review Comment:
I recommend making this a function on `Operator` so it is more discoverable
-- similar to
https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Operator.html#method.is_numerical_operators
##########
datafusion/sqllogictest/test_files/predicates.slt:
##########
@@ -777,6 +777,52 @@ physical_plan
16)--------------------RepartitionExec: partitioning=RoundRobinBatch(4),
input_partitions=1
17)----------------------DataSourceExec: file_groups={1 group:
[[WORKSPACE_ROOT/datafusion/core/tests/tpch-csv/part.csv]]},
projection=[p_partkey, p_brand], file_type=csv, has_header=true
+# Simplification of a binary operator with a NULL value
+
+statement ok
+create table t(x int) as values (1), (2), (3);
+
+query TT
+EXPLAIN FORMAT INDENT SELECT x > NULL FROM t;
+----
+logical_plan
+01)Projection: Boolean(NULL) AS t.x > NULL
+02)--TableScan: t projection=[]
+physical_plan
+01)ProjectionExec: expr=[NULL as t.x > NULL]
+02)--DataSourceExec: partitions=1, partition_sizes=[1]
+
+query TT
+EXPLAIN FORMAT INDENT SELECT * FROM t WHERE x > NULL;
+----
+logical_plan EmptyRelation
+physical_plan EmptyExec
+
+query TT
+EXPLAIN FORMAT INDENT SELECT * FROM t WHERE x < 5 AND (10 * NULL < x);
+----
+logical_plan
+01)Filter: t.x < Int32(5) AND Boolean(NULL)
+02)--TableScan: t projection=[x]
+physical_plan
+01)CoalesceBatchesExec: target_batch_size=8192
+02)--FilterExec: x@0 < 5 AND NULL
Review Comment:
Isn't `<expr> AND NULL` always `NULL` too? -- maybe a potential future
optimization or I forget the boolean tristate logic rules in this case
##########
datafusion/sqllogictest/test_files/predicates.slt:
##########
@@ -777,6 +777,52 @@ physical_plan
16)--------------------RepartitionExec: partitioning=RoundRobinBatch(4),
input_partitions=1
17)----------------------DataSourceExec: file_groups={1 group:
[[WORKSPACE_ROOT/datafusion/core/tests/tpch-csv/part.csv]]},
projection=[p_partkey, p_brand], file_type=csv, has_header=true
+# Simplification of a binary operator with a NULL value
+
+statement ok
+create table t(x int) as values (1), (2), (3);
+
+query TT
+EXPLAIN FORMAT INDENT SELECT x > NULL FROM t;
+----
+logical_plan
+01)Projection: Boolean(NULL) AS t.x > NULL
+02)--TableScan: t projection=[]
+physical_plan
+01)ProjectionExec: expr=[NULL as t.x > NULL]
+02)--DataSourceExec: partitions=1, partition_sizes=[1]
+
+query TT
+EXPLAIN FORMAT INDENT SELECT * FROM t WHERE x > NULL;
+----
+logical_plan EmptyRelation
+physical_plan EmptyExec
+
+query TT
+EXPLAIN FORMAT INDENT SELECT * FROM t WHERE x < 5 AND (10 * NULL < x);
+----
+logical_plan
+01)Filter: t.x < Int32(5) AND Boolean(NULL)
+02)--TableScan: t projection=[x]
+physical_plan
+01)CoalesceBatchesExec: target_batch_size=8192
+02)--FilterExec: x@0 < 5 AND NULL
+03)----DataSourceExec: partitions=1, partition_sizes=[1]
+
+query TT
+EXPLAIN FORMAT INDENT SELECT * FROM t WHERE x < 5 OR (10 * NULL < x);
+----
+logical_plan
+01)Filter: t.x < Int32(5) OR Boolean(NULL)
+02)--TableScan: t projection=[x]
+physical_plan
+01)CoalesceBatchesExec: target_batch_size=8192
+02)--FilterExec: x@0 < 5 OR NULL
Review Comment:
Yes, the difference between Filters and Projections for null semantics has
come up before. I think currently there is no way to differentiate
There is a ticket that tracks this idea too:
- https://github.com/apache/datafusion/issues/6179
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]