alamb commented on code in PR #17088: URL: https://github.com/apache/datafusion/pull/17088#discussion_r2265213324
########## datafusion/sqllogictest/test_files/errors.slt: ########## @@ -168,8 +168,9 @@ CREATE TABLE tab0(col0 INTEGER, col1 INTEGER, col2 INTEGER); statement ok INSERT INTO tab0 VALUES(83,0,38); -query error DataFusion error: Arrow error: Divide by zero error +query I Review Comment: This is an interesting change -- we short circuited the evaluation and now the error doesn't happen THis happens in other areas and so I think this change is consistent with other parts of DataFusion as well ########## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ########## @@ -1947,6 +1858,53 @@ fn has_common_conjunction(lhs: &Expr, rhs: &Expr) -> bool { iter_conjunction(rhs).any(|e| lhs_set.contains(&e) && !e.is_volatile()) } +fn binary_op_null_on_null(op: Operator) -> bool { Review Comment: I recommend making this a function on `Operator` so it is more discoverable -- similar to https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Operator.html#method.is_numerical_operators ########## datafusion/sqllogictest/test_files/predicates.slt: ########## @@ -777,6 +777,52 @@ physical_plan 16)--------------------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 17)----------------------DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/tpch-csv/part.csv]]}, projection=[p_partkey, p_brand], file_type=csv, has_header=true +# Simplification of a binary operator with a NULL value + +statement ok +create table t(x int) as values (1), (2), (3); + +query TT +EXPLAIN FORMAT INDENT SELECT x > NULL FROM t; +---- +logical_plan +01)Projection: Boolean(NULL) AS t.x > NULL +02)--TableScan: t projection=[] +physical_plan +01)ProjectionExec: expr=[NULL as t.x > NULL] +02)--DataSourceExec: partitions=1, partition_sizes=[1] + +query TT +EXPLAIN FORMAT INDENT SELECT * FROM t WHERE x > NULL; +---- +logical_plan EmptyRelation +physical_plan EmptyExec + +query TT +EXPLAIN FORMAT INDENT SELECT * FROM t WHERE x < 5 AND (10 * NULL < x); +---- +logical_plan +01)Filter: t.x < Int32(5) AND Boolean(NULL) +02)--TableScan: t projection=[x] +physical_plan +01)CoalesceBatchesExec: target_batch_size=8192 +02)--FilterExec: x@0 < 5 AND NULL Review Comment: Isn't `<expr> AND NULL` always `NULL` too? -- maybe a potential future optimization or I forget the boolean tristate logic rules in this case ########## datafusion/sqllogictest/test_files/predicates.slt: ########## @@ -777,6 +777,52 @@ physical_plan 16)--------------------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 17)----------------------DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/tpch-csv/part.csv]]}, projection=[p_partkey, p_brand], file_type=csv, has_header=true +# Simplification of a binary operator with a NULL value + +statement ok +create table t(x int) as values (1), (2), (3); + +query TT +EXPLAIN FORMAT INDENT SELECT x > NULL FROM t; +---- +logical_plan +01)Projection: Boolean(NULL) AS t.x > NULL +02)--TableScan: t projection=[] +physical_plan +01)ProjectionExec: expr=[NULL as t.x > NULL] +02)--DataSourceExec: partitions=1, partition_sizes=[1] + +query TT +EXPLAIN FORMAT INDENT SELECT * FROM t WHERE x > NULL; +---- +logical_plan EmptyRelation +physical_plan EmptyExec + +query TT +EXPLAIN FORMAT INDENT SELECT * FROM t WHERE x < 5 AND (10 * NULL < x); +---- +logical_plan +01)Filter: t.x < Int32(5) AND Boolean(NULL) +02)--TableScan: t projection=[x] +physical_plan +01)CoalesceBatchesExec: target_batch_size=8192 +02)--FilterExec: x@0 < 5 AND NULL +03)----DataSourceExec: partitions=1, partition_sizes=[1] + +query TT +EXPLAIN FORMAT INDENT SELECT * FROM t WHERE x < 5 OR (10 * NULL < x); +---- +logical_plan +01)Filter: t.x < Int32(5) OR Boolean(NULL) +02)--TableScan: t projection=[x] +physical_plan +01)CoalesceBatchesExec: target_batch_size=8192 +02)--FilterExec: x@0 < 5 OR NULL Review Comment: Yes, the difference between Filters and Projections for null semantics has come up before. I think currently there is no way to differentiate There is a ticket that tracks this idea too: - https://github.com/apache/datafusion/issues/6179 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org