waynexia commented on code in PR #3861: URL: https://github.com/apache/arrow-datafusion/pull/3861#discussion_r1001359583
##########
benchmarks/expected-plans/q21.txt:
##########
@@ -7,15 +7,18 @@ Sort: numwait DESC NULLS FIRST, supplier.s_name ASC NULLS LAST
Inner Join: l1.l_orderkey = orders.o_orderkey
Inner Join: supplier.s_suppkey = l1.l_suppkey
TableScan: supplier projection=[s_suppkey, s_name, s_nationkey]
- Filter: l1.l_receiptdate > l1.l_commitdate
- SubqueryAlias: l1
- TableScan: lineitem projection=[l_orderkey, l_suppkey,
l_commitdate, l_receiptdate]
- Filter: orders.o_orderstatus = Utf8("F")
- TableScan: orders projection=[o_orderkey, o_orderstatus]
- Filter: nation.n_name = Utf8("SAUDI ARABIA")
- TableScan: nation projection=[n_nationkey, n_name]
+ Filter: l1.l_receiptdate >
l1.l_commitdatel1.l_commitdatel1.l_receiptdate
+ Projection: l1.l_receiptdate > l1.l_commitdate AS
l1.l_receiptdate > l1.l_commitdatel1.l_commitdatel1.l_receiptdate,
l1.l_orderkey, l1.l_suppkey
+ SubqueryAlias: l1
+ TableScan: lineitem projection=[l_orderkey, l_suppkey,
l_commitdate, l_receiptdate]
+ Filter: orders.o_orderstatus =
Utf8("F")Utf8("F")orders.o_orderstatus
+ Projection: orders.o_orderstatus = Utf8("F") AS
orders.o_orderstatus = Utf8("F")Utf8("F")orders.o_orderstatus, orders.o_orderkey
+ TableScan: orders projection=[o_orderkey, o_orderstatus]
+ Filter: nation.n_name = Utf8("SAUDI ARABIA")Utf8("SAUDI
ARABIA")nation.n_name
+ Projection: nation.n_name = Utf8("SAUDI ARABIA") AS
nation.n_name = Utf8("SAUDI ARABIA")Utf8("SAUDI ARABIA")nation.n_name,
nation.n_nationkey
Review Comment:
I try `EXPLAIN VERBOSE` and it looks like the `FilterPushdown` rule
generates a redundant expr. Here is the log:
```plaintext
| logical_plan after reduce_cross_join | SAME TEXT AS
ABOVE
|
| logical_plan after common_sub_expression_eliminate | SAME TEXT AS
ABOVE
|
| logical_plan after eliminate_limit | SAME TEXT AS
ABOVE
|
| logical_plan after projection_push_down | Sort: numwait
DESC NULLS FIRST, supplier.s_name ASC NULLS LAST
|
| | Projection:
supplier.s_name, COUNT(UInt8(1)) AS numwait
|
| |
Aggregate: groupBy=[[supplier.s_name]], aggr=[[COUNT(UInt8(1))]]
|
| | Filter:
orders.o_orderstatus = Utf8("F") AND l1.l_receiptdate > l1.l_commitdate AND
nation.n_name = Utf8("SAUDI ARABIA")
|
| | Anti
Join: l1.l_orderkey = l3.l_orderkey Filter: l3.l_suppkey != l1.l_suppkey
|
| |
Filter: orders.o_orderstatus = Utf8("F") AND l1.l_receiptdate > l1.l_commitdate
AND nation.n_name = Utf8("SAUDI ARABIA")
|
| |
Semi Join: l1.l_orderkey = l2.l_orderkey Filter: l2.l_suppkey != l1.l_suppkey
|
| |
Inner Join: supplier.s_nationkey = nation.n_nationkey
|
| |
Inner Join: l1.l_orderkey = orders.o_orderkey
|
| |
Inner Join: supplier.s_suppkey = l1.l_suppkey
|
| |
TableScan: supplier projection=[s_suppkey, s_name, s_nationkey]
|
| |
SubqueryAlias: l1
|
| |
TableScan: lineitem projection=[l_orderkey, l_suppkey, l_commitdate,
l_receiptdate]
|
| |
TableScan: orders projection=[o_orderkey, o_orderstatus]
|
| |
TableScan: nation projection=[n_nationkey, n_name]
|
| |
SubqueryAlias: l2
|
| |
TableScan: lineitem projection=[l_orderkey, l_suppkey]
|
| |
Filter: l3.l_receiptdate > l3.l_commitdate
|
| |
SubqueryAlias: l3
|
| |
TableScan: lineitem projection=[l_orderkey, l_suppkey, l_commitdate,
l_receiptdate]
|
| logical_plan after rewrite_disjunctive_predicate | SAME TEXT AS
ABOVE
|
| logical_plan after reduce_outer_join | SAME TEXT AS
ABOVE
|
| logical_plan after filter_push_down | Sort: numwait
DESC NULLS FIRST, supplier.s_name ASC NULLS LAST
|
| | Projection:
supplier.s_name, COUNT(UInt8(1)) AS numwait
|
| |
Aggregate: groupBy=[[supplier.s_name]], aggr=[[COUNT(UInt8(1))]]
|
| | Anti
Join: l1.l_orderkey = l3.l_orderkey Filter: l3.l_suppkey != l1.l_suppkey
|
| | Semi
Join: l1.l_orderkey = l2.l_orderkey Filter: l2.l_suppkey != l1.l_suppkey
|
| |
Inner Join: supplier.s_nationkey = nation.n_nationkey
|
| |
Inner Join: l1.l_orderkey = orders.o_orderkey
|
| |
Inner Join: supplier.s_suppkey = l1.l_suppkey
|
| |
TableScan: supplier projection=[s_suppkey, s_name, s_nationkey]
|
| |
Filter: l1.l_receiptdate > l1.l_commitdate AND l1.l_receiptdate >
l1.l_commitdate
|
| |
SubqueryAlias: l1
|
| |
TableScan: lineitem projection=[l_orderkey, l_suppkey, l_commitdate,
l_receiptdate]
|
| |
Filter: orders.o_orderstatus = Utf8("F") AND orders.o_orderstatus = Utf8("F")
|
| |
TableScan: orders projection=[o_orderkey, o_orderstatus]
|
| |
Filter: nation.n_name = Utf8("SAUDI ARABIA") AND nation.n_name = Utf8("SAUDI
ARABIA")
|
| |
TableScan: nation projection=[n_nationkey, n_name]
|
| |
SubqueryAlias: l2
|
| |
TableScan: lineitem projection=[l_orderkey, l_suppkey]
|
| |
Filter: l3.l_receiptdate > l3.l_commitdate
|
| |
SubqueryAlias: l3
|
| |
TableScan: lineitem projection=[l_orderkey, l_suppkey, l_commitdate,
l_receiptdate]
|
| logical_plan after limit_push_down | SAME TEXT AS
ABOVE
|
| logical_plan after single_distinct_aggregation_to_group_by | SAME TEXT AS
ABOVE
|
| logical_plan after type_coercion | SAME TEXT AS
ABOVE
```
And the redundant `AND` in filter is eliminated in the following
`simplify_expressions` rule, but the projection introduced by
`CommonSubexpressionEliminate` left.
I haven't dug into it. But this is not in the scope of this PR I suppose.
I'll open an issue to track it. (the causality among rules are really
interesting😋)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
