alamb commented on code in PR #3578:
URL: https://github.com/apache/arrow-datafusion/pull/3578#discussion_r996293541
##########
benchmarks/expected-plans/q7.txt:
##########
@@ -14,7 +14,9 @@ Sort: shipping.supp_nation ASC NULLS LAST,
shipping.cust_nation ASC NULLS LAST,
TableScan: lineitem projection=[l_orderkey, l_suppkey,
l_extendedprice, l_discount, l_shipdate]
TableScan: orders projection=[o_orderkey, o_custkey]
TableScan: customer projection=[c_custkey, c_nationkey]
- SubqueryAlias: n1
+ Filter: n1.n_name = Utf8("FRANCE") OR n1.n_name =
Utf8("GERMANY")
Review Comment:
👍 these should be fine, though I expect the performance to be negligible as
the number of rows in `nation` are very small
##########
benchmarks/expected-plans/q19.txt:
##########
@@ -3,7 +3,7 @@ Projection: SUM(lineitem.l_extendedprice * Int64(1) -
lineitem.l_discount) AS re
Projection: lineitem.l_shipinstruct = Utf8("DELIVER IN PERSON") AS
lineitem.l_shipinstruct = Utf8("DELIVER IN PERSON")Utf8("DELIVER IN
PERSON")lineitem.l_shipinstruct, lineitem.l_shipmode IN ([Utf8("AIR"),
Utf8("AIR REG")]) AS lineitem.l_shipmode IN ([Utf8("AIR"), Utf8("AIR
REG")])Utf8("AIR REG")Utf8("AIR")lineitem.l_shipmode, part.p_size >= Int32(1)
AS part.p_size >= Int32(1)Int32(1)part.p_size, lineitem.l_quantity,
lineitem.l_extendedprice, lineitem.l_discount, part.p_brand, part.p_size,
part.p_container
Filter: part.p_brand = Utf8("Brand#12") AND part.p_container IN
([Utf8("SM CASE"), Utf8("SM BOX"), Utf8("SM PACK"), Utf8("SM PKG")]) AND
lineitem.l_quantity >= Decimal128(Some(100),15,2) AND lineitem.l_quantity <=
Decimal128(Some(1100),15,2) AND part.p_size <= Int32(5) OR part.p_brand =
Utf8("Brand#23") AND part.p_container IN ([Utf8("MED BAG"), Utf8("MED BOX"),
Utf8("MED PKG"), Utf8("MED PACK")]) AND lineitem.l_quantity >=
Decimal128(Some(1000),15,2) AND lineitem.l_quantity <=
Decimal128(Some(2000),15,2) AND part.p_size <= Int32(10) OR part.p_brand =
Utf8("Brand#34") AND part.p_container IN ([Utf8("LG CASE"), Utf8("LG BOX"),
Utf8("LG PACK"), Utf8("LG PKG")]) AND lineitem.l_quantity >=
Decimal128(Some(2000),15,2) AND lineitem.l_quantity <=
Decimal128(Some(3000),15,2) AND part.p_size <= Int32(15)
Inner Join: lineitem.l_partkey = part.p_partkey
- Filter: lineitem.l_shipmode IN ([Utf8("AIR"), Utf8("AIR REG")]) AND
lineitem.l_shipinstruct = Utf8("DELIVER IN PERSON")
+ Filter: lineitem.l_shipmode IN ([Utf8("AIR"), Utf8("AIR REG")]) AND
lineitem.l_shipinstruct = Utf8("DELIVER IN PERSON") AND lineitem.l_quantity >=
Decimal128(Some(100),15,2) AND lineitem.l_quantity <=
Decimal128(Some(1100),15,2) OR lineitem.l_quantity >=
Decimal128(Some(1000),15,2) AND lineitem.l_quantity <=
Decimal128(Some(2000),15,2) OR lineitem.l_quantity >=
Decimal128(Some(2000),15,2) AND lineitem.l_quantity <=
Decimal128(Some(3000),15,2)
Review Comment:
These are probably good filters to have added as long as they are selective,
though they may be expensive to evaluate because they will be evaluated against
*all* rows in lineitem rather than only those rows which were not filtered out
by the join with part.
To make this super fast, maybe could use "sideways information passing" to
push down the `part` filtering into the lineitem scan as well
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]