alamb commented on code in PR #3578:
URL: https://github.com/apache/arrow-datafusion/pull/3578#discussion_r996293541


##########
benchmarks/expected-plans/q7.txt:
##########
@@ -14,7 +14,9 @@ Sort: shipping.supp_nation ASC NULLS LAST, 
shipping.cust_nation ASC NULLS LAST,
                         TableScan: lineitem projection=[l_orderkey, l_suppkey, 
l_extendedprice, l_discount, l_shipdate]
                     TableScan: orders projection=[o_orderkey, o_custkey]
                   TableScan: customer projection=[c_custkey, c_nationkey]
-                SubqueryAlias: n1
+                Filter: n1.n_name = Utf8("FRANCE") OR n1.n_name = 
Utf8("GERMANY")

Review Comment:
   👍  these should be fine, though I expect the performance to be negligible as 
the number of rows in `nation` are very small



##########
benchmarks/expected-plans/q19.txt:
##########
@@ -3,7 +3,7 @@ Projection: SUM(lineitem.l_extendedprice * Int64(1) - 
lineitem.l_discount) AS re
     Projection: lineitem.l_shipinstruct = Utf8("DELIVER IN PERSON") AS 
lineitem.l_shipinstruct = Utf8("DELIVER IN PERSON")Utf8("DELIVER IN 
PERSON")lineitem.l_shipinstruct, lineitem.l_shipmode IN ([Utf8("AIR"), 
Utf8("AIR REG")]) AS lineitem.l_shipmode IN ([Utf8("AIR"), Utf8("AIR 
REG")])Utf8("AIR REG")Utf8("AIR")lineitem.l_shipmode, part.p_size >= Int32(1) 
AS part.p_size >= Int32(1)Int32(1)part.p_size, lineitem.l_quantity, 
lineitem.l_extendedprice, lineitem.l_discount, part.p_brand, part.p_size, 
part.p_container
       Filter: part.p_brand = Utf8("Brand#12") AND part.p_container IN 
([Utf8("SM CASE"), Utf8("SM BOX"), Utf8("SM PACK"), Utf8("SM PKG")]) AND 
lineitem.l_quantity >= Decimal128(Some(100),15,2) AND lineitem.l_quantity <= 
Decimal128(Some(1100),15,2) AND part.p_size <= Int32(5) OR part.p_brand = 
Utf8("Brand#23") AND part.p_container IN ([Utf8("MED BAG"), Utf8("MED BOX"), 
Utf8("MED PKG"), Utf8("MED PACK")]) AND lineitem.l_quantity >= 
Decimal128(Some(1000),15,2) AND lineitem.l_quantity <= 
Decimal128(Some(2000),15,2) AND part.p_size <= Int32(10) OR part.p_brand = 
Utf8("Brand#34") AND part.p_container IN ([Utf8("LG CASE"), Utf8("LG BOX"), 
Utf8("LG PACK"), Utf8("LG PKG")]) AND lineitem.l_quantity >= 
Decimal128(Some(2000),15,2) AND lineitem.l_quantity <= 
Decimal128(Some(3000),15,2) AND part.p_size <= Int32(15)
         Inner Join: lineitem.l_partkey = part.p_partkey
-          Filter: lineitem.l_shipmode IN ([Utf8("AIR"), Utf8("AIR REG")]) AND 
lineitem.l_shipinstruct = Utf8("DELIVER IN PERSON")
+          Filter: lineitem.l_shipmode IN ([Utf8("AIR"), Utf8("AIR REG")]) AND 
lineitem.l_shipinstruct = Utf8("DELIVER IN PERSON") AND lineitem.l_quantity >= 
Decimal128(Some(100),15,2) AND lineitem.l_quantity <= 
Decimal128(Some(1100),15,2) OR lineitem.l_quantity >= 
Decimal128(Some(1000),15,2) AND lineitem.l_quantity <= 
Decimal128(Some(2000),15,2) OR lineitem.l_quantity >= 
Decimal128(Some(2000),15,2) AND lineitem.l_quantity <= 
Decimal128(Some(3000),15,2)

Review Comment:
   These are probably good filters to have added as long as they are selective, 
though they may be expensive to evaluate because they will be evaluated against 
*all* rows in lineitem rather than only those rows which were not filtered out 
by the join with part.
   
   To make this super fast, maybe could use "sideways information passing" to 
push down the `part` filtering into the lineitem scan as well



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to