ygf11 commented on code in PR #4465:
URL: https://github.com/apache/arrow-datafusion/pull/4465#discussion_r1111194134


##########
benchmarks/expected-plans/q10.txt:
##########
@@ -1,12 +1,17 @@
 Sort: revenue DESC NULLS FIRST
   Projection: customer.c_custkey, customer.c_name, SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount) AS revenue, customer.c_acctbal, nation.n_name, customer.c_address, customer.c_phone, customer.c_comment
     Aggregate: groupBy=[[customer.c_custkey, customer.c_name, customer.c_acctbal, customer.c_phone, nation.n_name, customer.c_address, customer.c_comment]], aggr=[[SUM(CAST(lineitem.l_extendedprice AS Decimal128(38, 4)) * CAST(Decimal128(Some(100),23,2) - CAST(lineitem.l_discount AS Decimal128(23, 2)) AS Decimal128(38, 4))) AS SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount)]]
-      Inner Join: customer.c_nationkey = nation.n_nationkey
-        Inner Join: orders.o_orderkey = lineitem.l_orderkey
-          Inner Join: customer.c_custkey = orders.o_custkey
-            TableScan: customer projection=[c_custkey, c_name, c_address, c_nationkey, c_phone, c_acctbal, c_comment]
-            Filter: orders.o_orderdate >= Date32("8674") AND orders.o_orderdate < Date32("8766")
-              TableScan: orders projection=[o_orderkey, o_custkey, o_orderdate]
-          Filter: lineitem.l_returnflag = Utf8("R")
-            TableScan: lineitem projection=[l_orderkey, l_extendedprice, l_discount, l_returnflag]
-        TableScan: nation projection=[n_nationkey, n_name]
\ No newline at end of file
+      Projection: customer.c_custkey, customer.c_name, customer.c_address, customer.c_phone, customer.c_acctbal, customer.c_comment, lineitem.l_extendedprice, lineitem.l_discount, nation.n_name

Review Comment:
   This may be related to [`build_join_schema`](https://github.com/apache/arrow-datafusion/blob/main/datafusion/expr/src/logical_plan/builder.rs#L935), which some optimizer rules call.
   I think calling it before projection pushdown is fine, but I suspect its result is not correct after projection pushdown.
     
   For the query:
   ```sql
   select a.id from a join b on a.id = b.id
   ``` 
   if we call it after projection pushdown, the input schemas are:
   1. `schema(a)`: a.id
   2. `schema(b)`: b.id
   
   `build_join_schema` merges the left and right schemas, so the result is `a.id` + `b.id`, but the expected result is only `a.id`.
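
The mismatch described above can be sketched with a toy model. This is only an illustration, not DataFusion code: a schema is modelled as a plain list of qualified column names, and `build_join_schema_sketch` and `required_output` are hypothetical helpers that assume the real function simply concatenates the left and right fields.

```python
# Toy model (assumption): a schema is an ordered list of qualified column names.

def build_join_schema_sketch(left, right):
    """Hypothetical stand-in: concatenate left and right fields."""
    return list(left) + list(right)


def required_output(join_schema, referenced):
    """Keep only the columns that the parent plan actually references."""
    return [col for col in join_schema if col in referenced]


# Input schemas after projection pushdown for:
#   select a.id from a join b on a.id = b.id
schema_a = ["a.id"]
schema_b = ["b.id"]  # b.id is still needed to evaluate the join condition

merged = build_join_schema_sketch(schema_a, schema_b)
needed = required_output(merged, {"a.id"})

print(merged)  # what a plain merge produces
print(needed)  # what the query's output actually requires
```

In this model the merged schema still carries `b.id`, while the query only needs `a.id`, which seems to be why an extra projection is needed on top of the join.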
   
   Maybe we can fix that first, and then we will no longer need the projection.
   
   


