mustafasrepo commented on code in PR #7040:
URL: https://github.com/apache/arrow-datafusion/pull/7040#discussion_r1276223297


##########
datafusion/sql/src/select.rs:
##########
@@ -431,6 +432,37 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
         group_by_exprs: Vec<Expr>,
         aggr_exprs: Vec<Expr>,
     ) -> Result<(LogicalPlan, Vec<Expr>, Option<Expr>)> {
+        let schema = input.schema();

Review Comment:
   > Yes I agree that sounds weird. I still don't understand why the functional 
dependency can't be computed from the plan itself 
   
   Currently, if the query is
   ```sql
   SELECT sn, amount
               FROM sales_global
               GROUP BY sn
   ```
   and we know that `sn` is primary key.  We rewrite query above as
   ```sql
   SELECT sn, amount
               FROM sales_global
               GROUP BY sn, amount
   ```
   so that select either contains group by expressions or aggregate 
expressions. This update is done here.
   However, if the query were
   ```sql
   SELECT sn
               FROM sales_global
               GROUP BY sn
   ```
   we wouldn't rewrite query as 
   ```sql
   SELECT sn
               FROM sales_global
               GROUP BY sn, amount
   ```
   hence, amount is not used and. We do not need to update group by here. 
   
   For this reason, we need the information of `select exprs` to decide how to 
extend group by expressions to support a query.
   However, if we were to extend group by expressions with their functional 
dependencies in all cases. We won't need the `select exprs` for this decision. 
Such as, we would treat 
   ```sql
   SELECT sn
               FROM sales_global
               GROUP BY sn
   ```
   as
   ```sql
   SELECT sn
               FROM sales_global
               GROUP BY sn, amount
   ```
   under the hood.
   
   What do you think about extending group by expression in all cases, without 
checking for `select exprs`. I think this is a bit confusing.
   
   In short, functional dependency can be calculated without external 
information. However, query rewrite cannot be done without resorting to the 
information in the `select exprs` in current design.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to