zyuiop opened a new pull request, #23047:
URL: https://github.com/apache/datafusion/pull/23047

   This enables aggregate queries that SELECT over two tables, only one of 
which is in the WHERE, but for which the join condition guarantees a functional 
dependency.
   
   ## Rationale for this change
   
   Currently, queries such as 
   
   ```sql
   
   SELECT 
       Paper.paperId, PaperConflict.conflictType, 
group_concat(PaperReview.rflags, ' ', PaperReview.reviewNeedsSubmit, ' ', 
PaperReview.reviewRound)
   FROM Paper
       JOIN PaperReview ON (PaperReview.paperId=Paper.paperId
           and ((PaperReview.contactId=1 and (PaperReview.rflags&7936)!=0)))
       LEFT JOIN PaperConflict ON (PaperConflict.paperId=Paper.paperId
           and (PaperConflict.contactId=1))
       GROUP BY Paper.paperId
   ```
   
   don't work, because datafusion does not detect that 
`PaperConflict.conflictType` uniquely depends on `Paper.paperId`, and therefore 
can be included in the SELECT clause.
   
   This PR attempts to address this problem, although the solution is a bit 
clunky and I am not certain it works in all cases.
   
   ## What changes are included in this PR?
   
   Extend `FunctionalDependencies::join` so that it takes as parameter a list 
of columns which appear in equality comparisons in the ON clause of the join.
   This makes it possible to extend functional dependencies across the JOIN if 
the source columns of a dependency appear in the list.
   
   **Alternative approach**: this feature may alternatively be realized in an 
analyser step, but the problem is that the aggregate/project validity check is 
done in SqlToRel, that is _before_ the analyser runs.
   
   ## Are these changes tested?
   
   Yes
   
   
   ## Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be 
updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api 
change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to