zyuiop opened a new pull request, #23047:
URL: https://github.com/apache/datafusion/pull/23047
This enables aggregate queries that SELECT over two tables, only one of
which is in the WHERE, but for which the join condition guarantees a functional
dependency.
## Rationale for this change
Currently, queries such as
```sql
SELECT
Paper.paperId, PaperConflict.conflictType,
group_concat(PaperReview.rflags, ' ', PaperReview.reviewNeedsSubmit, ' ',
PaperReview.reviewRound)
FROM Paper
JOIN PaperReview ON (PaperReview.paperId=Paper.paperId
and ((PaperReview.contactId=1 and (PaperReview.rflags&7936)!=0)))
LEFT JOIN PaperConflict ON (PaperConflict.paperId=Paper.paperId
and (PaperConflict.contactId=1))
GROUP BY Paper.paperId
```
don't work, because datafusion does not detect that
`PaperConflict.conflictType` uniquely depends on `Paper.paperId`, and therefore
can be included in the SELECT clause.
This PR attempts to address this problem, although the solution is a bit
clunky and I am not certain it works in all cases.
## What changes are included in this PR?
Extend `FunctionalDependencies::join` so that it takes as parameter a list
of columns which appear in equality comparisons in the ON clause of the join.
This makes it possible to extend functional dependencies across the JOIN if
the source columns of a dependency appear in the list.
**Alternative approach**: this feature may alternatively be realized in an
analyser step, but the problem is that the aggregate/project validity check is
done in SqlToRel, that is _before_ the analyser runs.
## Are these changes tested?
Yes
## Are there any user-facing changes?
<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->
<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]