avantgardnerio opened a new pull request, #2813:
URL: https://github.com/apache/arrow-datafusion/pull/2813
# Which issue does this PR close?
Closes #160.
# Rationale for this change
In order to evaluate DataFusion as a candidate query engine, users need to
be able to run industry standard benchmarks like TPC-H. Query 4 is a good
initial candidate, because it is blocked only by the lack of a relatively
simple optimization rule that turns `exists` subqueries into `join`s.
This PR includes the minimum necessary changes to get Query 4 passing, but I
believe this is a generalizable approach that will work for the remaining
queries in the TPC-H suite being blocked by subquery-related issues.
I wanted to open this PR early to start the conversation, but I intend to either
submit subsequent PRs generalizing this approach, or extend this PR until we
have all the TPC-H subquery cases covered.
# What changes are included in this PR?
An optimization rule for decorrelating a narrowly defined set of queries.
Those not explicitly covered will remain unaltered.
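
As a rough illustration of the kind of rewrite involved, here is a minimal sketch that checks, at the SQL level, that a correlated `exists` subquery and a join against the de-duplicated subquery return the same result. It does not use DataFusion or the rule from this PR: it uses Python's built-in `sqlite3`, a trimmed-down version of the TPC-H Query 4 shape, and made-up sample rows. The `DISTINCT` subquery (aliased `late`, a name chosen only for this example) is one way to keep duplicate matches from multiplying rows; the plan the rule actually produces may look different.

```python
import sqlite3

# In-memory database with a tiny, made-up subset of the TPC-H schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (o_orderkey INTEGER, o_orderpriority TEXT);
    CREATE TABLE lineitem (
        l_orderkey INTEGER, l_commitdate TEXT, l_receiptdate TEXT
    );
    INSERT INTO orders VALUES (1, '1-URGENT'), (2, '2-HIGH'), (3, '1-URGENT');
    INSERT INTO lineitem VALUES
        (1, '1993-07-10', '1993-07-15'),  -- late
        (1, '1993-07-11', '1993-07-20'),  -- late (second match for order 1)
        (2, '1993-07-10', '1993-07-05'),  -- on time
        (3, '1993-07-10', '1993-07-12');  -- late
""")

# Original form: correlated `exists` subquery, correlated on one column.
exists_sql = """
    SELECT o_orderpriority, COUNT(*) AS order_count
    FROM orders
    WHERE EXISTS (
        SELECT 1 FROM lineitem
        WHERE l_orderkey = o_orderkey AND l_commitdate < l_receiptdate
    )
    GROUP BY o_orderpriority
    ORDER BY o_orderpriority
"""

# Decorrelated form: join against the distinct keys produced by the subquery,
# so an order with several matching line items is still counted once.
join_sql = """
    SELECT o_orderpriority, COUNT(*) AS order_count
    FROM orders
    JOIN (
        SELECT DISTINCT l_orderkey FROM lineitem
        WHERE l_commitdate < l_receiptdate
    ) AS late ON late.l_orderkey = o_orderkey
    GROUP BY o_orderpriority
    ORDER BY o_orderpriority
"""

exists_rows = conn.execute(exists_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(exists_rows == join_rows)
```

Order 1 has two matching line items, which is why the join side needs the `DISTINCT` (or a semi join): a plain inner join would count that order twice.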
# Are there any user-facing changes?
Correlated `where exists` subqueries that join on a single column should now work.