gianm opened a new pull request #11068:
URL: https://github.com/apache/druid/pull/11068
The main logic for rewriting join clauses as filters is in
**JoinableFactoryWrapper**'s segmentMapFn method. A clause is rewritten
only when all of the following hold:
- It must be an inner equi-join.
- The right-hand columns referenced by the condition must not contain any
duplicate values. (If they did, the inner join would not be guaranteed
to return at most one row for each left-hand-side row.)
- No columns from the right-hand side can be used by anything other than
the join condition itself.
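
The three requirements above can be sketched as a single eligibility check. This is only an illustration under simplifying assumptions, not the code in JoinableFactoryWrapper: the class `JoinToFilterSketch`, the method `inFilterValues`, and all of its parameters are hypothetical, and the right-hand key column is modeled as a plain list of strings.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Illustrative only: a simplified stand-in, not Druid's JoinableFactoryWrapper.
final class JoinToFilterSketch {
    /**
     * Returns the values for an "in" filter on the left-hand key column,
     * or Optional.empty() if the clause has to stay a join.
     */
    static Optional<Set<String>> inFilterValues(
            boolean innerEquiJoin,
            List<String> rightKeyValues,
            Set<String> rightColumns,
            Set<String> columnsUsedOutsideCondition) {
        // Requirement 1: only inner equi-joins qualify.
        if (!innerEquiJoin) {
            return Optional.empty();
        }
        // Requirement 3: nothing outside the condition may read right-hand columns.
        if (!Collections.disjoint(rightColumns, columnsUsedOutsideCondition)) {
            return Optional.empty();
        }
        // Requirement 2: the right-hand key values must be unique.
        // Nulls are skipped on the assumption that a null key never
        // satisfies an equality condition.
        Set<String> values = new LinkedHashSet<>();
        for (String value : rightKeyValues) {
            if (value != null && !values.add(value)) {
                // Duplicate key: a left-hand row could match more than once.
                return Optional.empty();
            }
        }
        return Optional.of(values);
    }
}
```

With right-hand keys `US` and `CA` and no other use of right-hand columns, the sketch returns the set of both keys, which would back the "in" filter. With keys `US` and `US` it returns empty, so the clause stays a join.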
**HashJoinSegmentStorageAdapter** is also modified to pass through to
the base adapter (even allowing vectorization!) in the case where 100%
of join clauses could be rewritten as filters.
In support of this goal:
- Add **Query getRequiredColumns()** method to help us figure out whether
the right-hand side of a join datasource is being used or not.
- Add **JoinConditionAnalysis getRequiredColumns()** method to help us
figure out if the right-hand side of a join is being used by later
join clauses acting on the same base.
- Add **Joinable getNonNullColumnValuesIfAllUnique** method to enable
retrieving the set of values that will form the "in" filter.
- Add **LookupExtractor canGetKeySet()** and **keySet()** methods to support
LookupJoinable in its efforts to implement the new Joinable method.
- Add **enableRewriteJoinToFilter** feature flag to
JoinFilterRewriteConfig. The default is disabled.
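
To make the relationship between the key-set methods and the new Joinable method concrete, here is a minimal sketch of a map-backed lookup. It is an assumption-laden stand-in rather than Druid's LookupExtractor or LookupJoinable: the class `MapLookupSketch`, the single-argument signature, and the key column name `k` are all hypothetical, and any column other than the key column is conservatively reported as unusable.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for a map-backed lookup. Method names mirror the
// ones listed in this PR, but the signatures are assumed for illustration.
final class MapLookupSketch {
    // Assumed name of the lookup's key column.
    static final String KEY_COLUMN = "k";

    private final Map<String, String> map;

    MapLookupSketch(Map<String, String> map) {
        this.map = map;
    }

    // A map-backed lookup can always enumerate its keys.
    boolean canGetKeySet() {
        return true;
    }

    Set<String> keySet() {
        return Collections.unmodifiableSet(map.keySet());
    }

    // Map keys are unique by construction, so the key column always
    // qualifies. Other columns return empty in this sketch.
    Optional<Set<String>> getNonNullColumnValuesIfAllUnique(String column) {
        if (!KEY_COLUMN.equals(column) || !canGetKeySet()) {
            return Optional.empty();
        }
        Set<String> nonNullKeys = new HashSet<>(keySet());
        nonNullKeys.remove(null);
        return Optional.of(nonNullKeys);
    }
}
```

The design point this illustrates is that a lookup which cannot enumerate its keys would answer `false` from `canGetKeySet()`, and the caller would then fall back to an ordinary join.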
Testing strategy:
- Add join-to-filter conversion tests to JoinableFactoryWrapperTest.
- Add getRequiredColumns tests to individual query engines.
- Add getNonNullColumnValuesIfAllUnique tests to LookupJoinable and
IndexedTableJoinable.
- Extend BaseCalciteQueryTest's QueryContextForJoinProvider to also
provide query contexts that enable this rewrite.
- Add some new tests to CalciteQueryTest that are designed to exercise
this rewrite. (Some existing tests already exercise it as well.)
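
As a rough sketch of how a test provider might supply query contexts with the rewrite both off and on, consider the helper below. It is not the code in QueryContextForJoinProvider: the class `JoinContextSketch` and the method `variants` are hypothetical, and the context key is assumed to be the same string as the feature flag name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Druid's test code.
final class JoinContextSketch {
    // Assumed context key, taken from the flag name in this PR.
    static final String REWRITE_KEY = "enableRewriteJoinToFilter";

    // Returns two copies of the base context: rewrite disabled, then enabled.
    static List<Map<String, Object>> variants(Map<String, Object> base) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (boolean enabled : new boolean[] {false, true}) {
            Map<String, Object> context = new HashMap<>(base);
            context.put(REWRITE_KEY, enabled);
            out.add(context);
        }
        return out;
    }
}
```

Running every join test against both variants checks that results do not change when the rewrite is applied.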