jon-wei commented on a change in pull request #9516: More efficient join filter
rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393376245
##########
File path:
processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
##########
@@ -50,73 +51,126 @@
/**
* When there is a filter in a join query, we can sometimes improve
performance by applying parts of the filter
* when we first read from the base table instead of after the join.
- *
- * This class provides a {@link #splitFilter(HashJoinSegmentStorageAdapter,
Set, Filter, boolean, boolean)} method that
- * takes a filter and splits it into a portion that should be applied to the
base table prior to the join, and a
- * portion that should be applied after the join.
- *
+ * <p>
* The first step of the filter splitting is to convert the filter into
* https://en.wikipedia.org/wiki/Conjunctive_normal_form (an AND of ORs). This
allows us to consider each
* OR clause independently as a candidate for filter push down to the base
table.
- *
+ * <p>
* A filter clause can be pushed down if it meets one of the following
conditions:
* - The filter only applies to columns from the base table
* - The filter applies to columns from the join table, and we determine that
the filter can be rewritten
* into a filter on columns from the base table
- *
+ * <p>
* For the second case, where we rewrite filter clauses, the rewritten clause
can be less selective than the original,
* so we preserve the original clause in the post-join filtering phase.
+ * <p>
+ * The starting point for join analysis is the {@link
#computeJoinFilterPreAnalysis} method. This method should be
+ * called before performing any per-segment join query work. This method
converts the query filter into
+ * conjunctive normal form, and splits the CNF clauses into a portion that
only references base table columns and
+ * a portion that references join table columns. For the filter clauses that
apply to join table columns, the
+ * pre-analysis step computes the information necessary for rewriting such
filters into filters on base table columns.
+ * <p>
+ * The result of this pre-analysis method should be passed into the next step
of join filter analysis, described below.
+ * <p>
+ * The {@link #splitFilter(JoinFilterPreAnalysis)} method takes the
pre-analysis result and optionally applies the
+ * filter rewrite and push down operations on a per-segment level.
*/
public class JoinFilterAnalyzer
{
private static final String PUSH_DOWN_VIRTUAL_COLUMN_NAME_BASE =
"JOIN-FILTER-PUSHDOWN-VIRTUAL-COLUMN-";
private static final ColumnSelectorFactory ALL_NULL_COLUMN_SELECTOR_FACTORY
= new AllNullColumnSelectorFactory();
/**
- * Analyze a filter and return a JoinFilterSplit indicating what parts of
the filter should be applied pre-join
- * and post-join.
+ * Before making per-segment filter splitting decisions, we first do a
pre-analysis step
+ * where we convert the query filter (if any) into conjunctive normal form
and then
+ * determine the structure of RHS filter rewrites (if any), since this
information is shared across all
+ * per-segment operations.
*
- * @param hashJoinSegmentStorageAdapter The storage adapter that is being
queried
- * @param baseColumnNames Set of names of columns that belong
to the base table,
- * including pre-join virtual columns
- * @param originalFilter Original filter from the query
- * @param enableFilterPushDown Whether to enable filter push down
- * @return A JoinFilterSplit indicating what parts of the filter should be
applied pre-join
- * and post-join.
+ * See {@link JoinFilterPreAnalysis} for details on the result of this
pre-analysis step.
Review comment:
Adjusted the alignment
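
For readers following the Javadoc in the hunk above, here is a minimal sketch of the clause-partitioning idea it describes: once the filter is in conjunctive normal form, each OR clause is classified by whether it references only base table columns. This is not the code from the pull request. The class name ClauseSplitSketch, the Split holder, and the representation of a clause as the set of column names it references are all hypothetical, chosen only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of Druid's JoinFilterAnalyzer.
class ClauseSplitSketch
{
  // Hypothetical result holder: clauses grouped by which tables they reference.
  static final class Split
  {
    // Clauses that reference only base table columns (push-down candidates as-is).
    final List<Set<String>> baseOnly = new ArrayList<>();
    // Clauses that reference at least one non-base (join table) column; these
    // would need a rewrite before any push down and are kept for post-join filtering.
    final List<Set<String>> touchesJoin = new ArrayList<>();
  }

  // Each element of cnfClauses stands for one OR clause of the CNF filter,
  // modeled (as an assumption) by the set of column names the clause references.
  static Split split(List<Set<String>> cnfClauses, Set<String> baseColumns)
  {
    Split result = new Split();
    for (Set<String> clauseColumns : cnfClauses) {
      if (baseColumns.containsAll(clauseColumns)) {
        result.baseOnly.add(clauseColumns);
      } else {
        result.touchesJoin.add(clauseColumns);
      }
    }
    return result;
  }
}
```

In this sketch the classification depends only on the query filter and the set of base column names, not on any particular segment, which is consistent with the Javadoc's point that the pre-analysis can be computed once and shared across per-segment operations.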