clintropolis commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393264758
 
 

 ##########
 File path: 
processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -50,73 +51,126 @@
 /**
  * When there is a filter in a join query, we can sometimes improve 
performance by applying parts of the filter
  * when we first read from the base table instead of after the join.
- *
- * This class provides a {@link #splitFilter(HashJoinSegmentStorageAdapter, 
Set, Filter, boolean, boolean)} method that
- * takes a filter and splits it into a portion that should be applied to the 
base table prior to the join, and a
- * portion that should be applied after the join.
- *
+ * <p>
  * The first step of the filter splitting is to convert the filter into
  * https://en.wikipedia.org/wiki/Conjunctive_normal_form (an AND of ORs). This 
allows us to consider each
  * OR clause independently as a candidate for filter push down to the base 
table.
- *
+ * <p>
  * A filter clause can be pushed down if it meets one of the following 
conditions:
  * - The filter only applies to columns from the base table
  * - The filter applies to columns from the join table, and we determine that 
the filter can be rewritten
  *   into a filter on columns from the base table
- *
+ * <p>
  * For the second case, where we rewrite filter clauses, the rewritten clause 
can be less selective than the original,
  * so we preserve the original clause in the post-join filtering phase.
+ * <p>
+ * The starting point for join analysis is the {@link 
#computeJoinFilterPreAnalysis} method. This method should be
+ * called before performing any per-segment join query work. This method 
converts the query filter into
+ * conjunctive normal form, and splits the CNF clauses into a portion that 
only references base table columns and
+ * a portion that references join table columns. For the filter clauses that 
apply to join table columns, the
+ * pre-analysis step computes the information necessary for rewriting such 
filters into filters on base table columns.
+ * <p>
+ * The result of this pre-analysis method should be passed into the next step 
of join filter analysis, described below.
+ * <p>
+ * The {@link #splitFilter(JoinFilterPreAnalysis)} method takes the 
pre-analysis result and optionally applies the\
+ * filter rewrite and push down operations on a per-segment level.
  */
 public class JoinFilterAnalyzer
 {
   private static final String PUSH_DOWN_VIRTUAL_COLUMN_NAME_BASE = 
"JOIN-FILTER-PUSHDOWN-VIRTUAL-COLUMN-";
   private static final ColumnSelectorFactory ALL_NULL_COLUMN_SELECTOR_FACTORY 
= new AllNullColumnSelectorFactory();
 
   /**
-   * Analyze a filter and return a JoinFilterSplit indicating what parts of 
the filter should be applied pre-join
-   * and post-join.
+   * Before making per-segment filter splitting decisions, we first do a 
pre-analysis step
+   * where we convert the query filter (if any) into conjunctive normal form 
and then
+   * determine the structure of RHS filter rewrites (if any), since this 
information is shared across all
+   * per-segment operations.
    *
-   * @param hashJoinSegmentStorageAdapter The storage adapter that is being 
queried
-   * @param baseColumnNames               Set of names of columns that belong 
to the base table,
-   *                                      including pre-join virtual columns
-   * @param originalFilter                Original filter from the query
-   * @param enableFilterPushDown          Whether to enable filter push down
-   * @return A JoinFilterSplit indicating what parts of the filter should be 
applied pre-join
-   *         and post-join.
+   * See {@link JoinFilterPreAnalysis} for details on the result of this 
pre-analysis step.
 
 Review comment:
   super nit: could you retain the old formatting where the descriptions are 
offset to the right and aligned? I find it a bit easier to read
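
As an illustration of the aligned style, here is a hypothetical sketch (not code from this PR). It partitions CNF clauses the way the class Javadoc above describes for the first push-down condition. The class, method, and type names are made up for the example; only the `baseColumnNames` parameter name and its description come from the removed Javadoc. The rewrite case for join table columns is deliberately left out, so such clauses simply stay post-join here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ClauseSplitSketch
{
  /** Hypothetical stand-in for one OR clause of a CNF filter: a label plus the columns it reads. */
  public static class Clause
  {
    public final String label;
    public final Set<String> columns;

    public Clause(String label, Set<String> columns)
    {
      this.label = label;
      this.columns = columns;
    }
  }

  /** Hypothetical result holder: clauses applied before the join and clauses applied after it. */
  public static class Split
  {
    public final List<Clause> preJoin = new ArrayList<>();
    public final List<Clause> postJoin = new ArrayList<>();
  }

  /**
   * Partition CNF clauses by whether they reference only base table columns.
   *
   * @param clauses         OR clauses of the filter, already in conjunctive normal form
   * @param baseColumnNames Set of names of columns that belong to the base table,
   *                        including pre-join virtual columns
   * @return A Split whose preJoin list holds the clauses that touch only base table columns
   *         and whose postJoin list holds the rest
   */
  public static Split split(List<Clause> clauses, Set<String> baseColumnNames)
  {
    Split result = new Split();
    for (Clause clause : clauses) {
      // A clause can be pushed down as-is only if every column it reads is a base table column.
      if (baseColumnNames.containsAll(clause.columns)) {
        result.preJoin.add(clause);
      } else {
        // Assumption for this sketch: no rewrite is attempted, so the clause stays post-join.
        result.postJoin.add(clause);
      }
    }
    return result;
  }
}
```

With the descriptions starting in the same column, the continuation line of `baseColumnNames` lines up under its first line, which is the readability point of the nit.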

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
