clintropolis commented on code in PR #17055:
URL: https://github.com/apache/druid/pull/17055#discussion_r1758441313


##########
processing/src/main/java/org/apache/druid/query/filter/Filter.java:
##########
@@ -31,12 +31,27 @@
 import org.apache.druid.segment.vector.VectorColumnSelectorFactory;
 
 import javax.annotation.Nullable;
+import java.util.LinkedHashSet;
 import java.util.Map;
 import java.util.Set;
 
 @SubclassesMustOverrideEqualsAndHashCode
 public interface Filter
 {
+  default String getFilterString()
+  {
+    return toString();
+  }
+
+  /**
+   * Returns a LinkedHashSet of all child filters for this filter with no 
duplicates.
+   * <p>The ordering of child filters is important in some cases, 
e.x.short-curcuiting.</p>
+   */
+  default LinkedHashSet<Filter> getFilters()
+  {
+    return new LinkedHashSet<>();
+  }

Review Comment:
   hmm, in some sense, this doesn't really seem an appropriate place for this 
since not all filters have child filters.. but i see it does make the builder 
stuff a bit easier to work with.
   
   I guess it is somewhat obvious who should implement this method at least for 
the way we are currently using it, but it still feels like the javadoc should 
be updated a bit perhaps to make it clarified. Should they have multiple 
children or does anything that wraps another filter fit this pattern to? Should 
`NotFilter` implement this method since it has a base filter that it is 
inverting? These are the questions I have when I see this method on `Filter` 
instead of `BooleanFilter`. For the record because of the way the bundling 
works for `NotFilter` I don't believe it doesn't really need to implement this.
   
   Also, maybe we should at least make a static empty so we don't need to make 
a bunch of objects that won't be used?



##########
processing/src/main/java/org/apache/druid/query/filter/FilterBundle.java:
##########
@@ -151,6 +151,91 @@ public interface MatcherBundle
     boolean canVectorize();
   }
 
+  /**
+   * Wraps info needed to build a {@link FilterBundle}, and provides an 
estimated compute cost for
+   * {@link BitmapColumnIndex#computeBitmapResult}.
+   */
+  public static class Builder
+  {
+    private final Filter filter;
+    private final ColumnIndexSelector columnIndexSelector;
+    @Nullable
+    private final BitmapColumnIndex bitmapColumnIndex;
+    private final List<FilterBundle.Builder> childBuilders;
+    private final int estimatedComputeCost;
+
+    public Builder(Filter filter, ColumnIndexSelector columnIndexSelector)
+    {
+      this.filter = filter;
+      this.columnIndexSelector = columnIndexSelector;
+      this.bitmapColumnIndex = 
filter.getBitmapColumnIndex(columnIndexSelector);
+      // Construct Builder instances for all child filters recursively.
+      this.childBuilders = new ArrayList<>(filter.getFilters().size());
+      for (Filter childFilter : filter.getFilters()) {
+        childBuilders.add(new FilterBundle.Builder(childFilter, 
columnIndexSelector));
+      }
+      // Sort child builders by cost in ASCENDING order, should be stable by 
default.
+      
this.childBuilders.sort(Comparator.comparingInt(FilterBundle.Builder::getEstimatedComputeCost));
+      this.estimatedComputeCost = calculateEstimatedComputeCost();
+    }
+
+    private int calculateEstimatedComputeCost()
+    {
+      if (this.bitmapColumnIndex == null) {
+        return Integer.MAX_VALUE;
+      }
+      int cost = this.bitmapColumnIndex.estimatedComputeCost();
+      if (cost == Integer.MAX_VALUE) {
+        return Integer.MAX_VALUE;
+      }
+
+      for (FilterBundle.Builder childBuilder : this.childBuilders) {
+        int childCost = childBuilder.getEstimatedComputeCost();
+        if (childCost >= Integer.MAX_VALUE - cost) {
+          return Integer.MAX_VALUE;
+        }
+        cost += childCost;
+      }
+      return cost;
+    }
+
+    public Filter getFilter()
+    {
+      return this.filter;
+    }
+
+    public ColumnIndexSelector getColumnIndexSelector()
+    {
+      return this.columnIndexSelector;
+    }
+
+    @Nullable
+    public BitmapColumnIndex getBitmapColumnIndex()
+    {
+      return this.bitmapColumnIndex;
+    }
+
+    public List<FilterBundle.Builder> getChildBuilders()
+    {
+      return this.childBuilders;
+    }
+
+    public int getEstimatedComputeCost()

Review Comment:
   the name of this method should clarify cost to compute bitmaps, maybe 
`getIndexComputeCost`? Just thinking in case we ever also want to think about 
arranging the order for `ValueMatcher`/`VectorValueMatcher` since different 
implementations have different complexity similar to how we're going to be 
doing with bitmap computation.



##########
processing/src/main/java/org/apache/druid/query/filter/FilterBundle.java:
##########
@@ -151,6 +151,91 @@ public interface MatcherBundle
     boolean canVectorize();
   }
 
+  /**
+   * Wraps info needed to build a {@link FilterBundle}, and provides an 
estimated compute cost for
+   * {@link BitmapColumnIndex#computeBitmapResult}.
+   */
+  public static class Builder
+  {
+    private final Filter filter;
+    private final ColumnIndexSelector columnIndexSelector;
+    @Nullable
+    private final BitmapColumnIndex bitmapColumnIndex;
+    private final List<FilterBundle.Builder> childBuilders;
+    private final int estimatedComputeCost;
+
+    public Builder(Filter filter, ColumnIndexSelector columnIndexSelector)
+    {
+      this.filter = filter;
+      this.columnIndexSelector = columnIndexSelector;
+      this.bitmapColumnIndex = 
filter.getBitmapColumnIndex(columnIndexSelector);
+      // Construct Builder instances for all child filters recursively.
+      this.childBuilders = new ArrayList<>(filter.getFilters().size());
+      for (Filter childFilter : filter.getFilters()) {
+        childBuilders.add(new FilterBundle.Builder(childFilter, 
columnIndexSelector));
+      }
+      // Sort child builders by cost in ASCENDING order, should be stable by 
default.
+      
this.childBuilders.sort(Comparator.comparingInt(FilterBundle.Builder::getEstimatedComputeCost));
+      this.estimatedComputeCost = calculateEstimatedComputeCost();
+    }
+
+    private int calculateEstimatedComputeCost()
+    {
+      if (this.bitmapColumnIndex == null) {
+        return Integer.MAX_VALUE;
+      }
+      int cost = this.bitmapColumnIndex.estimatedComputeCost();
+      if (cost == Integer.MAX_VALUE) {
+        return Integer.MAX_VALUE;
+      }
+
+      for (FilterBundle.Builder childBuilder : this.childBuilders) {
+        int childCost = childBuilder.getEstimatedComputeCost();
+        if (childCost >= Integer.MAX_VALUE - cost) {
+          return Integer.MAX_VALUE;
+        }
+        cost += childCost;
+      }
+      return cost;
+    }
+
+    public Filter getFilter()
+    {
+      return this.filter;

Review Comment:
   total nitpick: we don't typically use `this` qualifier except when required 
(usually just constructors)



##########
processing/src/main/java/org/apache/druid/segment/QueryableIndexCursorHolder.java:
##########
@@ -732,8 +731,7 @@ private static FilterBundle makeFilterBundle(
       return null;
     }
     final long bitmapConstructionStartNs = System.nanoTime();
-    final FilterBundle filterBundle = filter.makeFilterBundle(
-        bitmapIndexSelector,
+    final FilterBundle filterBundle = new FilterBundle.Builder(filter, 
bitmapIndexSelector).build(

Review Comment:
   I think just in case we should add a new boolean query context flag, 
https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/QueryContexts.java,
 something like `"cursorAutoArrangeFilters"` and pass it into the builder so 
that we can disable the sort per query if we need to for any reason.
   
   You can get access to the `QueryContext` itself from the `CursorBuildSpec` 
passed into the constructor of `QueryableIndexCursorHolder`, pass it into the 
`CursorResources` to feed to this method, and retrieve the value with something 
like `queryContext.getBoolean(QueryContexts.CURSOR_AUTO_ARRANGE_FILTERS)` (or 
whatever you name the static variable).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to