imply-cheddar commented on a change in pull request #12315:
URL: https://github.com/apache/druid/pull/12315#discussion_r822353278



##########
File path: 
processing/src/main/java/org/apache/druid/segment/column/BitmapIndex.java
##########
@@ -23,28 +23,85 @@
 import org.apache.druid.collections.bitmap.ImmutableBitmap;
 
 import javax.annotation.Nullable;
+import java.util.Set;
+import java.util.function.Predicate;
 
 /**
+ * Provides a mechanism to get {@link ImmutableBitmap} for a value, or {@link 
Iterable} of {@link ImmutableBitmap}
+ * for a range or exact set of values, the set bits of which correspond to 
which rows contain the matching values.
+ *
+ * Used to power {@link org.apache.druid.segment.BitmapOffset} and
+ * {@link org.apache.druid.segment.vector.BitmapVectorOffset} which are used 
with column cursors for fast filtering
+ * of indexed values.
+ *
+ * The column must be ingested with a bitmap index for filters to use them and 
to participate in this "pre-filtering"
+ * step when scanning segments
+ *
+ * @see org.apache.druid.segment.QueryableIndexStorageAdapter#analyzeFilter
  */
 public interface BitmapIndex
 {
+  /**
+   * Get the cardinality of the underlying value dictionary
+   */
   int getCardinality();
 
+  /**
+   * Get the value in the underlying value dictionary of the specified 
dictionary id
+   */
   @Nullable
   String getValue(int index);
 
+  /**
+   * Returns true if the underlying value dictionary has nulls
+   */
   boolean hasNulls();
 
   BitmapFactory getBitmapFactory();
 
   /**
-   * Returns the index of "value" in this BitmapIndex, or (-(insertion point) 
- 1) if the value is not
-   * present, in the manner of Arrays.binarySearch.
-   *
-   * @param value value to search for
-   * @return index of value, or negative number equal to (-(insertion point) - 
1).
+   * Returns the index of "value" in this BitmapIndex, or a negative value, if 
not present in the underlying dictionary
    */
   int getIndex(@Nullable String value);
 
+  /**
+   * Get the {@link ImmutableBitmap} for dictionary id of the underlying 
dictionary
+   */
   ImmutableBitmap getBitmap(int idx);
+
+  /**
+   * Get the {@link ImmutableBitmap} corresponding to the supplied value
+   */
+  ImmutableBitmap getBitmapForValue(@Nullable String value);
+
+  /**
+   * Get an {@link Iterable} of {@link ImmutableBitmap} corresponding to the 
values supplied in the specified range
+   */
+  default Iterable<ImmutableBitmap> getBitmapsInRange(

Review comment:
       I believe that these are always Union'd together and I'm not really 
seeing a great reason for why this shouldn't just return a single 
`ImmutableBitmap` that has already been unioned instead of an Iterable.

##########
File path: 
processing/src/main/java/org/apache/druid/segment/column/BitmapIndex.java
##########
@@ -23,28 +23,85 @@
 import org.apache.druid.collections.bitmap.ImmutableBitmap;
 
 import javax.annotation.Nullable;
+import java.util.Set;
+import java.util.function.Predicate;
 
 /**
+ * Provides a mechanism to get {@link ImmutableBitmap} for a value, or {@link 
Iterable} of {@link ImmutableBitmap}
+ * for a range or exact set of values, the set bits of which correspond to 
which rows contain the matching values.
+ *
+ * Used to power {@link org.apache.druid.segment.BitmapOffset} and
+ * {@link org.apache.druid.segment.vector.BitmapVectorOffset} which are used 
with column cursors for fast filtering
+ * of indexed values.
+ *
+ * The column must be ingested with a bitmap index for filters to use them and 
to participate in this "pre-filtering"
+ * step when scanning segments
+ *
+ * @see org.apache.druid.segment.QueryableIndexStorageAdapter#analyzeFilter
  */
 public interface BitmapIndex
 {
+  /**
+   * Get the cardinality of the underlying value dictionary
+   */
   int getCardinality();
 
+  /**
+   * Get the value in the underlying value dictionary of the specified 
dictionary id
+   */
   @Nullable
   String getValue(int index);
 
+  /**
+   * Returns true if the underlying value dictionary has nulls
+   */
   boolean hasNulls();
 
   BitmapFactory getBitmapFactory();
 
   /**
-   * Returns the index of "value" in this BitmapIndex, or (-(insertion point) 
- 1) if the value is not
-   * present, in the manner of Arrays.binarySearch.
-   *
-   * @param value value to search for
-   * @return index of value, or negative number equal to (-(insertion point) - 
1).
+   * Returns the index of "value" in this BitmapIndex, or a negative value, if 
not present in the underlying dictionary
    */
   int getIndex(@Nullable String value);
 
+  /**
+   * Get the {@link ImmutableBitmap} for dictionary id of the underlying 
dictionary
+   */
   ImmutableBitmap getBitmap(int idx);
+
+  /**
+   * Get the {@link ImmutableBitmap} corresponding to the supplied value
+   */
+  ImmutableBitmap getBitmapForValue(@Nullable String value);
+
+  /**
+   * Get an {@link Iterable} of {@link ImmutableBitmap} corresponding to the 
values supplied in the specified range
+   */
+  default Iterable<ImmutableBitmap> getBitmapsInRange(
+      @Nullable String startValue,
+      boolean startStrict,
+      @Nullable String endValue,
+      boolean endStrict

Review comment:
       It might be nice to package these things up into a holder Object of some 
sort so that it's a bit easier to adjust parameters if/when we need to.  That 
could probably be done in a subsequent change though.

##########
File path: 
processing/src/main/java/org/apache/druid/segment/column/BitmapIndex.java
##########
@@ -23,28 +23,85 @@
 import org.apache.druid.collections.bitmap.ImmutableBitmap;
 
 import javax.annotation.Nullable;
+import java.util.Set;
+import java.util.function.Predicate;
 
 /**
+ * Provides a mechanism to get {@link ImmutableBitmap} for a value, or {@link 
Iterable} of {@link ImmutableBitmap}
+ * for a range or exact set of values, the set bits of which correspond to 
which rows contain the matching values.
+ *
+ * Used to power {@link org.apache.druid.segment.BitmapOffset} and
+ * {@link org.apache.druid.segment.vector.BitmapVectorOffset} which are used 
with column cursors for fast filtering
+ * of indexed values.
+ *
+ * The column must be ingested with a bitmap index for filters to use them and 
to participate in this "pre-filtering"
+ * step when scanning segments
+ *
+ * @see org.apache.druid.segment.QueryableIndexStorageAdapter#analyzeFilter
  */
 public interface BitmapIndex
 {
+  /**
+   * Get the cardinality of the underlying value dictionary
+   */
   int getCardinality();
 
+  /**
+   * Get the value in the underlying value dictionary of the specified 
dictionary id
+   */
   @Nullable
   String getValue(int index);
 
+  /**
+   * Returns true if the underlying value dictionary has nulls
+   */
   boolean hasNulls();
 
   BitmapFactory getBitmapFactory();
 
   /**
-   * Returns the index of "value" in this BitmapIndex, or (-(insertion point) 
- 1) if the value is not
-   * present, in the manner of Arrays.binarySearch.
-   *
-   * @param value value to search for
-   * @return index of value, or negative number equal to (-(insertion point) - 
1).
+   * Returns the index of "value" in this BitmapIndex, or a negative value, if 
not present in the underlying dictionary
    */
   int getIndex(@Nullable String value);
 
+  /**
+   * Get the {@link ImmutableBitmap} for dictionary id of the underlying 
dictionary
+   */
   ImmutableBitmap getBitmap(int idx);
+
+  /**
+   * Get the {@link ImmutableBitmap} corresponding to the supplied value
+   */
+  ImmutableBitmap getBitmapForValue(@Nullable String value);
+
+  /**
+   * Get an {@link Iterable} of {@link ImmutableBitmap} corresponding to the 
values supplied in the specified range
+   */
+  default Iterable<ImmutableBitmap> getBitmapsInRange(
+      @Nullable String startValue,
+      boolean startStrict,
+      @Nullable String endValue,
+      boolean endStrict
+  )
+  {
+    return getBitmapsInRange(startValue, startStrict, endValue, endStrict, 
(index) -> true);
+  }
+
+  /**
+   * Get an {@link Iterable} of {@link ImmutableBitmap} corresponding to the 
values supplied in the specified range
+   * whose dictionary ids also match some predicate
+   */
+  Iterable<ImmutableBitmap> getBitmapsInRange(
+      @Nullable String startValue,
+      boolean startStrict,
+      @Nullable String endValue,
+      boolean endStrict,
+      Predicate<String> matcher
+  );
+
+  /**
+   * Get an {@link Iterable} of {@link ImmutableBitmap} corresponding to the 
specified set of values (if they are
+   * contained in the underlying column)
+   */
+  Iterable<ImmutableBitmap> getBitmapsForValues(Set<String> values);

Review comment:
       And here on Iterable too

##########
File path: 
processing/src/main/java/org/apache/druid/segment/column/BitmapIndex.java
##########
@@ -23,28 +23,85 @@
 import org.apache.druid.collections.bitmap.ImmutableBitmap;
 
 import javax.annotation.Nullable;
+import java.util.Set;
+import java.util.function.Predicate;
 
 /**
+ * Provides a mechanism to get {@link ImmutableBitmap} for a value, or {@link 
Iterable} of {@link ImmutableBitmap}
+ * for a range or exact set of values, the set bits of which correspond to 
which rows contain the matching values.
+ *
+ * Used to power {@link org.apache.druid.segment.BitmapOffset} and
+ * {@link org.apache.druid.segment.vector.BitmapVectorOffset} which are used 
with column cursors for fast filtering
+ * of indexed values.
+ *
+ * The column must be ingested with a bitmap index for filters to use them and 
to participate in this "pre-filtering"
+ * step when scanning segments
+ *
+ * @see org.apache.druid.segment.QueryableIndexStorageAdapter#analyzeFilter
  */
 public interface BitmapIndex
 {
+  /**
+   * Get the cardinality of the underlying value dictionary
+   */
   int getCardinality();
 
+  /**
+   * Get the value in the underlying value dictionary of the specified 
dictionary id
+   */
   @Nullable
   String getValue(int index);
 
+  /**
+   * Returns true if the underlying value dictionary has nulls
+   */
   boolean hasNulls();
 
   BitmapFactory getBitmapFactory();
 
   /**
-   * Returns the index of "value" in this BitmapIndex, or (-(insertion point) 
- 1) if the value is not
-   * present, in the manner of Arrays.binarySearch.
-   *
-   * @param value value to search for
-   * @return index of value, or negative number equal to (-(insertion point) - 
1).
+   * Returns the index of "value" in this BitmapIndex, or a negative value, if 
not present in the underlying dictionary
    */
   int getIndex(@Nullable String value);
 
+  /**
+   * Get the {@link ImmutableBitmap} for dictionary id of the underlying 
dictionary
+   */
   ImmutableBitmap getBitmap(int idx);
+
+  /**
+   * Get the {@link ImmutableBitmap} corresponding to the supplied value
+   */
+  ImmutableBitmap getBitmapForValue(@Nullable String value);
+
+  /**
+   * Get an {@link Iterable} of {@link ImmutableBitmap} corresponding to the 
values supplied in the specified range
+   */
+  default Iterable<ImmutableBitmap> getBitmapsInRange(
+      @Nullable String startValue,
+      boolean startStrict,
+      @Nullable String endValue,
+      boolean endStrict
+  )
+  {
+    return getBitmapsInRange(startValue, startStrict, endValue, endStrict, 
(index) -> true);
+  }
+
+  /**
+   * Get an {@link Iterable} of {@link ImmutableBitmap} corresponding to the 
values supplied in the specified range
+   * whose dictionary ids also match some predicate
+   */
+  Iterable<ImmutableBitmap> getBitmapsInRange(

Review comment:
       Same comment here on Iterable

##########
File path: 
processing/src/main/java/org/apache/druid/segment/virtual/ListFilteredVirtualColumn.java
##########
@@ -252,22 +256,172 @@ public BitmapFactory getBitmapFactory()
       return delegate.getBitmapFactory();
     }
 
+    @Override
+    public int getCardinality()
+    {
+      return idMapping.getValueCardinality();
+    }
+
+    @Override
+    public int getIndex(@Nullable String value)
+    {
+      return getReverseIndex(value);
+    }
+
     @Override
     public ImmutableBitmap getBitmap(int idx)
     {
       return delegate.getBitmap(idMapping.getReverseId(idx));
     }
 
     @Override
-    public int getCardinality()
+    public ImmutableBitmap getBitmapForValue(@Nullable String value)
     {
-      return idMapping.getValueCardinality();
+      if (getReverseIndex(value) < 0) {
+        return delegate.getBitmap(-1);
+      }
+      return delegate.getBitmapForValue(value);
     }
 
     @Override
-    public int getIndex(@Nullable String value)
+    public Iterable<ImmutableBitmap> getBitmapsInRange(

Review comment:
       For the "include" case, the implementations of these methods seem like 
they should be able to be a lot simpler.  The normal use case provides a much 
smaller list of things to show, it seems like we could quite rapidly filter 
through that list and then delegate to something that uses that list.
   
   The deny-list case requires different logic.  But, then I wonder how 
important this actually is, or is the difference between the implementation 
that you have and this more direct implementation just a few milliseconds such 
that it doesn't actually matter...

##########
File path: 
processing/src/main/java/org/apache/druid/segment/serde/StringBitmapIndexColumnPartSupplier.java
##########
@@ -94,6 +101,179 @@ public ImmutableBitmap getBitmap(int idx)
         final ImmutableBitmap bitmap = bitmaps.get(idx);
         return bitmap == null ? bitmapFactory.makeEmptyImmutableBitmap() : 
bitmap;
       }
+
+      @Override
+      public ImmutableBitmap getBitmapForValue(@Nullable String value)
+      {
+        final int idx = dictionary.indexOf(value);
+        return getBitmap(idx);
+      }
+
+      @Override
+      public Iterable<ImmutableBitmap> getBitmapsInRange(
+          @Nullable String startValue,
+          boolean startStrict,
+          @Nullable String endValue,
+          boolean endStrict
+      )
+      {
+        int startIndex, endIndex;
+        if (startValue == null) {
+          startIndex = 0;
+        } else {
+          final int found = 
dictionary.indexOf(NullHandling.emptyToNullIfNeeded(startValue));
+          if (found >= 0) {
+            startIndex = startStrict ? found + 1 : found;
+          } else {
+            startIndex = -(found + 1);
+          }
+        }
+
+        if (endValue == null) {
+          endIndex = dictionary.size();
+        } else {
+          final int found = 
dictionary.indexOf(NullHandling.emptyToNullIfNeeded(endValue));
+          if (found >= 0) {
+            endIndex = endStrict ? found : found + 1;
+          } else {
+            endIndex = -(found + 1);
+          }
+        }
+
+        endIndex = startIndex > endIndex ? startIndex : endIndex;
+        final IntIterator rangeIterator = IntListUtils.fromTo(startIndex, 
endIndex).iterator();
+        return () -> new Iterator<ImmutableBitmap>()

Review comment:
       If you change this to only return a single Bitmap and union them here, 
then you don't need to worry about it ;)

##########
File path: processing/src/main/java/org/apache/druid/segment/data/Indexed.java
##########
@@ -55,8 +55,8 @@
   int indexOf(@Nullable T value);
 
   /**
-   * Returns the index of "value" in some object whose values are accessible 
by index some {@link IndexedGetter}, or
-   * (-(insertion point) - 1) if the value is not present, in the manner of 
Arrays.binarySearch.
+   * Returns the index of "value" in some object in the {@link Indexed}, or 
(-(insertion point) - 1) if the value is
+   * not present, in the manner of Arrays.binarySearch.

Review comment:
       I'd personally prefer that we remove this bit of documentation.  It's an 
implementation detail that I don't think we should try to enshrine in the docs 
anymore (especially after this PR)

##########
File path: processing/src/main/java/org/apache/druid/segment/data/Indexed.java
##########
@@ -65,7 +65,7 @@
    *
    * @return index of value, or negative number equal to (-(insertion point) - 
1).

Review comment:
       It's here too




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to