clintropolis commented on code in PR #12517:
URL: https://github.com/apache/druid/pull/12517#discussion_r877530125


##########
processing/src/main/java/org/apache/druid/segment/serde/DictionaryEncodedStringIndexSupplier.java:
##########
@@ -61,29 +67,38 @@ public class DictionaryEncodedStringIndexSupplier 
implements ColumnIndexSupplier
   public DictionaryEncodedStringIndexSupplier(
       BitmapFactory bitmapFactory,
       GenericIndexed<String> dictionary,
-      GenericIndexed<ImmutableBitmap> bitmaps,
-      ImmutableRTree indexedTree
+      GenericIndexed<ByteBuffer> dictionaryUtf8,
+      @Nullable GenericIndexed<ImmutableBitmap> bitmaps,
+      @Nullable ImmutableRTree indexedTree
   )
   {
     this.bitmapFactory = bitmapFactory;
-    this.bitmaps = bitmaps;
     this.dictionary = dictionary;
+    this.dictionaryUtf8 = dictionaryUtf8;
+    this.bitmaps = bitmaps;
     this.indexedTree = indexedTree;
   }
 
   @Nullable
   @Override
+  @SuppressWarnings("unchecked")
   public <T> T as(Class<T> clazz)
   {
-    if (clazz.equals(StringValueSetIndex.class)) {
-      return (T) new 
GenericIndexedDictionaryEncodedStringValueSetIndex(bitmapFactory, dictionary, 
bitmaps);
-    } else if (clazz.equals(DruidPredicateIndex.class)) {
+    if (bitmaps != null && clazz.equals(StringValueSetIndex.class)) {

Review Comment:
   good catch :+1:



##########
processing/src/main/java/org/apache/druid/segment/serde/DictionaryEncodedStringIndexSupplier.java:
##########
@@ -98,21 +113,21 @@ public ColumnIndexCapabilities getIndexCapabilities()
     }
   }
 
-  private abstract static class BaseGenericIndexedDictionaryEncodedStringIndex
+  private abstract static class BaseGenericIndexedDictionaryEncodedIndex<T>
   {
     protected final BitmapFactory bitmapFactory;
-    protected final GenericIndexed<String> dictionary;
-    protected final GenericIndexed<ImmutableBitmap> bitmaps;
+    protected final Indexed<T> dictionary;
+    protected final Indexed<ImmutableBitmap> bitmaps;
 
-    protected BaseGenericIndexedDictionaryEncodedStringIndex(
+    protected BaseGenericIndexedDictionaryEncodedIndex(
         BitmapFactory bitmapFactory,
-        GenericIndexed<String> dictionary,
+        GenericIndexed<T> dictionary,
         GenericIndexed<ImmutableBitmap> bitmaps
     )
     {
       this.bitmapFactory = bitmapFactory;
-      this.dictionary = dictionary;
-      this.bitmaps = bitmaps;
+      this.dictionary = dictionary.singleThreaded();

Review Comment:
   is a shame that this method loses the more specific type, but is still 
probably better



##########
processing/src/main/java/org/apache/druid/segment/column/LexicographicalRangeIndex.java:
##########
@@ -39,18 +39,18 @@ default BitmapColumnIndex forRange(
       boolean endStrict
   )
   {
-    return forRange(startValue, startStrict, endValue, endStrict, (index) -> 
true);
+    return forRange(startValue, startStrict, endValue, endStrict, null);
   }
 
   /**
    * Get an {@link BitmapColumnIndex} corresponding to the values supplied in 
the specified range whose dictionary ids
-   * also match some predicate, such as to match a prefix
+   * also match some predicate, such as to match a prefix. Null matchers are 
equivalent to always-true.
    */
   BitmapColumnIndex forRange(
       @Nullable String startValue,
       boolean startStrict,
       @Nullable String endValue,
       boolean endStrict,
-      Predicate<String> matcher
+      @Nullable Predicate<String> matcher

Review Comment:
   What do you think about instead of making this `@Nullable` to remove the 
`default` implementation of the version of `forRange` which does not take a 
predicate matcher? I think that is actually the code path your benchmarks were 
measuring due to how you changed `LikeFilter`, since 
`DictionaryEncodedStringIndexSupplier` actually does implement that other 
version.
   
   The reason it had a default implementation is a left-over artifact of #12315 
where all of the methods had to be implemented on `BitmapIndex`, meaning a lot 
of silly implementations were required for things that didn't support those 
kinds of indexes, so the `default` implementation was a laziness on my part.
   
   I think the implementation of the version that doesn't have the matcher 
should be slightly better than the `@Nullable` matcher since it can avoid the 
checks. Though its probably small fries when computing indexes, everything 
counts.
   
   I think for `ListFilteredVirtualColumn` it would be ok to just use the old 
'default' implementation for its version, since its the only other thing that 
implements this index interface.
   
   Alternatively, we could remove the version that doesn't take a matcher 
argument and make this version with `@Nullable` the only method, though I think 
that's slightly less cool because having it split (and not having a default) 
sort of pushes implementors towards providing the more optimal predicate-less 
implementation. If we do go this route, we should change 
`DictionaryEncodedStringIndexSupplier` to make the version without ranges 
private and just use that when the matcher is null.
   
   This confusion was my fault I think, this PR made it more obvious that the 
current state isn't the most intuitive on what to call or implement.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to