clintropolis commented on code in PR #12517:
URL: https://github.com/apache/druid/pull/12517#discussion_r877530125
##########
processing/src/main/java/org/apache/druid/segment/serde/DictionaryEncodedStringIndexSupplier.java:
##########
@@ -61,29 +67,38 @@ public class DictionaryEncodedStringIndexSupplier
implements ColumnIndexSupplier
public DictionaryEncodedStringIndexSupplier(
BitmapFactory bitmapFactory,
GenericIndexed<String> dictionary,
- GenericIndexed<ImmutableBitmap> bitmaps,
- ImmutableRTree indexedTree
+ GenericIndexed<ByteBuffer> dictionaryUtf8,
+ @Nullable GenericIndexed<ImmutableBitmap> bitmaps,
+ @Nullable ImmutableRTree indexedTree
)
{
this.bitmapFactory = bitmapFactory;
- this.bitmaps = bitmaps;
this.dictionary = dictionary;
+ this.dictionaryUtf8 = dictionaryUtf8;
+ this.bitmaps = bitmaps;
this.indexedTree = indexedTree;
}
@Nullable
@Override
+ @SuppressWarnings("unchecked")
public <T> T as(Class<T> clazz)
{
- if (clazz.equals(StringValueSetIndex.class)) {
- return (T) new
GenericIndexedDictionaryEncodedStringValueSetIndex(bitmapFactory, dictionary,
bitmaps);
- } else if (clazz.equals(DruidPredicateIndex.class)) {
+ if (bitmaps != null && clazz.equals(StringValueSetIndex.class)) {
Review Comment:
good catch :+1:
##########
processing/src/main/java/org/apache/druid/segment/serde/DictionaryEncodedStringIndexSupplier.java:
##########
@@ -98,21 +113,21 @@ public ColumnIndexCapabilities getIndexCapabilities()
}
}
- private abstract static class BaseGenericIndexedDictionaryEncodedStringIndex
+ private abstract static class BaseGenericIndexedDictionaryEncodedIndex<T>
{
protected final BitmapFactory bitmapFactory;
- protected final GenericIndexed<String> dictionary;
- protected final GenericIndexed<ImmutableBitmap> bitmaps;
+ protected final Indexed<T> dictionary;
+ protected final Indexed<ImmutableBitmap> bitmaps;
- protected BaseGenericIndexedDictionaryEncodedStringIndex(
+ protected BaseGenericIndexedDictionaryEncodedIndex(
BitmapFactory bitmapFactory,
- GenericIndexed<String> dictionary,
+ GenericIndexed<T> dictionary,
GenericIndexed<ImmutableBitmap> bitmaps
)
{
this.bitmapFactory = bitmapFactory;
- this.dictionary = dictionary;
- this.bitmaps = bitmaps;
+ this.dictionary = dictionary.singleThreaded();
Review Comment:
is a shame that this method loses the more specific type, but is still
probably better
##########
processing/src/main/java/org/apache/druid/segment/column/LexicographicalRangeIndex.java:
##########
@@ -39,18 +39,18 @@ default BitmapColumnIndex forRange(
boolean endStrict
)
{
- return forRange(startValue, startStrict, endValue, endStrict, (index) ->
true);
+ return forRange(startValue, startStrict, endValue, endStrict, null);
}
/**
* Get an {@link BitmapColumnIndex} corresponding to the values supplied in
the specified range whose dictionary ids
- * also match some predicate, such as to match a prefix
+ * also match some predicate, such as to match a prefix. Null matchers are
equivalent to always-true.
*/
BitmapColumnIndex forRange(
@Nullable String startValue,
boolean startStrict,
@Nullable String endValue,
boolean endStrict,
- Predicate<String> matcher
+ @Nullable Predicate<String> matcher
Review Comment:
What do you think about instead of making this `@Nullable` to remove the
`default` implementation of the version of `forRange` which does not take a
predicate matcher? I think that is actually the code path your benchmarks were
measuring due to how you changed `LikeFilter`, since
`DictionaryEncodedStringIndexSupplier` actually does implement that other
version.
The reason it had a default implementation is a left-over artifact of #12315
where all of the methods had to be implemented on `BitmapIndex`, meaning a lot
of silly implementations were required for things that didn't support those
kinds of indexes, so the `default` implementation was a laziness on my part.
I think the implementation of the version that doesn't have the matcher
should be slightly better than the `@Nullable` matcher since it can avoid the
checks. Though its probably small fries when computing indexes, everything
counts.
I think for `ListFilteredVirtualColumn` it would be ok to just use the old
'default' implementation for its version, since its the only other thing that
implements this index interface.
Alternatively, we could remove the version that doesn't take a matcher
argument and make this version with `@Nullable` the only method, though I think
that's slightly less cool because having it split (and not having a default)
sort of pushes implementors towards providing the more optimal predicate-less
implementation. If we do go this route, we should change
`DictionaryEncodedStringIndexSupplier` to make the version without ranges
private and just use that when the matcher is null.
This confusion was my fault I think, this PR made it more obvious that the
current state isn't the most intuitive on what to call or implement.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]