imply-cheddar commented on code in PR #12388:
URL: https://github.com/apache/druid/pull/12388#discussion_r866431713
##########
processing/src/main/java/org/apache/druid/query/metadata/SegmentAnalyzer.java:
##########
@@ -193,30 +194,36 @@ private ColumnAnalysis analyzeNumericColumn(
private ColumnAnalysis analyzeStringColumn(
final ColumnCapabilities capabilities,
- final ColumnHolder columnHolder
+ final ColumnHolder columnHolder,
+ final String columnName
)
{
Comparable min = null;
Comparable max = null;
long size = 0;
final int cardinality;
if (capabilities.hasBitmapIndexes()) {
- final BitmapIndex bitmapIndex = columnHolder.getBitmapIndex();
- cardinality = bitmapIndex.getCardinality();
-
- if (analyzingSize()) {
- for (int i = 0; i < cardinality; ++i) {
- String value = bitmapIndex.getValue(i);
- if (value != null) {
- size += StringUtils.estimatedBinaryLengthAsUTF8(value) *
- ((long) bitmapIndex.getBitmapForValue(value).size());
+ ColumnIndexSupplier indexSupplier = columnHolder.getIndexSupplier();
+ if (indexSupplier != null) {
+ final DictionaryEncodedStringValueIndex valueIndex =
indexSupplier.getIndex(
+ DictionaryEncodedStringValueIndex.class
+ );
+ Preconditions.checkNotNull(valueIndex, "Column [%s] does not have a
value index", columnName);
Review Comment:
This should be equivalent to `capabilities.hasBitmapIndexes()`. How about
we remove that check above and replace it with the 2-phased checks for the
`ColumnIndexSupplier` and then that it can return a
`DictionaryEncodedStringValueIndex`?
##########
processing/src/main/java/org/apache/druid/query/metadata/SegmentAnalyzer.java:
##########
@@ -193,30 +194,36 @@ private ColumnAnalysis analyzeNumericColumn(
private ColumnAnalysis analyzeStringColumn(
final ColumnCapabilities capabilities,
- final ColumnHolder columnHolder
+ final ColumnHolder columnHolder,
+ final String columnName
)
{
Comparable min = null;
Comparable max = null;
long size = 0;
final int cardinality;
if (capabilities.hasBitmapIndexes()) {
- final BitmapIndex bitmapIndex = columnHolder.getBitmapIndex();
- cardinality = bitmapIndex.getCardinality();
-
- if (analyzingSize()) {
- for (int i = 0; i < cardinality; ++i) {
- String value = bitmapIndex.getValue(i);
- if (value != null) {
- size += StringUtils.estimatedBinaryLengthAsUTF8(value) *
- ((long) bitmapIndex.getBitmapForValue(value).size());
+ ColumnIndexSupplier indexSupplier = columnHolder.getIndexSupplier();
+ if (indexSupplier != null) {
Review Comment:
nit: this is a `!=` on an if with an else, invert them to remove the
negation please.
##########
processing/src/main/java/org/apache/druid/segment/ColumnSelectorColumnIndexSelector.java:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment;
+
+import org.apache.druid.collections.bitmap.BitmapFactory;
+import org.apache.druid.query.filter.ColumnIndexSelector;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.column.ColumnHolder;
+import org.apache.druid.segment.column.ColumnIndexSupplier;
+import org.apache.druid.segment.column.NumericColumn;
+
+import javax.annotation.Nullable;
+
+/**
+ */
+public class ColumnSelectorColumnIndexSelector implements ColumnIndexSelector
+{
+ private final BitmapFactory bitmapFactory;
+ private final VirtualColumns virtualColumns;
+ private final ColumnSelector columnSelector;
+
+ public ColumnSelectorColumnIndexSelector(
+ final BitmapFactory bitmapFactory,
+ final VirtualColumns virtualColumns,
+ final ColumnSelector index
+ )
+ {
+ this.bitmapFactory = bitmapFactory;
+ this.virtualColumns = virtualColumns;
+ this.columnSelector = index;
+ }
+
+ @Override
+ public int getNumRows()
+ {
+ try (final NumericColumn column = (NumericColumn)
columnSelector.getColumnHolder(ColumnHolder.TIME_COLUMN_NAME).getColumn()) {
+ return column.length();
+ }
+ }
+
+ @Override
+ public BitmapFactory getBitmapFactory()
+ {
+ return bitmapFactory;
+ }
+
+ @Override
+ public ColumnIndexSupplier getIndexSupplier(String column)
+ {
+ final ColumnIndexSupplier indexSupplier;
+ if (isVirtualColumn(column)) {
+ indexSupplier = virtualColumns.getIndexSupplier(column, columnSelector);
+ } else {
+ final ColumnHolder columnHolder = columnSelector.getColumnHolder(column);
+ if (columnHolder == null ||
!columnHolder.getCapabilities().isFilterable()) {
Review Comment:
Why the `isFilterable` check? If the ColumnHolder believes that it is not
filterable, wouldn't it just return a `null` IndexSupplier? Or do we want a
not filterable column to act like it doesn't even exist?
##########
processing/src/main/java/org/apache/druid/segment/QueryableIndexStorageAdapter.java:
##########
@@ -146,8 +148,9 @@ public Comparable getMinValue(String dimension)
{
ColumnHolder columnHolder = index.getColumnHolder(dimension);
if (columnHolder != null &&
columnHolder.getCapabilities().hasBitmapIndexes()) {
- BitmapIndex bitmap = columnHolder.getBitmapIndex();
- return bitmap.getCardinality() > 0 ? bitmap.getValue(0) : null;
+ ColumnIndexSupplier indexSupplier = columnHolder.getIndexSupplier();
Review Comment:
Total drive-by comment as it's not really related to this PR, but this
`getMinValue` seems like a weird method to have on this API. It looks like
it's used for some join logic? I wonder if that logic couldn't just specialize
to the concrete class it expects and get the method that way instead...
##########
processing/src/main/java/org/apache/druid/segment/column/DictionaryEncodedStringValueIndex.java:
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import org.apache.druid.collections.bitmap.ImmutableBitmap;
+
+import javax.annotation.Nullable;
+
+/**
+ * This exposes a 'raw' view into a bitmap value indexes of a string {@link
DictionaryEncodedColumn}, allowing
+ * operation via dictionaryIds, as well as access to lower level details of
such a column like value lookup and
+ * value cardinality.
+ *
+ * This interface should only be used when it is beneficial to operate in such
a manner, most filter implementations
+ * should likely be using {@link StringValueSetIndex} or {@link
LexicographicalRangeIndex} or some other higher level
+ * index instead.
+ */
+public interface DictionaryEncodedStringValueIndex
+{
+ boolean hasNulls();
+ /**
+ * Get the cardinality of the underlying value dictionary
+ */
+ int getCardinality();
+
+ /**
+ * Get the value in the underlying value dictionary of the specified
dictionary id
+ */
+ @Nullable
+ String getValue(int index);
+
+ /**
+ * Returns the index of "value" in this DictionaryEncodedStringValueIndex,
or a negative value, if not present in
+ * the underlying dictionary
+ */
+ int getIndex(@Nullable String value);
+
+ /**
+ * Get the {@link ImmutableBitmap} for dictionary id of the underlying
dictionary
+ */
+ ImmutableBitmap getBitmap(int idx);
Review Comment:
I wonder if this shouldn't return `BitmapColumnIndex` instead of
`ImmutableBitmap`?
##########
processing/src/main/java/org/apache/druid/segment/column/AllTrueBitmapColumnIndex.java:
##########
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import org.apache.druid.query.BitmapResultFactory;
+import org.apache.druid.query.filter.ColumnIndexSelector;
+
+public class AllTrueBitmapColumnIndex implements BitmapColumnIndex
+{
+ private final ColumnIndexSelector selector;
+
+ public AllTrueBitmapColumnIndex(ColumnIndexSelector indexSelector)
+ {
+ this.selector = indexSelector;
+ }
+
+ @Override
+ public ColumnIndexCapabilities getIndexCapabilities()
+ {
+ return SimpleColumnIndexCapabilities.getConstant();
+ }
+
+ @Override
+ public double estimateSelectivity(int totalRows)
+ {
+ return 1.;
Review Comment:
Personal style thing. I like `return 1D;` better, mostly mentioning just to
make sure that you are aware that `1D;` is also an option so that you can
evaluate according to your own aesthetics.
##########
processing/src/main/java/org/apache/druid/segment/QueryableIndexIndexableAdapter.java:
##########
@@ -383,6 +391,9 @@ public BitmapValues getBitmapValues(String dimension, int
dictId)
}
}
+ /**
+ * this method is only for testing, don't use it for stuff
Review Comment:
Does it really need to exist? Could it be a helper method in a class in the
test package?
##########
processing/src/main/java/org/apache/druid/segment/column/ColumnIndexSupplier.java:
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import javax.annotation.Nullable;
+
+/**
+ * Provides indexes and information about them ({@link
ColumnIndexCapabilities}) for a column. Indexes which satisfy
+ * the {@link org.apache.druid.query.filter.Filter} used in a {@link
org.apache.druid.query.Query}, allow the query
+ * engine to construct {@link org.apache.druid.segment.data.Offset} (and
+ * {@link org.apache.druid.segment.vector.VectorOffset}) to build {@link
org.apache.druid.segment.Cursor}
+ * (and {@link org.apache.druid.segment.vector.VectorCursor}). This allows the
engine to only scan the rows which match
+ * the filter values, instead of performing a full scan of the column and
using a
+ * {@link org.apache.druid.query.filter.ValueMatcher}
+ * (or {@link org.apache.druid.query.filter.vector.VectorValueMatcher}) to
filter the values (or in addition to, in the
+ * case of indexes which {@link ColumnIndexCapabilities#isExact()} is false,
which must still use a value matcher
+ * when using an index).
+ */
+public interface ColumnIndexSupplier
+{
+ /**
+ * Get the {@link ColumnIndexCapabilities} for the specified type of index.
If the index does not exist
+ * this method will return null. A null return value from this method
indicates that an index of the desired type
+ * in unavailable.
+ */
+ @Nullable
+ <T> ColumnIndexCapabilities getIndexCapabilities(Class<T> clazz);
+
+ /**
+ * Get a column 'index' of the specified type. If the index of the desired
type is not available, this method will
+ * return null. If the value is non-null, the index may be used for the
eventual construction of an
+ * {@link org.apache.druid.segment.data.Offset} to form the basis of a
{@link org.apache.druid.segment.Cursor}
+ * (or {@link org.apache.druid.segment.vector.VectorOffset} and {@link
org.apache.druid.segment.vector.VectorCursor})
+ * which can greatly reduce the total number of rows which need to be
scanned and processed.
+ */
+ @Nullable
+ <T> T getIndex(Class<T> clazz);
Review Comment:
TBH, the name "Index" was already ruined by the original author of a lot of
this code. It's so overloaded in what it can mean, that I find myself wanting
us to use a different noun. Upon reflection, my suggestion to add `As` was
actually coming from me not liking the word `Index` and wanting something with
extra words to make it more descriptive.
##########
processing/src/main/java/org/apache/druid/query/metadata/SegmentAnalyzer.java:
##########
@@ -193,30 +194,36 @@ private ColumnAnalysis analyzeNumericColumn(
private ColumnAnalysis analyzeStringColumn(
final ColumnCapabilities capabilities,
- final ColumnHolder columnHolder
+ final ColumnHolder columnHolder,
+ final String columnName
)
{
Comparable min = null;
Comparable max = null;
long size = 0;
final int cardinality;
if (capabilities.hasBitmapIndexes()) {
- final BitmapIndex bitmapIndex = columnHolder.getBitmapIndex();
- cardinality = bitmapIndex.getCardinality();
-
- if (analyzingSize()) {
- for (int i = 0; i < cardinality; ++i) {
- String value = bitmapIndex.getValue(i);
- if (value != null) {
- size += StringUtils.estimatedBinaryLengthAsUTF8(value) *
- ((long) bitmapIndex.getBitmapForValue(value).size());
+ ColumnIndexSupplier indexSupplier = columnHolder.getIndexSupplier();
+ if (indexSupplier != null) {
+ final DictionaryEncodedStringValueIndex valueIndex =
indexSupplier.getIndex(
+ DictionaryEncodedStringValueIndex.class
+ );
+ Preconditions.checkNotNull(valueIndex, "Column [%s] does not have a
value index", columnName);
+ cardinality = valueIndex.getCardinality();
+ if (analyzingSize()) {
+ for (int i = 0; i < cardinality; ++i) {
+ String value = valueIndex.getValue(i);
Review Comment:
While we are here, could we like... use the `getUTF8Bytes` and just take
`.remaining()` instead?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]