This is an automated email from the ASF dual-hosted git repository.
xiangfu0 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pinot.git
The following commit(s) were added to refs/heads/master by this push:
new 51bcec3b88f Generalize RAW + dictionary column fix to all
aggregation/distinct sites (follow-up to #18500) (#18504)
51bcec3b88f is described below
commit 51bcec3b88fe151362ef79bdea1b26515e6ff3ca
Author: Xiang Fu <[email protected]>
AuthorDate: Fri May 15 00:28:10 2026 -0700
Generalize RAW + dictionary column fix to all aggregation/distinct sites
(follow-up to #18500) (#18504)
* Fix UnsupportedOperationException for aggregations/distinct on RAW +
dictionary columns
A column declared with EncodingType.RAW + an explicit dictionaryIndex has a
Dictionary file on disk but a RAW forward index that throws on
ForwardIndexReader#readDictIds. Many aggregation, group-by, and distinct
executors gated their dict-id read path on
`blockValSet.getDictionary() != null` alone, so a single such column in a
query would crash with UnsupportedOperationException on the
AggregationOperator / group-by path. This was first fixed in PR #18500 for
one site (NoDictionaryMultiColumnGroupKeyGenerator).
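The failure mode described above can be sketched in isolation. The types below (`SketchForwardIndex`, `RawSketchForwardIndex`, `RawPlusDictionarySketch`) are hypothetical stand-ins invented for illustration, not Pinot's actual classes; the sketch only assumes that a RAW forward index rejects dict-id reads while a dictionary still exists next to it.

```java
/** Hypothetical stand-in for a forward index reader; NOT Pinot's ForwardIndexReader. */
interface SketchForwardIndex {
  boolean isDictionaryEncoded();

  int readDictId(int docId);

  String readValue(int docId);
}

/** RAW forward index: stores values directly, so dict-id reads are unsupported. */
final class RawSketchForwardIndex implements SketchForwardIndex {
  private final String[] _values;

  RawSketchForwardIndex(String[] values) {
    _values = values;
  }

  @Override
  public boolean isDictionaryEncoded() {
    return false;
  }

  @Override
  public int readDictId(int docId) {
    throw new UnsupportedOperationException("RAW forward index cannot read dict ids");
  }

  @Override
  public String readValue(int docId) {
    return _values[docId];
  }
}

final class RawPlusDictionarySketch {
  private RawPlusDictionarySketch() {
  }

  /** Pre-fix gate: a non-null dictionary alone selects the dict-id path. */
  static String readWithNullnessGate(String[] dictionary, SketchForwardIndex forwardIndex, int docId) {
    if (dictionary != null) {
      return dictionary[forwardIndex.readDictId(docId)];
    }
    return forwardIndex.readValue(docId);
  }

  /** Post-fix gate: the dict-id path also requires a dict-encoded forward index. */
  static String readWithEncodingGate(String[] dictionary, SketchForwardIndex forwardIndex, int docId) {
    if (dictionary != null && forwardIndex.isDictionaryEncoded()) {
      return dictionary[forwardIndex.readDictId(docId)];
    }
    return forwardIndex.readValue(docId);
  }
}
```

With a dictionary present and a RAW forward index, the nullness gate reaches `readDictId` and throws, while the encoding gate falls through to the value read.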
A codebase scan found ~30 more call sites with the same buggy pattern: the
DISTINCTCOUNT/HLL/Bitmap/ULL/CPCSketch family, SegmentPartitionedDistinctCount,
Mode, AnyValue, FUNNELCOUNT, DistinctExecutorFactory's single- and multi-column
paths, and DefaultGroupByExecutor (already partially guarded).
Fix: introduce an explicit `boolean isDictionaryEncoded()` on `BlockValSet` and
`ColumnContext` that returns true only when the forward index is dict-encoded.
The default `BlockValSet.isDictionaryEncoded()` falls back to
`getDictionary() != null` so non-projection value sets (transform, row,
data-block) keep working unchanged. `ProjectionBlockValSet` overrides it to
consult the forward index directly, so RAW + dictionaryIndex columns correctly
report false. `getDictionary()` keeps its straightforward "is there a
dictionary file?" meaning, so filter operators that hold dict IDs (via
DataSource#getDictionary, which is unaffected) continue to work.
Every aggregation/distinct/group-by call site now gates on the new flag rather
than dictionary nullness, so a single helper expresses the rule once and all
30+ sites are consistent.
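A minimal sketch of that shape, assuming simplified types: `SketchValSet`, `SketchTransformValSet`, `SketchProjectionValSet`, and `ReadPathSketch` are hypothetical names made up for this illustration and do not mirror the real `BlockValSet` signatures. It shows a default method that falls back to dictionary presence, one implementation that inherits it, and one that overrides it with the forward-index check.

```java
/** Hypothetical, simplified value-set interface; NOT Pinot's BlockValSet. */
interface SketchValSet {
  /** Returns the dictionary, or null if none exists. */
  String[] getDictionary();

  /** Default mirrors dictionary presence; implementations override when the two can diverge. */
  default boolean isDictionaryEncoded() {
    return getDictionary() != null;
  }
}

/** Transform-style value set: dictionary presence and dict-id readability are coupled. */
final class SketchTransformValSet implements SketchValSet {
  private final String[] _dictionary;

  SketchTransformValSet(String[] dictionary) {
    _dictionary = dictionary;
  }

  @Override
  public String[] getDictionary() {
    return _dictionary;
  }
}

/** Projection-style value set: a dictionary may exist next to a RAW forward index. */
final class SketchProjectionValSet implements SketchValSet {
  private final String[] _dictionary;
  private final boolean _forwardIndexDictEncoded;

  SketchProjectionValSet(String[] dictionary, boolean forwardIndexDictEncoded) {
    _dictionary = dictionary;
    _forwardIndexDictEncoded = forwardIndexDictEncoded;
  }

  @Override
  public String[] getDictionary() {
    return _dictionary;
  }

  @Override
  public boolean isDictionaryEncoded() {
    return _dictionary != null && _forwardIndexDictEncoded;
  }
}

final class ReadPathSketch {
  private ReadPathSketch() {
  }

  /** Call sites ask the flag, not the dictionary, which read path to take. */
  static String choosePath(SketchValSet valSet) {
    return valSet.isDictionaryEncoded() ? "dict-id" : "value";
  }
}
```

Keeping the default method means implementations outside the tree still compile, at the cost that they inherit the weaker presence-only check unless they override it.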
Regression tests in RawForwardIndexWithDictionaryTest reproduce the crash on
multi-column GROUP BY, multi-column DISTINCT, DISTINCT with filter,
DISTINCTCOUNT with filter, DISTINCTCOUNTHLL with filter, DISTINCTCOUNTBITMAP,
SEGMENTPARTITIONEDDISTINCTCOUNT with filter, and MODE. All 16 new test runs
fail on master and pass with this change.
Co-Authored-By: Claude Opus 4.7 <[email protected]>
* Address review comments on PR #18504
Themes from raghavyadav01 + Copilot:
1. forwardIndex == null was treated as dict-encoded (`forwardIndex == null
|| forwardIndex.isDictionaryEncoded()`). When the forward index is
disabled (dict + inverted/range only), this returned true and callers
would NPE on getDictionaryIdsSV(). Tightened to
`forwardIndex != null && forwardIndex.isDictionaryEncoded()` in both
ColumnContext.fromDataSource and
ProjectionBlockValSet.isDictionaryEncoded().
2. BlockValSet.getDictionary() Javadoc said "dictionary file", which is
inaccurate for TransformBlockValSet (in-memory). Reworded to "dictionary
(on disk or built on the fly)".
3. Default BlockValSet.isDictionaryEncoded() mirrored the buggy pattern.
Kept default returning getDictionary() != null for SPI compat but
strengthened the Javadoc to call out the trap, and added explicit
overrides on every in-tree impl (TransformBlockValSet, RowBasedBlockValSet,
FilteredRowBasedBlockValSet, DataBlockValSet, FilteredDataBlockValSet).
4. Added regression test
testDistinctOnTransformOfRawDictColumnReturnsSameResults covering
SELECT DISTINCT UPPER(rawDictDim), which exercises the TransformBlockValSet
path raghavyadav01 flagged.
5. Reworded stale test docstrings to describe pre-fix behavior as historical
("previously crashed inside ...") and reference isDictionaryEncoded() as
the correct gate.
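The difference between the two gate expressions in item 1 can be checked directly. The following is a sketch only: `SketchFwdIndex` and `ForwardIndexGateSketch` are hypothetical names, and a `null` reference stands in for a column whose forward index is disabled.

```java
/** Hypothetical stand-in exposing only the flag the gate needs; NOT Pinot's ForwardIndexReader. */
interface SketchFwdIndex {
  boolean isDictionaryEncoded();
}

final class ForwardIndexGateSketch {
  private ForwardIndexGateSketch() {
  }

  /** Earlier expression: a missing forward index is treated as dict-encoded. */
  static boolean looseGate(SketchFwdIndex forwardIndex) {
    return forwardIndex == null || forwardIndex.isDictionaryEncoded();
  }

  /** Tightened expression: a missing forward index can never take the dict-id path. */
  static boolean strictGate(SketchFwdIndex forwardIndex) {
    return forwardIndex != null && forwardIndex.isDictionaryEncoded();
  }
}
```

The two expressions agree for dict-encoded and RAW forward indexes and differ only when the forward index is absent, which is the case the review comment targets.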
Co-Authored-By: Claude Opus 4.7 <[email protected]>
---------
Co-authored-by: Claude Opus 4.7 <[email protected]>
---
.../org/apache/pinot/core/common/BlockValSet.java | 22 ++-
.../apache/pinot/core/operator/ColumnContext.java | 35 +++-
.../core/operator/docvalsets/DataBlockValSet.java | 6 +
.../docvalsets/FilteredDataBlockValSet.java | 6 +
.../docvalsets/FilteredRowBasedBlockValSet.java | 6 +
.../operator/docvalsets/ProjectionBlockValSet.java | 19 +++
.../operator/docvalsets/RowBasedBlockValSet.java | 6 +
.../operator/docvalsets/TransformBlockValSet.java | 9 ++
.../function/AnyValueAggregationFunction.java | 2 +-
.../BaseDistinctAggregateAggregationFunction.java | 12 +-
...istinctCountSmartSketchAggregationFunction.java | 4 +-
.../DistinctCountBitmapAggregationFunction.java | 12 +-
.../DistinctCountCPCSketchAggregationFunction.java | 6 +-
.../DistinctCountHLLAggregationFunction.java | 12 +-
.../DistinctCountHLLPlusAggregationFunction.java | 12 +-
.../DistinctCountOffHeapAggregationFunction.java | 2 +-
.../DistinctCountSmartHLLAggregationFunction.java | 2 +-
...stinctCountSmartHLLPlusAggregationFunction.java | 2 +-
.../DistinctCountSmartULLAggregationFunction.java | 2 +-
.../DistinctCountULLAggregationFunction.java | 6 +-
.../function/ModeAggregationFunction.java | 6 +-
...artitionedDistinctCountAggregationFunction.java | 6 +-
.../function/funnel/AggregationStrategy.java | 9 +-
.../groupby/DefaultGroupByExecutor.java | 14 +-
.../NoDictionaryMultiColumnGroupKeyGenerator.java | 29 +---
.../query/distinct/DistinctExecutorFactory.java | 11 +-
.../custom/RawForwardIndexWithDictionaryTest.java | 177 ++++++++++++++++++++-
27 files changed, 338 insertions(+), 97 deletions(-)
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/common/BlockValSet.java
b/pinot-core/src/main/java/org/apache/pinot/core/common/BlockValSet.java
index dad7da5c3bf..728ac06cc71 100644
--- a/pinot-core/src/main/java/org/apache/pinot/core/common/BlockValSet.java
+++ b/pinot-core/src/main/java/org/apache/pinot/core/common/BlockValSet.java
@@ -48,11 +48,31 @@ public interface BlockValSet {
boolean isSingleValue();
/**
- * Returns the dictionary for the column, or {@code null} if the column is not dictionary-encoded.
+ * Returns the dictionary for the column if one exists, or {@code null} otherwise. The dictionary may live on disk
+ * (segment-backed columns) or be built on the fly (transform functions). It may be present even when
+ * {@link #isDictionaryEncoded()} returns {@code false} — a column declared as {@code EncodingType.RAW} with an
+ * explicit {@code dictionaryIndex} carries a dictionary on disk but a RAW forward index, and a column with a
+ * disabled forward index has no way to read dict IDs at all. Callers that select between a dictionary-id read
+ * path ({@link #getDictionaryIdsSV()} / {@link #getDictionaryIdsMV()}) and a value read path MUST gate on
+ * {@link #isDictionaryEncoded()}, not {@code getDictionary() != null}.
*/
@Nullable
Dictionary getDictionary();
+ /**
+ * Returns {@code true} if the dict-id read path ({@link #getDictionaryIdsSV()} / {@link #getDictionaryIdsMV()})
+ * is callable on this value set.
+ *
+ * <p>The default implementation falls back to {@code getDictionary() != null}, which is correct for value sets
+ * where dictionary presence and dict-id readability are coupled. Implementers MUST override this whenever the
+ * two can diverge — most notably the segment projection layer, where a column can declare
+ * {@code EncodingType.RAW} alongside an explicit {@code dictionaryIndex} (dictionary present, but
+ * {@code readDictIds} throws), or where the forward index is disabled outright (no forward index to read).
+ */
+ default boolean isDictionaryEncoded() {
+ return getDictionary() != null;
+ }
+
/**
* SINGLE-VALUED COLUMN APIs
*/
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/operator/ColumnContext.java
b/pinot-core/src/main/java/org/apache/pinot/core/operator/ColumnContext.java
index 878ef0b3f95..a94e3a79f3d 100644
--- a/pinot-core/src/main/java/org/apache/pinot/core/operator/ColumnContext.java
+++ b/pinot-core/src/main/java/org/apache/pinot/core/operator/ColumnContext.java
@@ -24,6 +24,7 @@ import org.apache.pinot.core.operator.transform.function.TransformFunction;
import org.apache.pinot.segment.spi.datasource.DataSource;
import org.apache.pinot.segment.spi.datasource.DataSourceMetadata;
import org.apache.pinot.segment.spi.index.reader.Dictionary;
+import org.apache.pinot.segment.spi.index.reader.ForwardIndexReader;
import org.apache.pinot.spi.data.FieldSpec.DataType;
@@ -31,13 +32,15 @@ public class ColumnContext {
private final DataType _dataType;
private final boolean _isSingleValue;
private final Dictionary _dictionary;
+ private final boolean _dictionaryEncoded;
private final DataSource _dataSource;
private ColumnContext(DataType dataType, boolean isSingleValue, @Nullable Dictionary dictionary,
- @Nullable DataSource dataSource) {
+ boolean dictionaryEncoded, @Nullable DataSource dataSource) {
_dataType = dataType;
_isSingleValue = isSingleValue;
_dictionary = dictionary;
+ _dictionaryEncoded = dictionaryEncoded;
_dataSource = dataSource;
}
@@ -49,11 +52,24 @@ public class ColumnContext {
return _isSingleValue;
}
+ /// Returns the column's dictionary file if one exists, regardless of whether the forward index can answer
+ /// dictionary-id reads. Callers that need to select between a dict-id read path and a value read path MUST gate
+ /// on {@link #isDictionaryEncoded()} rather than {@code getDictionary() != null} — a column declared as
+ /// {@code EncodingType.RAW} with an explicit {@code dictionaryIndex} returns a non-null dictionary here but its
+ /// forward index throws on {@link ForwardIndexReader#readDictIds}.
@Nullable
public Dictionary getDictionary() {
return _dictionary;
}
+ /// Returns {@code true} if the column's forward index is dictionary-encoded and the dict-id read path
+ /// ({@link org.apache.pinot.core.common.BlockValSet#getDictionaryIdsSV()}) is callable. A column with
+ /// {@code EncodingType.RAW} + an explicit {@code dictionaryIndex} returns {@code false} here even though
+ /// {@link #getDictionary()} is non-null.
+ public boolean isDictionaryEncoded() {
+ return _dictionaryEncoded;
+ }
+
@Nullable
public DataSource getDataSource() {
return _dataSource;
@@ -61,13 +77,22 @@ public class ColumnContext {
public static ColumnContext fromDataSource(DataSource dataSource) {
DataSourceMetadata dataSourceMetadata = dataSource.getDataSourceMetadata();
- return new ColumnContext(dataSourceMetadata.getDataType(), dataSourceMetadata.isSingleValue(),
- dataSource.getDictionary(), dataSource);
+ Dictionary dictionary = dataSource.getDictionary();
+ ForwardIndexReader<?> forwardIndex = dataSource.getForwardIndex();
+ // Dict-id reads require both a dictionary AND a dict-encoded forward index. A column with EncodingType.RAW +
+ // dictionaryIndex has the dictionary but a RAW forward index; a column with a disabled forward index (dict +
+ // inverted/range only) has no forward index at all. Both must take the value/index-based path.
+ boolean dictEncoded = dictionary != null && forwardIndex != null && forwardIndex.isDictionaryEncoded();
+ return new ColumnContext(dataSourceMetadata.getDataType(), dataSourceMetadata.isSingleValue(), dictionary,
+ dictEncoded, dataSource);
}
public static ColumnContext fromTransformFunction(TransformFunction transformFunction) {
TransformResultMetadata resultMetadata = transformFunction.getResultMetadata();
- return new ColumnContext(resultMetadata.getDataType(), resultMetadata.isSingleValue(),
- transformFunction.getDictionary(), null);
+ Dictionary dictionary = transformFunction.getDictionary();
+ // Transform functions that expose a dictionary always build it themselves, so the dict-id read path is callable
+ // whenever the dictionary is present.
+ return new ColumnContext(resultMetadata.getDataType(), resultMetadata.isSingleValue(), dictionary,
+ dictionary != null, null);
}
}
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/DataBlockValSet.java
b/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/DataBlockValSet.java
index a8bc631b674..37c928c3ac0 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/DataBlockValSet.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/DataBlockValSet.java
@@ -74,6 +74,12 @@ public class DataBlockValSet implements BlockValSet {
return null;
}
+ /// Data-block value sets never carry a dictionary; the dict-id read methods below always throw.
+ @Override
+ public boolean isDictionaryEncoded() {
+ return false;
+ }
+
@Override
public int[] getDictionaryIdsSV() {
throw new UnsupportedOperationException();
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/FilteredDataBlockValSet.java
b/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/FilteredDataBlockValSet.java
index 2114bb6f1e6..959248f9081 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/FilteredDataBlockValSet.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/FilteredDataBlockValSet.java
@@ -97,6 +97,12 @@ public class FilteredDataBlockValSet implements BlockValSet {
return null;
}
+ /// Data-block value sets never carry a dictionary; the dict-id read methods below always throw.
+ @Override
+ public boolean isDictionaryEncoded() {
+ return false;
+ }
+
@Override
public int[] getDictionaryIdsSV() {
throw new UnsupportedOperationException();
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/FilteredRowBasedBlockValSet.java
b/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/FilteredRowBasedBlockValSet.java
index ba107a8b9fe..e889ae509ef 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/FilteredRowBasedBlockValSet.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/FilteredRowBasedBlockValSet.java
@@ -94,6 +94,12 @@ public class FilteredRowBasedBlockValSet implements
BlockValSet {
return null;
}
+ /// Row-based value sets never carry a dictionary; the dict-id read methods below always throw.
+ @Override
+ public boolean isDictionaryEncoded() {
+ return false;
+ }
+
@Override
public int[] getDictionaryIdsSV() {
throw new UnsupportedOperationException();
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java
b/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java
index e0613a4b84f..5bd14e4565d 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java
@@ -25,6 +25,7 @@ import org.apache.pinot.core.common.DataBlockCache;
import org.apache.pinot.core.operator.ProjectionOperator;
import org.apache.pinot.segment.spi.datasource.DataSource;
import org.apache.pinot.segment.spi.index.reader.Dictionary;
+import org.apache.pinot.segment.spi.index.reader.ForwardIndexReader;
import org.apache.pinot.segment.spi.index.reader.NullValueVectorReader;
import org.apache.pinot.spi.data.FieldSpec.DataType;
import org.apache.pinot.spi.trace.InvocationRecording;
@@ -98,6 +99,24 @@ public class ProjectionBlockValSet implements BlockValSet {
return _dataSource.getDictionary();
}
+ /// Returns {@code true} only when there is both a dictionary AND a dict-encoded forward index. Two cases return
+ /// {@code false} even though {@link #getDictionary()} is non-null:
+ /// <ul>
+ /// <li>{@code EncodingType.RAW} + an explicit {@code dictionaryIndex}: the forward index throws on
+ /// {@link ForwardIndexReader#readDictIds}.</li>
+ /// <li>Disabled forward index (dict + inverted/range only): there is no forward index to read dict IDs from.</li>
+ /// </ul>
+ /// Callers selecting between dict-id and value paths must gate on this method, not {@code getDictionary() != null}.
+ @Override
+ public boolean isDictionaryEncoded() {
+ Dictionary dictionary = _dataSource.getDictionary();
+ if (dictionary == null) {
+ return false;
+ }
+ ForwardIndexReader<?> forwardIndex = _dataSource.getForwardIndex();
+ return forwardIndex != null && forwardIndex.isDictionaryEncoded();
+ }
+
@Override
public int[] getDictionaryIdsSV() {
try (InvocationScope scope =
Tracing.getTracer().createScope(ProjectionBlockValSet.class)) {
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/RowBasedBlockValSet.java
b/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/RowBasedBlockValSet.java
index 9d204d70062..20394b3d7c5 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/RowBasedBlockValSet.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/RowBasedBlockValSet.java
@@ -96,6 +96,12 @@ public class RowBasedBlockValSet implements BlockValSet {
return null;
}
+ /// Row-based value sets never carry a dictionary; the dict-id read methods below always throw.
+ @Override
+ public boolean isDictionaryEncoded() {
+ return false;
+ }
+
@Override
public int[] getDictionaryIdsSV() {
throw new UnsupportedOperationException();
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/TransformBlockValSet.java
b/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/TransformBlockValSet.java
index b8d77415725..360b35ab197 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/TransformBlockValSet.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/TransformBlockValSet.java
@@ -79,6 +79,15 @@ public class TransformBlockValSet implements BlockValSet {
return _transformFunction.getDictionary();
}
+ /// A transform function that exposes a dictionary always builds it itself (e.g.,
+ /// {@link org.apache.pinot.core.operator.transform.function.IdentifierTransformFunction} only exposes the
+ /// underlying column's dictionary when its forward index is dict-encoded), so the dict-id read path is callable
+ /// whenever the dictionary is present.
+ @Override
+ public boolean isDictionaryEncoded() {
+ return _transformFunction.getDictionary() != null;
+ }
+
@Override
public int[] getDictionaryIdsSV() {
try (InvocationScope scope =
Tracing.getTracer().createScope(TransformBlockValSet.class)) {
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/AnyValueAggregationFunction.java
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/AnyValueAggregationFunction.java
index dac8774ee46..26723347874 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/AnyValueAggregationFunction.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/AnyValueAggregationFunction.java
@@ -183,7 +183,7 @@ public class AnyValueAggregationFunction extends
NullableSingleInputAggregationF
*/
private void aggregateHelper(int length, BlockValSet bvs,
ValueProcessor<Object> processor) {
// Use dictionary-based access for efficiency when available
- if (bvs.getDictionary() != null) {
+ if (bvs.isDictionaryEncoded()) {
final int[] dictIds = bvs.getDictionaryIdsSV();
final Dictionary dict = bvs.getDictionary();
forEachNotNull(length, bvs, (from, to) -> {
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/BaseDistinctAggregateAggregationFunction.java
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/BaseDistinctAggregateAggregationFunction.java
index 25a189c77d7..4becf978571 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/BaseDistinctAggregateAggregationFunction.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/BaseDistinctAggregateAggregationFunction.java
@@ -199,7 +199,7 @@ public abstract class
BaseDistinctAggregateAggregationFunction<T extends Compara
*/
protected void svAggregate(BlockValSet blockValSet, int length,
AggregationResultHolder aggregationResultHolder) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
RoaringBitmap dictIdBitmap = getDictIdBitmap(aggregationResultHolder,
dictionary);
@@ -293,7 +293,7 @@ public abstract class
BaseDistinctAggregateAggregationFunction<T extends Compara
*/
protected void mvAggregate(BlockValSet blockValSet, int length,
AggregationResultHolder aggregationResultHolder) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
RoaringBitmap dictIdBitmap = getDictIdBitmap(aggregationResultHolder,
dictionary);
int[][] dictIds = blockValSet.getDictionaryIdsMV();
@@ -407,7 +407,7 @@ public abstract class
BaseDistinctAggregateAggregationFunction<T extends Compara
protected void svAggregateGroupBySV(BlockValSet blockValSet, int length,
int[] groupKeyArray,
GroupByResultHolder groupByResultHolder) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
@@ -501,7 +501,7 @@ public abstract class
BaseDistinctAggregateAggregationFunction<T extends Compara
protected void mvAggregateGroupBySV(BlockValSet blockValSet, int length,
int[] groupKeyArray,
GroupByResultHolder groupByResultHolder) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[][] dictIds = blockValSet.getDictionaryIdsMV();
forEachNotNull(length, blockValSet, (from, to) -> {
@@ -619,7 +619,7 @@ public abstract class
BaseDistinctAggregateAggregationFunction<T extends Compara
protected void svAggregateGroupByMV(BlockValSet blockValSet, int length,
int[][] groupKeysArray,
GroupByResultHolder groupByResultHolder) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
@@ -709,7 +709,7 @@ public abstract class
BaseDistinctAggregateAggregationFunction<T extends Compara
protected void mvAggregateGroupByMV(BlockValSet blockValSet, int length,
int[][] groupKeysArray,
GroupByResultHolder groupByResultHolder) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[][] dictIds = blockValSet.getDictionaryIdsMV();
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/BaseDistinctCountSmartSketchAggregationFunction.java
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/BaseDistinctCountSmartSketchAggregationFunction.java
index c07fbe61283..52a7351fd35 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/BaseDistinctCountSmartSketchAggregationFunction.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/BaseDistinctCountSmartSketchAggregationFunction.java
@@ -210,7 +210,7 @@ abstract class
BaseDistinctCountSmartSketchAggregationFunction
Map<ExpressionContext, BlockValSet> blockValSetMap) {
BlockValSet blockValSet = blockValSetMap.get(_expression);
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
// Track which groups were modified to check cardinality only once per
group per batch
IntSet modifiedGroups = new IntOpenHashSet();
@@ -347,7 +347,7 @@ abstract class
BaseDistinctCountSmartSketchAggregationFunction
Map<ExpressionContext, BlockValSet> blockValSetMap) {
BlockValSet blockValSet = blockValSetMap.get(_expression);
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
// Track which groups were modified to check cardinality only once per
group per batch
IntSet modifiedGroups = new IntOpenHashSet();
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountBitmapAggregationFunction.java
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountBitmapAggregationFunction.java
index 8a6801454b6..74a7cdd6e2c 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountBitmapAggregationFunction.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountBitmapAggregationFunction.java
@@ -102,7 +102,7 @@ public class DistinctCountBitmapAggregationFunction extends
BaseSingleInputAggre
protected void aggregateSV(int length, AggregationResultHolder
aggregationResultHolder, BlockValSet blockValSet,
DataType storedType) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
getDictIdBitmap(aggregationResultHolder, dictionary).addN(dictIds, 0,
length);
@@ -149,7 +149,7 @@ public class DistinctCountBitmapAggregationFunction extends
BaseSingleInputAggre
protected void aggregateMV(int length, AggregationResultHolder
aggregationResultHolder, BlockValSet blockValSet,
DataType storedType) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
RoaringBitmap dictIdBitmap = getDictIdBitmap(aggregationResultHolder,
dictionary);
int[][] dictIds = blockValSet.getDictionaryIdsMV();
@@ -238,7 +238,7 @@ public class DistinctCountBitmapAggregationFunction extends
BaseSingleInputAggre
protected void aggregateSVGroupBySV(int length, int[] groupKeyArray,
GroupByResultHolder groupByResultHolder,
BlockValSet blockValSet, DataType storedType) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
for (int i = 0; i < length; i++) {
@@ -288,7 +288,7 @@ public class DistinctCountBitmapAggregationFunction extends
BaseSingleInputAggre
protected void aggregateMVGroupBySV(int length, int[] groupKeyArray,
GroupByResultHolder groupByResultHolder,
BlockValSet blockValSet, DataType storedType) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[][] dictIds = blockValSet.getDictionaryIdsMV();
for (int i = 0; i < length; i++) {
@@ -381,7 +381,7 @@ public class DistinctCountBitmapAggregationFunction extends
BaseSingleInputAggre
protected void aggregateSVGroupByMV(int length, int[][] groupKeysArray,
GroupByResultHolder groupByResultHolder,
BlockValSet blockValSet, DataType storedType) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
for (int i = 0; i < length; i++) {
@@ -431,7 +431,7 @@ public class DistinctCountBitmapAggregationFunction extends
BaseSingleInputAggre
protected void aggregateMVGroupByMV(int length, int[][] groupKeysArray,
GroupByResultHolder groupByResultHolder,
BlockValSet blockValSet, DataType storedType) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[][] dictIds = blockValSet.getDictionaryIdsMV();
for (int i = 0; i < length; i++) {
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountCPCSketchAggregationFunction.java
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountCPCSketchAggregationFunction.java
index 49862ffd371..fd2b40395f8 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountCPCSketchAggregationFunction.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountCPCSketchAggregationFunction.java
@@ -157,7 +157,7 @@ public class DistinctCountCPCSketchAggregationFunction
}
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
getDictIdBitmap(aggregationResultHolder, dictionary).addN(dictIds, 0,
length);
@@ -229,7 +229,7 @@ public class DistinctCountCPCSketchAggregationFunction
}
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
for (int i = 0; i < length; i++) {
@@ -302,7 +302,7 @@ public class DistinctCountCPCSketchAggregationFunction
}
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
for (int i = 0; i < length; i++) {
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountHLLAggregationFunction.java
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountHLLAggregationFunction.java
index 393ce439cbd..2464a48379a 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountHLLAggregationFunction.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountHLLAggregationFunction.java
@@ -114,7 +114,7 @@ public class DistinctCountHLLAggregationFunction extends
BaseSingleInputAggregat
protected void aggregateSV(int length, AggregationResultHolder
aggregationResultHolder, BlockValSet blockValSet,
DataType storedType) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
getDictIdBitmap(aggregationResultHolder, dictionary).addN(dictIds, 0,
length);
@@ -162,7 +162,7 @@ public class DistinctCountHLLAggregationFunction extends
BaseSingleInputAggregat
protected void aggregateMV(int length, AggregationResultHolder
aggregationResultHolder, BlockValSet blockValSet,
DataType storedType) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
RoaringBitmap dictIdBitmap = getDictIdBitmap(aggregationResultHolder,
dictionary);
int[][] dictIds = blockValSet.getDictionaryIdsMV();
@@ -256,7 +256,7 @@ public class DistinctCountHLLAggregationFunction extends
BaseSingleInputAggregat
protected void aggregateSVGroupBySV(int length, int[] groupKeyArray,
GroupByResultHolder groupByResultHolder,
BlockValSet blockValSet, DataType storedType) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
for (int i = 0; i < length; i++) {
@@ -305,7 +305,7 @@ public class DistinctCountHLLAggregationFunction extends
BaseSingleInputAggregat
protected void aggregateMVGroupBySV(int length, int[] groupKeyArray,
GroupByResultHolder groupByResultHolder,
BlockValSet blockValSet, DataType storedType) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[][] dictIds = blockValSet.getDictionaryIdsMV();
for (int i = 0; i < length; i++) {
@@ -405,7 +405,7 @@ public class DistinctCountHLLAggregationFunction extends
BaseSingleInputAggregat
protected void aggregateSVGroupByMV(int length, int[][] groupKeysArray,
GroupByResultHolder groupByResultHolder,
BlockValSet blockValSet, DataType storedType) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
for (int i = 0; i < length; i++) {
@@ -454,7 +454,7 @@ public class DistinctCountHLLAggregationFunction extends
BaseSingleInputAggregat
protected void aggregateMVGroupByMV(int length, int[][] groupKeysArray,
GroupByResultHolder groupByResultHolder,
BlockValSet blockValSet, DataType storedType) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[][] dictIds = blockValSet.getDictionaryIdsMV();
for (int i = 0; i < length; i++) {
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountHLLPlusAggregationFunction.java
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountHLLPlusAggregationFunction.java
index 67cf83a579f..d839d4b1dec 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountHLLPlusAggregationFunction.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountHLLPlusAggregationFunction.java
@@ -124,7 +124,7 @@ public class DistinctCountHLLPlusAggregationFunction
extends BaseSingleInputAggr
protected void aggregateSV(int length, AggregationResultHolder
aggregationResultHolder, BlockValSet blockValSet,
DataType storedType) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
getDictIdBitmap(aggregationResultHolder, dictionary).addN(dictIds, 0,
length);
@@ -173,7 +173,7 @@ public class DistinctCountHLLPlusAggregationFunction
extends BaseSingleInputAggr
protected void aggregateMV(int length, AggregationResultHolder
aggregationResultHolder, BlockValSet blockValSet,
DataType storedType) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
RoaringBitmap dictIdBitmap = getDictIdBitmap(aggregationResultHolder,
dictionary);
int[][] dictIds = blockValSet.getDictionaryIdsMV();
@@ -268,7 +268,7 @@ public class DistinctCountHLLPlusAggregationFunction
extends BaseSingleInputAggr
protected void aggregateSVGroupBySV(int length, int[] groupKeyArray,
GroupByResultHolder groupByResultHolder,
BlockValSet blockValSet, DataType storedType) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
for (int i = 0; i < length; i++) {
@@ -318,7 +318,7 @@ public class DistinctCountHLLPlusAggregationFunction
extends BaseSingleInputAggr
protected void aggregateMVGroupBySV(int length, int[] groupKeyArray,
GroupByResultHolder groupByResultHolder,
BlockValSet blockValSet, DataType storedType) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[][] dictIds = blockValSet.getDictionaryIdsMV();
for (int i = 0; i < length; i++) {
@@ -419,7 +419,7 @@ public class DistinctCountHLLPlusAggregationFunction
extends BaseSingleInputAggr
protected void aggregateSVGroupByMV(int length, int[][] groupKeysArray,
GroupByResultHolder groupByResultHolder,
BlockValSet blockValSet, DataType storedType) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
for (int i = 0; i < length; i++) {
@@ -469,7 +469,7 @@ public class DistinctCountHLLPlusAggregationFunction
extends BaseSingleInputAggr
protected void aggregateMVGroupByMV(int length, int[][] groupKeysArray,
GroupByResultHolder groupByResultHolder,
BlockValSet blockValSet, DataType storedType) {
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[][] dictIds = blockValSet.getDictionaryIdsMV();
for (int i = 0; i < length; i++) {
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountOffHeapAggregationFunction.java
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountOffHeapAggregationFunction.java
index 19d208a4342..c8c5fe88cca 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountOffHeapAggregationFunction.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountOffHeapAggregationFunction.java
@@ -82,7 +82,7 @@ public class DistinctCountOffHeapAggregationFunction
public void aggregate(int length, AggregationResultHolder
aggregationResultHolder,
Map<ExpressionContext, BlockValSet> blockValSetMap) {
BlockValSet blockValSet = blockValSetMap.get(_expression);
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
// For dictionary-encoded expression, store dictionary ids into the
bitmap
if (blockValSet.isSingleValue()) {
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountSmartHLLAggregationFunction.java
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountSmartHLLAggregationFunction.java
index 8b18801f485..49b1df721b9 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountSmartHLLAggregationFunction.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountSmartHLLAggregationFunction.java
@@ -100,7 +100,7 @@ public class DistinctCountSmartHLLAggregationFunction
extends BaseDistinctCountS
BlockValSet blockValSet = blockValSetMap.get(_expression);
// For dictionary-encoded expression, use adaptive conversion strategy
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
Object result = aggregationResultHolder.getResult();
// If already converted to HLL, aggregate directly
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountSmartHLLPlusAggregationFunction.java
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountSmartHLLPlusAggregationFunction.java
index cca0d5b143b..52001ae4e5f 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountSmartHLLPlusAggregationFunction.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountSmartHLLPlusAggregationFunction.java
@@ -100,7 +100,7 @@ public class DistinctCountSmartHLLPlusAggregationFunction
extends BaseDistinctCo
BlockValSet blockValSet = blockValSetMap.get(_expression);
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
RoaringBitmap dictIdBitmap = getDictIdBitmap(aggregationResultHolder,
dictionary);
if (blockValSet.isSingleValue()) {
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountSmartULLAggregationFunction.java
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountSmartULLAggregationFunction.java
index b87c299dc6e..3ad214b948e 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountSmartULLAggregationFunction.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountSmartULLAggregationFunction.java
@@ -97,7 +97,7 @@ public class DistinctCountSmartULLAggregationFunction extends
BaseDistinctCountS
BlockValSet blockValSet = blockValSetMap.get(_expression);
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
RoaringBitmap dictIdBitmap = getDictIdBitmap(aggregationResultHolder,
dictionary);
if (blockValSet.isSingleValue()) {
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountULLAggregationFunction.java
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountULLAggregationFunction.java
index 753d3ef69a7..ba2d4aa1704 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountULLAggregationFunction.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountULLAggregationFunction.java
@@ -104,7 +104,7 @@ public class DistinctCountULLAggregationFunction extends
BaseSingleInputAggregat
}
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
getDictIdBitmap(aggregationResultHolder, dictionary).addN(dictIds, 0,
length);
@@ -177,7 +177,7 @@ public class DistinctCountULLAggregationFunction extends
BaseSingleInputAggregat
}
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
for (int i = 0; i < length; i++) {
@@ -259,7 +259,7 @@ public class DistinctCountULLAggregationFunction extends
BaseSingleInputAggregat
}
// For dictionary-encoded expression, store dictionary ids into the bitmap
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
for (int i = 0; i < length; i++) {
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/ModeAggregationFunction.java
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/ModeAggregationFunction.java
index 0ddb46bef5c..3b369507e51 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/ModeAggregationFunction.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/ModeAggregationFunction.java
@@ -264,7 +264,7 @@ public class ModeAggregationFunction
BlockValSet blockValSet = blockValSetMap.get(_expression);
// For dictionary-encoded expression, store dictionary ids into the dictId
map
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
Int2IntOpenHashMap dictIdValueMap =
getDictIdCountMap(aggregationResultHolder, dictionary);
@@ -328,7 +328,7 @@ public class ModeAggregationFunction
BlockValSet blockValSet = blockValSetMap.get(_expression);
// For dictionary-encoded expression, store dictionary ids into the dictId
map
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
forEachNotNull(length, blockValSet, (from, to) -> {
@@ -386,7 +386,7 @@ public class ModeAggregationFunction
BlockValSet blockValSet = blockValSetMap.get(_expression);
// For dictionary-encoded expression, store dictionary ids into the dictId
map
- Dictionary dictionary = blockValSet.getDictionary();
+ Dictionary dictionary = blockValSet.isDictionaryEncoded() ?
blockValSet.getDictionary() : null;
if (dictionary != null) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
forEachNotNull(length, blockValSet, (from, to) -> {
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/SegmentPartitionedDistinctCountAggregationFunction.java
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/SegmentPartitionedDistinctCountAggregationFunction.java
index 06ca46c7be5..82c8bf8319e 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/SegmentPartitionedDistinctCountAggregationFunction.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/SegmentPartitionedDistinctCountAggregationFunction.java
@@ -74,7 +74,7 @@ public class
SegmentPartitionedDistinctCountAggregationFunction extends BaseSing
BlockValSet blockValSet = blockValSetMap.get(_expression);
// For dictionary-encoded expression, store dictionary ids into a
RoaringBitmap
- if (blockValSet.getDictionary() != null) {
+ if (blockValSet.isDictionaryEncoded()) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
RoaringBitmap bitmap = aggregationResultHolder.getResult();
if (bitmap == null) {
@@ -165,7 +165,7 @@ public class
SegmentPartitionedDistinctCountAggregationFunction extends BaseSing
BlockValSet blockValSet = blockValSetMap.get(_expression);
// For dictionary-encoded expression, store dictionary ids into a
RoaringBitmap
- if (blockValSet.getDictionary() != null) {
+ if (blockValSet.isDictionaryEncoded()) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
for (int i = 0; i < length; i++) {
setIntValueForGroup(groupByResultHolder, groupKeyArray[i], dictIds[i]);
@@ -224,7 +224,7 @@ public class
SegmentPartitionedDistinctCountAggregationFunction extends BaseSing
BlockValSet blockValSet = blockValSetMap.get(_expression);
// For dictionary-encoded expression, store dictionary ids into a
RoaringBitmap
- if (blockValSet.getDictionary() != null) {
+ if (blockValSet.isDictionaryEncoded()) {
int[] dictIds = blockValSet.getDictionaryIdsSV();
for (int i = 0; i < length; i++) {
int dictId = dictIds[i];
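
Editor's note: the aggregation-function hunks above all apply the same one-line gate. The following is a minimal, self-contained sketch of why that gate matters. It does not use Pinot's real classes: `ValSet`, `RawWithDictionary`, and both `distinctCount*` methods are hypothetical stand-ins, and the dictionary is modelled as a plain `int[]` purely for illustration.

```java
import java.util.HashSet;
import java.util.Set;

final class DictEncodingGateSketch {
  /** Hypothetical stand-in for a block value set; NOT Pinot's BlockValSet. */
  interface ValSet {
    int[] getDictionary();        // null when the column has no dictionary
    int[] getDictionaryIdsSV();   // only valid on a dict-encoded forward index
    int[] getIntValuesSV();       // raw values, always readable

    // Mirrors the default described in the commit message: fall back to dictionary nullness.
    default boolean isDictionaryEncoded() {
      return getDictionary() != null;
    }
  }

  /** Models a RAW forward index that also has a dictionary alongside it. */
  static final class RawWithDictionary implements ValSet {
    private final int[] _values;
    private final int[] _dictionary;

    RawWithDictionary(int[] values, int[] dictionary) {
      _values = values;
      _dictionary = dictionary;
    }

    @Override
    public int[] getDictionary() {
      return _dictionary;
    }

    @Override
    public int[] getDictionaryIdsSV() {
      throw new UnsupportedOperationException("RAW forward index cannot serve dict ids");
    }

    @Override
    public int[] getIntValuesSV() {
      return _values;
    }

    @Override
    public boolean isDictionaryEncoded() {
      return false; // the forward index is RAW even though a dictionary exists
    }
  }

  /** Old gate: dictionary nullness alone decides the path. */
  static int distinctCountOldGate(ValSet valSet) {
    return count(valSet, valSet.getDictionary());
  }

  /** New gate: only treat the dictionary as usable when the forward index is dict-encoded. */
  static int distinctCountNewGate(ValSet valSet) {
    int[] dictionary = valSet.isDictionaryEncoded() ? valSet.getDictionary() : null;
    return count(valSet, dictionary);
  }

  private static int count(ValSet valSet, int[] dictionary) {
    // Dict ids and raw values have the same number of distinct entries, so either input works.
    int[] input = dictionary != null ? valSet.getDictionaryIdsSV() : valSet.getIntValuesSV();
    Set<Integer> seen = new HashSet<>();
    for (int v : input) {
      seen.add(v);
    }
    return seen.size();
  }
}
```

In this sketch the old gate throws on a `RawWithDictionary` input, while the new gate falls through to the raw-value path.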
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/funnel/AggregationStrategy.java
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/funnel/AggregationStrategy.java
index 298fd4a8052..99006c102ab 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/funnel/AggregationStrategy.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/funnel/AggregationStrategy.java
@@ -145,11 +145,14 @@ public abstract class AggregationStrategy<A> {
abstract void add(Dictionary dictionary, A aggResult, int step, int
correlationId);
private Dictionary getDictionary(Map<ExpressionContext, BlockValSet>
blockValSetMap) {
- final Dictionary primaryCorrelationDictionary =
blockValSetMap.get(_primaryCorrelationCol).getDictionary();
- Preconditions.checkArgument(primaryCorrelationDictionary != null,
+ final BlockValSet primaryCorrelationValSet =
blockValSetMap.get(_primaryCorrelationCol);
+ // FUNNELCOUNT requires dict-id reads from the forward index; a column
with EncodingType.RAW + dictionaryIndex
+ // exposes a Dictionary but BlockValSet#getDictionaryIdsSV throws on the
RAW forward index. Gate on the
+ // explicit forward-index encoding flag rather than dictionary nullness
alone.
+ Preconditions.checkArgument(primaryCorrelationValSet.isDictionaryEncoded(),
"CORRELATE_BY column in FUNNELCOUNT aggregation function not
supported, please use a dictionary encoded "
+ "column.");
- return primaryCorrelationDictionary;
+ return primaryCorrelationValSet.getDictionary();
}
private int[] getCorrelationIds(Map<ExpressionContext, BlockValSet>
blockValSetMap) {
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DefaultGroupByExecutor.java
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DefaultGroupByExecutor.java
index 15cecadd3f7..d5af6d9eae7 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DefaultGroupByExecutor.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DefaultGroupByExecutor.java
@@ -87,16 +87,10 @@ public class DefaultGroupByExecutor implements
GroupByExecutor {
for (ExpressionContext groupByExpression : groupByExpressions) {
ColumnContext columnContext =
projectOperator.getResultColumnContext(groupByExpression);
hasMVGroupByExpression |= !columnContext.isSingleValue();
- // DictionaryBasedGroupKeyGenerator does dict-id reads from the forward
index — that requires the
- // forward index to actually be dict-encoded. Columns with a shared
dictionary on a RAW forward index
- // (dict file exists but forward stores raw values) would otherwise be
misrouted into the dict-id
- // path; gate on forward-index encoding so they take the no-dict GROUP
BY path instead.
- // ColumnContext.getDataSource() is null for computed (non-identifier)
transforms; in that case
- // getDictionary() == null already covers them via the first condition.
- hasNoDictionaryGroupByExpression |= columnContext.getDictionary() == null
- || (columnContext.getDataSource() != null
- && columnContext.getDataSource().getForwardIndex() != null
- &&
!columnContext.getDataSource().getForwardIndex().isDictionaryEncoded());
+ // A column with EncodingType.RAW + explicit dictionaryIndex has a
non-null dictionary but a RAW forward
+ // index that throws on readDictIds; route those through the no-dict
GROUP BY generator via the explicit
+ // isDictionaryEncoded() flag rather than gating on dictionary nullness
alone.
+ hasNoDictionaryGroupByExpression |= !columnContext.isDictionaryEncoded();
}
_hasMVGroupByExpression = hasMVGroupByExpression;
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionaryMultiColumnGroupKeyGenerator.java
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionaryMultiColumnGroupKeyGenerator.java
index 8f588b49d9e..51e4c7fec66 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionaryMultiColumnGroupKeyGenerator.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionaryMultiColumnGroupKeyGenerator.java
@@ -32,9 +32,7 @@ import org.apache.pinot.core.operator.ColumnContext;
import org.apache.pinot.core.operator.blocks.ValueBlock;
import org.apache.pinot.core.query.aggregation.groupby.utils.ValueToIdMap;
import
org.apache.pinot.core.query.aggregation.groupby.utils.ValueToIdMapFactory;
-import org.apache.pinot.segment.spi.datasource.DataSource;
import org.apache.pinot.segment.spi.index.reader.Dictionary;
-import org.apache.pinot.segment.spi.index.reader.ForwardIndexReader;
import org.apache.pinot.spi.data.FieldSpec.DataType;
import org.apache.pinot.spi.utils.ByteArray;
import org.apache.pinot.spi.utils.FixedIntArray;
@@ -80,14 +78,11 @@ public class NoDictionaryMultiColumnGroupKeyGenerator
implements GroupKeyGenerat
ExpressionContext groupByExpression = groupByExpressions[i];
ColumnContext columnContext =
projectOperator.getResultColumnContext(groupByExpression);
_storedTypes[i] = columnContext.getDataType().getStoredType();
- // Only take the dict-id path when the column has a dictionary AND its
forward index is dict-encoded.
- // A column can have a dictionary alongside a RAW forward index (e.g.
dict + inverted/range), in which case
- // BlockValSet#getDictionaryIdsSV would route to
ForwardIndexReader#readDictIds and throw on the raw forward
- // index. Fall back to an on-the-fly dictionary on raw values instead.
- Dictionary dictionary = _nullHandlingEnabled ? null :
columnContext.getDictionary();
- if (dictionary != null && !hasDictEncodedForwardIndex(columnContext)) {
- dictionary = null;
- }
+ // Take the dict-id path only when the forward index is dict-encoded. A
column with EncodingType.RAW +
+ // dictionaryIndex exposes a Dictionary but
BlockValSet#getDictionaryIdsSV throws on its RAW forward
+ // index — fall back to an on-the-fly dictionary on raw values for that
case.
+ Dictionary dictionary = _nullHandlingEnabled ||
!columnContext.isDictionaryEncoded() ? null
+ : columnContext.getDictionary();
if (dictionary != null) {
_dictionaries[i] = dictionary;
} else {
@@ -437,20 +432,6 @@ public class NoDictionaryMultiColumnGroupKeyGenerator
implements GroupKeyGenerat
return new GroupKeyIterator();
}
- /**
- * Returns {@code true} if the column has a dict-encoded forward index, i.e.
{@link BlockValSet#getDictionaryIdsSV}
- * is callable. A column referenced by a transform (rather than directly)
has no underlying {@link DataSource}; in
- * that case the transform builds its own dictionary on the fly, so the
dict-id path is always usable.
- */
- private static boolean hasDictEncodedForwardIndex(ColumnContext
columnContext) {
- DataSource dataSource = columnContext.getDataSource();
- if (dataSource == null) {
- return true;
- }
- ForwardIndexReader<?> forwardIndex = dataSource.getForwardIndex();
- return forwardIndex == null || forwardIndex.isDictionaryEncoded();
- }
-
/**
* Helper method to get or create group-id for a group key.
*
diff --git
a/pinot-core/src/main/java/org/apache/pinot/core/query/distinct/DistinctExecutorFactory.java
b/pinot-core/src/main/java/org/apache/pinot/core/query/distinct/DistinctExecutorFactory.java
index 4b9bff8cab8..60f7be22de7 100644
---
a/pinot-core/src/main/java/org/apache/pinot/core/query/distinct/DistinctExecutorFactory.java
+++
b/pinot-core/src/main/java/org/apache/pinot/core/query/distinct/DistinctExecutorFactory.java
@@ -72,8 +72,10 @@ public class DistinctExecutorFactory {
} else {
orderByExpression = null;
}
- Dictionary dictionary = columnContext.getDictionary();
+ // Use the dict-id-based executor only when the forward index is
dict-encoded (RAW + dictionaryIndex columns
+ // expose a Dictionary but their forward index throws on readDictIds —
gate on isDictionaryEncoded()).
// Note: Use raw value based when ordering is needed and dictionary is
not sorted (consuming segments).
+ Dictionary dictionary = columnContext.isDictionaryEncoded() ?
columnContext.getDictionary() : null;
if (dictionary != null && (orderByExpression == null ||
dictionary.isSorted())) {
// Dictionary based
return new DictionaryBasedSingleColumnDistinctExecutor(expression,
dictionary, dataType, limit,
@@ -115,9 +117,10 @@ public class DistinctExecutorFactory {
columnNames[i] = expression.toString();
columnDataTypes[i] =
ColumnDataType.fromDataTypeSV(columnContext.getDataType());
if (dictionaryBased) {
- Dictionary dictionary = columnContext.getDictionary();
- if (dictionary != null) {
- dictionaries.add(dictionary);
+ // RAW + dictionaryIndex columns expose a Dictionary but the forward
index throws on readDictIds; gate
+ // the dict-id-based multi-column executor on the explicit
forward-index encoding flag.
+ if (columnContext.isDictionaryEncoded()) {
+ dictionaries.add(columnContext.getDictionary());
} else {
dictionaryBased = false;
}
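
Editor's note: the multi-column DISTINCT hunk above is all-or-nothing, so one column that is not dict-encoded moves the whole query off the dict-id executor. A small sketch of that decision follows, under the assumption that it can be reduced to a boolean per column. `Column` and `useDictionaryBasedExecutor` are hypothetical names for illustration and are not part of Pinot.

```java
import java.util.List;

final class MultiColumnDistinctGateSketch {
  /** Hypothetical column descriptor; NOT Pinot's ColumnContext. */
  static final class Column {
    final String _name;
    final boolean _hasDictionary;
    final boolean _dictEncodedForwardIndex;

    Column(String name, boolean hasDictionary, boolean dictEncodedForwardIndex) {
      _name = name;
      _hasDictionary = hasDictionary;
      _dictEncodedForwardIndex = dictEncodedForwardIndex;
    }

    // True only when dict ids can actually be read from the forward index.
    boolean isDictionaryEncoded() {
      return _hasDictionary && _dictEncodedForwardIndex;
    }
  }

  /** Returns true only if every column can serve dict ids. */
  static boolean useDictionaryBasedExecutor(List<Column> columns) {
    boolean dictionaryBased = true;
    for (Column column : columns) {
      if (dictionaryBased && !column.isDictionaryEncoded()) {
        // One RAW + dictionary (or no-dictionary) column disables the dict-id path for all columns.
        dictionaryBased = false;
      }
    }
    return dictionaryBased;
  }
}
```

With a dictionary-nullness gate, a RAW + dictionary column would have kept `dictionaryBased` true. Under the encoding flag it flips to false and the raw-value executor is selected.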
diff --git
a/pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/RawForwardIndexWithDictionaryTest.java
b/pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/RawForwardIndexWithDictionaryTest.java
index 2fa08e27adf..93c4703dfcb 100644
---
a/pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/RawForwardIndexWithDictionaryTest.java
+++
b/pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/RawForwardIndexWithDictionaryTest.java
@@ -517,13 +517,12 @@ public class RawForwardIndexWithDictionaryTest extends
CustomDataQueryClusterInt
assertEquals(rawRows, dictRows, "DISTINCT rows must match between
dictionary-only and raw+dictionary columns");
}
- /**
- * Multi-column GROUP BY that mixes a dict-encoded column with a
RAW+dictionary column. This forces the executor
- * onto the {@code NoDictionaryMultiColumnGroupKeyGenerator} path. The
per-column branch there must check the
- * forward-index encoding in addition to {@code ColumnContext#getDictionary}
— otherwise it keeps the dictionary
- * for any column that has a dict file and calls {@code
BlockValSet#getDictionaryIdsSV()} on it, which routes to
- * {@code ForwardIndexReader#readDictIds} and throws {@code
UnsupportedOperationException} on a RAW forward index.
- */
+ /// Multi-column GROUP BY that mixes a dict-encoded column with a
RAW+dictionary column. Forces the executor onto
+ /// the {@link
org.apache.pinot.core.query.aggregation.groupby.NoDictionaryMultiColumnGroupKeyGenerator}
path.
+ /// Before the {@code ColumnContext.isDictionaryEncoded()} gate, the
per-column branch there picked the dict-id
+ /// path whenever {@code ColumnContext#getDictionary() != null} and then
called
+ /// {@code BlockValSet#getDictionaryIdsSV()} on the RAW forward index, which
throws
+ /// {@code UnsupportedOperationException}.
@Test(dataProvider = "useBothQueryEngines")
public void
testMultiColumnGroupByWithRawDictColumnReturnsSameResults(boolean
useMultiStageQueryEngine)
throws Exception {
@@ -542,6 +541,170 @@ public class RawForwardIndexWithDictionaryTest extends
CustomDataQueryClusterInt
"Multi-column GROUP BY rows must match between dictionary-only and
raw+dictionary columns");
}
+ /// Multi-column DISTINCT exercises {@link
org.apache.pinot.core.query.distinct.DistinctExecutorFactory}'s
+ /// multi-column path. Before the {@code
ColumnContext.isDictionaryEncoded()} gate, the factory routed to
+ /// {@code DictionaryBasedMultiColumnDistinctExecutor} whenever every column
had a non-null dictionary, then
+ /// that executor called {@code BlockValSet#getDictionaryIdsSV()} — which
throws on a RAW+dictionary column.
+ @Test(dataProvider = "useBothQueryEngines")
+ public void
testMultiColumnDistinctWithRawDictColumnReturnsSameResults(boolean
useMultiStageQueryEngine)
+ throws Exception {
+ setUseMultiStageQueryEngine(useMultiStageQueryEngine);
+ JsonNode dictRows = postQuery(
+ String.format("SELECT DISTINCT %s, %s FROM %s ORDER BY %s, %s",
+ DICT_DIMENSION, DICT_INT_DIMENSION, getTableName(),
DICT_DIMENSION, DICT_INT_DIMENSION))
+ .get("resultTable").get("rows");
+ JsonNode rawRows = postQuery(
+ String.format("SELECT DISTINCT %s, %s FROM %s ORDER BY %s, %s",
+ RAW_DICT_DIMENSION, RAW_DICT_INT_DIMENSION, getTableName(),
+ RAW_DICT_DIMENSION, RAW_DICT_INT_DIMENSION))
+ .get("resultTable").get("rows");
+ assertEquals(rawRows, dictRows,
+ "Multi-column DISTINCT rows must match between dictionary-only and
raw+dictionary columns");
+ }
+
+ /// {@code DISTINCTCOUNT} on a RAW+dictionary column was previously crashing
inside
+ /// {@link
org.apache.pinot.core.query.aggregation.function.BaseDistinctAggregateAggregationFunction#svAggregate}:
+ /// the executor entered the dict-id path whenever {@code
blockValSet.getDictionary() != null}, then called
+ /// {@code blockValSet.getDictionaryIdsSV()} on the RAW forward index. Now
gated on
+ /// {@code BlockValSet#isDictionaryEncoded()}, the executor takes the value
path instead. The {@code WHERE}
+ /// predicate is required so the query bypasses
+ /// {@link
org.apache.pinot.core.operator.query.NonScanBasedAggregationOperator}, which
would otherwise serve the
+ /// aggregation directly from the dictionary and hide the regression.
+ @Test(dataProvider = "useBothQueryEngines")
+ public void
testDistinctCountWithFilterOnRawDictColumnReturnsSameResults(boolean
useMultiStageQueryEngine)
+ throws Exception {
+ setUseMultiStageQueryEngine(useMultiStageQueryEngine);
+ long dictResult = scalarLong(
+ String.format("SELECT DISTINCTCOUNT(%s) FROM %s WHERE %s > 100",
+ DICT_DIMENSION, getTableName(), METRIC_COLUMN));
+ long rawResult = scalarLong(
+ String.format("SELECT DISTINCTCOUNT(%s) FROM %s WHERE %s > 100",
+ RAW_DICT_DIMENSION, getTableName(), METRIC_COLUMN));
+ assertEquals(dictResult, UNIQUE_DIMENSION_VALUES, "Dict baseline must
equal the unique value count");
+ assertEquals(rawResult, dictResult,
+ "DISTINCTCOUNT must match between dictionary-only and raw+dictionary
columns");
+ }
+
+ /// {@code DISTINCTCOUNTHLL} previously crashed inside {@link
+ /// org.apache.pinot.core.query.aggregation.function.DistinctCountHLLAggregationFunction#aggregate} for the same
+ /// reason as {@code DISTINCTCOUNT}; now gated on {@code BlockValSet#isDictionaryEncoded()}. The {@code WHERE}
+ /// predicate is required to bypass {@link
+ /// org.apache.pinot.core.operator.query.NonScanBasedAggregationOperator}.
+ @Test(dataProvider = "useBothQueryEngines")
+ public void testDistinctCountHLLWithFilterOnRawDictColumnReturnsSameResults(boolean useMultiStageQueryEngine)
+ throws Exception {
+ setUseMultiStageQueryEngine(useMultiStageQueryEngine);
+ long dictResult = scalarLong(
+ String.format("SELECT DISTINCTCOUNTHLL(%s) FROM %s WHERE %s > 100",
+ DICT_DIMENSION, getTableName(), METRIC_COLUMN));
+ long rawResult = scalarLong(
+ String.format("SELECT DISTINCTCOUNTHLL(%s) FROM %s WHERE %s > 100",
+ RAW_DICT_DIMENSION, getTableName(), METRIC_COLUMN));
+ // Sanity floor: catches a regression where both queries silently return 0 (e.g., planner short-circuit).
+ assertTrue(dictResult > 0, "Dict baseline DISTINCTCOUNTHLL must be > 0");
+ assertEquals(rawResult, dictResult,
+ "DISTINCTCOUNTHLL must match between dictionary-only and raw+dictionary columns");
+ }
+
+ /// {@code DISTINCTCOUNTBITMAP} previously crashed inside {@link
+ /// org.apache.pinot.core.query.aggregation.function.DistinctCountBitmapAggregationFunction#aggregate}; now gated
+ /// on {@code BlockValSet#isDictionaryEncoded()}. Unlike {@code DISTINCTCOUNT} / {@code DISTINCTCOUNTHLL}, this
+ /// function is NOT in {@code AggregationPlanNode#DICTIONARY_BASED_FUNCTIONS}, so the bug surfaced even without a
+ /// {@code WHERE}.
+ @Test(dataProvider = "useBothQueryEngines")
+ public void testDistinctCountBitmapOnRawDictColumnReturnsSameResults(boolean useMultiStageQueryEngine)
+ throws Exception {
+ setUseMultiStageQueryEngine(useMultiStageQueryEngine);
+ long dictResult = scalarLong(
+ String.format("SELECT DISTINCTCOUNTBITMAP(%s) FROM %s", DICT_INT_DIMENSION, getTableName()));
+ long rawResult = scalarLong(
+ String.format("SELECT DISTINCTCOUNTBITMAP(%s) FROM %s", RAW_DICT_INT_DIMENSION, getTableName()));
+ assertEquals(dictResult, UNIQUE_DIMENSION_VALUES, "Dict baseline must equal the unique value count");
+ assertEquals(rawResult, dictResult,
+ "DISTINCTCOUNTBITMAP must match between dictionary-only and raw+dictionary columns");
+ }
+
+ /// {@code SEGMENTPARTITIONEDDISTINCTCOUNT} previously crashed inside {@link
+ /// org.apache.pinot.core.query.aggregation.function.SegmentPartitionedDistinctCountAggregationFunction#aggregate}
+ /// for the same reason as the other dict-id aggregators; now gated on {@code BlockValSet#isDictionaryEncoded()}.
+ /// The {@code WHERE} predicate is required to bypass
+ /// {@link org.apache.pinot.core.operator.query.NonScanBasedAggregationOperator}.
+ @Test(dataProvider = "useBothQueryEngines")
+ public void testSegmentPartitionedDistinctCountWithFilterOnRawDictColumnReturnsSameResults(
+ boolean useMultiStageQueryEngine)
+ throws Exception {
+ setUseMultiStageQueryEngine(useMultiStageQueryEngine);
+ long dictResult = scalarLong(
+ String.format("SELECT SEGMENTPARTITIONEDDISTINCTCOUNT(%s) FROM %s WHERE %s > 100",
+ DICT_DIMENSION, getTableName(), METRIC_COLUMN));
+ long rawResult = scalarLong(
+ String.format("SELECT SEGMENTPARTITIONEDDISTINCTCOUNT(%s) FROM %s WHERE %s > 100",
+ RAW_DICT_DIMENSION, getTableName(), METRIC_COLUMN));
+ // Sanity floor: catches a regression where both queries silently return 0 (e.g., planner short-circuit).
+ assertTrue(dictResult > 0, "Dict baseline SEGMENTPARTITIONEDDISTINCTCOUNT must be > 0");
+ assertEquals(rawResult, dictResult,
+ "SEGMENTPARTITIONEDDISTINCTCOUNT must match between dictionary-only and raw+dictionary columns");
+ }
+
+ /// {@code MODE} previously crashed inside {@link
+ /// org.apache.pinot.core.query.aggregation.function.ModeAggregationFunction#aggregate} for the same reason; now
+ /// gated on {@code BlockValSet#isDictionaryEncoded()}. {@code MODE} is NOT in
+ /// {@code AggregationPlanNode#DICTIONARY_BASED_FUNCTIONS}, so the bug surfaced even without a {@code WHERE}.
+ @Test(dataProvider = "useBothQueryEngines")
+ public void testModeOnRawDictColumnReturnsSameResults(boolean useMultiStageQueryEngine)
+ throws Exception {
+ setUseMultiStageQueryEngine(useMultiStageQueryEngine);
+ JsonNode dictRows = postQuery(
+ String.format("SELECT MODE(%s) FROM %s", DICT_INT_DIMENSION, getTableName()))
+ .get("resultTable").get("rows");
+ JsonNode rawRows = postQuery(
+ String.format("SELECT MODE(%s) FROM %s", RAW_DICT_INT_DIMENSION, getTableName()))
+ .get("resultTable").get("rows");
+ assertEquals(rawRows, dictRows,
+ "MODE rows must match between dictionary-only and raw+dictionary columns");
+ }
+
+ /// Single-column {@code DISTINCT} with a filter exercises
+ /// {@link org.apache.pinot.core.query.distinct.DistinctExecutorFactory}'s single-column path. Without a filter
+ /// the query routes to {@link org.apache.pinot.core.operator.query.DictionaryBasedDistinctOperator}, which
+ /// iterates the dictionary directly and never hits the bug; with a filter it goes through
+ /// {@link org.apache.pinot.core.query.distinct.dictionary.DictionaryBasedSingleColumnDistinctExecutor}, which
+ /// previously called {@code BlockValSet#getDictionaryIdsSV()} and threw on the RAW forward index; now gated on
+ /// {@code ColumnContext#isDictionaryEncoded()}.
+ @Test(dataProvider = "useBothQueryEngines")
+ public void testDistinctWithFilterOnRawDictColumnReturnsSameResults(boolean useMultiStageQueryEngine)
+ throws Exception {
+ setUseMultiStageQueryEngine(useMultiStageQueryEngine);
+ JsonNode dictRows = postQuery(
+ String.format("SELECT DISTINCT %s FROM %s WHERE %s > 100 ORDER BY %s",
+ DICT_DIMENSION, getTableName(), METRIC_COLUMN, DICT_DIMENSION))
+ .get("resultTable").get("rows");
+ JsonNode rawRows = postQuery(
+ String.format("SELECT DISTINCT %s FROM %s WHERE %s > 100 ORDER BY %s",
+ RAW_DICT_DIMENSION, getTableName(), METRIC_COLUMN, RAW_DICT_DIMENSION))
+ .get("resultTable").get("rows");
+ assertEquals(rawRows, dictRows,
+ "DISTINCT (with filter) rows must match between dictionary-only and raw+dictionary columns");
+ }
+
+ /// Exercise the transform path: a non-identifier expression over a RAW+dictionary column. Goes through
+ /// {@link org.apache.pinot.core.operator.docvalsets.TransformBlockValSet}, whose {@code isDictionaryEncoded()}
+ /// returns whether the wrapping transform exposes its own dictionary; {@code UPPER} does not, so the executor
+ /// must take the value path. Regression coverage for raghavyadav01's review comment on PR #18504.
+ @Test(dataProvider = "useBothQueryEngines")
+ public void testDistinctOnTransformOfRawDictColumnReturnsSameResults(boolean useMultiStageQueryEngine)
+ throws Exception {
+ setUseMultiStageQueryEngine(useMultiStageQueryEngine);
+ JsonNode dictRows = postQuery(
+ String.format("SELECT DISTINCT UPPER(%s) FROM %s ORDER BY UPPER(%s)",
+ DICT_DIMENSION, getTableName(), DICT_DIMENSION)).get("resultTable").get("rows");
+ JsonNode rawRows = postQuery(
+ String.format("SELECT DISTINCT UPPER(%s) FROM %s ORDER BY UPPER(%s)",
+ RAW_DICT_DIMENSION, getTableName(), RAW_DICT_DIMENSION)).get("resultTable").get("rows");
+ assertEquals(rawRows, dictRows,
+ "DISTINCT(transform) rows must match between dictionary-only and raw+dictionary columns");
+ }
+
@Test(dataProvider = "useBothQueryEngines")
public void testAggregationWithGroupByOnRawDictColumnReturnsSameResults(boolean useMultiStageQueryEngine)
throws Exception {