Re: [PR] Introduce ColumnShape; align IndexCreationContext APIs [pinot]

via GitHub Mon, 11 May 2026 20:55:34 -0700


Copilot commented on code in PR #18470:
URL: https://github.com/apache/pinot/pull/18470#discussion_r3223582496



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/fst/IFSTIndexType.java:
##########
@@ -96,13 +97,19 @@ protected ColumnConfigDeserializer<FstIndexConfig> 
createDeserializerForLegacyCo
   @Override
   public FSTIndexCreator createIndexCreator(IndexCreationContext context, 
FstIndexConfig indexConfig)
       throws IOException {
-    Preconditions.checkState(context.getFieldSpec().isSingleValueField(),
+    FieldSpec fieldSpec = context.getFieldSpec();
+    Preconditions.checkState(fieldSpec.isSingleValueField(),
         "IFST index is currently only supported on single-value columns");
-    
Preconditions.checkState(context.getFieldSpec().getDataType().getStoredType() 
== FieldSpec.DataType.STRING,
+    Preconditions.checkState(fieldSpec.getDataType().getStoredType() == 
FieldSpec.DataType.STRING,
         "IFST index is currently only supported on STRING type columns");
     Preconditions.checkState(context.hasDictionary(),
         "IFST index is currently only supported on dictionary-encoded 
columns");
-    return new LuceneIFSTIndexCreator(context);
+    ColumnStatistics columnStatistics = context.getColumnStatistics();
+    Preconditions.checkState(columnStatistics != null, "ColumnStatistics must 
be provided");
+    Object uniqueValuesSet = columnStatistics.getUniqueValuesSet();
+    Preconditions.checkState(uniqueValuesSet instanceof String[], "Unique 
values set must be of type String[]");
+    return new LuceneIFSTIndexCreator(context.getIndexDir(), 
fieldSpec.getName(), context.getTableNameWithType(),
+        context.isContinueOnError(), (String[]) uniqueValuesSet);

Review Comment:
   Requiring `ColumnStatistics` here makes IFST index creation impossible from 
index-loading handlers that only have `ColumnMetadata` (which is how the 
codebase commonly rebuilds indexes). Consider allowing `columnStatistics == 
null` and falling back to an alternative source of sorted unique `String[]` 
(e.g., computed from dictionary) or reintroducing an API path that does not 
depend on `ColumnStatistics` being present.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/fst/FstIndexType.java:
##########
@@ -100,13 +101,17 @@ protected ColumnConfigDeserializer<FstIndexConfig> 
createDeserializerForLegacyCo
   @Override
   public FSTIndexCreator createIndexCreator(IndexCreationContext context, 
FstIndexConfig indexConfig)
       throws IOException {
-    Preconditions.checkState(context.getFieldSpec().isSingleValueField(),
+    FieldSpec fieldSpec = context.getFieldSpec();
+    Preconditions.checkState(fieldSpec.isSingleValueField(),
         "FST index is currently only supported on single-value columns");
-    
Preconditions.checkState(context.getFieldSpec().getDataType().getStoredType() 
== FieldSpec.DataType.STRING,
+    Preconditions.checkState(fieldSpec.getDataType().getStoredType() == 
FieldSpec.DataType.STRING,
         "FST index is currently only supported on STRING type columns");
     Preconditions.checkState(context.hasDictionary(),
         "FST index is currently only supported on dictionary-encoded columns");
-    return new LuceneFSTIndexCreator(context);
+    ColumnStatistics columnStatistics = context.getColumnStatistics();
+    String[] sortedEntries = columnStatistics != null ? (String[]) 
columnStatistics.getUniqueValuesSet() : null;

Review Comment:
   This introduces an unchecked cast of `columnStatistics.getUniqueValuesSet()` 
to `String[]`. If the unique-values representation differs (or is unset) this 
can throw `ClassCastException`. Align with `IFSTIndexType` by validating 
`instanceof String[]` before casting (and decide on a deterministic fallback 
when it isn’t).
   



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/InvertedIndexAndDictionaryBasedForwardIndexCreator.java:
##########
@@ -357,38 +330,42 @@ private Map<String, String> 
createForwardIndexForMVColumn()
       }
 
       // Construct the forward index values buffer from the inverted index 
using the length buffer for index tracking
+      DataType storedType = _columnMetadata.getStoredType();
+      boolean isFixedWidth = storedType.isFixedWidth();
+      int maxRowLengthInBytes = isFixedWidth ? maxNumberOfMultiValues * 
storedType.size() : 0;
       for (int dictId = 0; dictId < _cardinality; dictId++) {
         ImmutableRoaringBitmap docIdsBitmap = 
invertedIndexReader.getDocIds(dictId);
-        int finalDictId = dictId;
-        docIdsBitmap.stream().forEach(docId -> {
+        PeekableIntIterator intIterator = docIdsBitmap.getIntIterator();
+        while (intIterator.hasNext()) {
+          int docId = intIterator.next();
           int index = getInt(_forwardIndexLengthBuffer, docId);
-          putInt(_forwardIndexValueBuffer, index, finalDictId);
+          putInt(_forwardIndexValueBuffer, index, dictId);
           putInt(_forwardIndexLengthBuffer, docId, index + 1);
           if (!isFixedWidth) {
-            trackMaxRowLengthInBytes(dictionary, maxRowLengthInBytes, docId, 
finalDictId);
+            int currentRowLength = getInt(_forwardIndexMaxSizeBuffer, docId);
+            int newRowLength = currentRowLength + 
dictionary.getValueSize(dictId);
+            putInt(_forwardIndexMaxSizeBuffer, docId, newRowLength);
+            maxRowLengthInBytes = Math.max(maxRowLengthInBytes, newRowLength);
           }
-        });
+        }
       }
 
-      IndexCreationContext context = IndexCreationContext.builder()
-          .withIndexDir(_segmentMetadata.getIndexDir())
-          .withColumnMetadata(_columnMetadata)
-          .withForwardIndexDisabled(false)
-          .withDictionary(_dictionaryPresent)
+      // When the duplicate-detection branch above fires, `_nextValueId` and 
`maxNumberOfMultiValues[0]` are the new

Review Comment:
   The comment still references `maxNumberOfMultiValues[0]`, but 
`maxNumberOfMultiValues` is no longer an array in the updated implementation. 
Update the comment to match the current variable names to avoid confusion.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Introduce ColumnShape; align IndexCreationContext APIs [pinot]

Reply via email to