Copilot commented on code in PR #17269:
URL: https://github.com/apache/pinot/pull/17269#discussion_r3041632094
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -61,6 +64,28 @@ public void init(SegmentGeneratorConfig segmentCreationSpec, SegmentIndexCreatio
        segmentCreationSpec.getIndexConfigsByColName().size());
    initializeCommon(segmentCreationSpec, segmentIndexCreationInfo, indexCreationInfoMap,
        schema, outDir, colIndexes, immutableToMutableIdMap, instanceType);
+
+    // Although NullValueVector is implemented as an index, it needs to be treated in a different way than other indexes
+    // Process all field specs (not just indexed columns) for null value vectors
+    for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) {
+      String columnName = fieldSpec.getName();
+      // Create null value vector for non-indexed nullable columns
+      if (!_colIndexes.containsKey(columnName)) {
+        if (isNullable(fieldSpec, schema, segmentCreationSpec)) {
+          // Initialize Null value vector map
+          LOGGER.info("Column: {} is nullable", columnName);
+          _colIndexes.put(columnName,
+              new ColumnIndexCreators(columnName, fieldSpec, null, List.of(),
+                  new NullValueVectorCreator(outDir, columnName)));
+        } else {
+          LOGGER.debug("Column: {} is not nullable", columnName);
+        }
+      }
Review Comment:
Adding non-indexed nullable columns into _colIndexes means indexRow() will
now iterate them. However, indexRow/indexColumnValue currently throws when
row.getValue(...) returns null, before the null value vector can be set. This
can introduce segment generation failures for nullable columns that are
absent/null in the input. Either avoid adding such columns to _colIndexes for
row-wise indexing, or handle nulls by setting the null vector and substituting
the FieldSpec defaultNullValue (similar to the ColumnReader code path).
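A minimal, self-contained sketch of the second option. All names below are illustrative stand-ins, not Pinot's actual API: DEFAULT_NULL_VALUE models FieldSpec.getDefaultNullValue() and NULL_BITMAP models the per-column NullValueVectorCreator.

```java
import java.util.BitSet;

// Illustrative stand-ins, not Pinot's actual API:
// DEFAULT_NULL_VALUE models FieldSpec.getDefaultNullValue(),
// NULL_BITMAP models the per-column NullValueVectorCreator.
class NullSubstitutionSketch {
  static final Object DEFAULT_NULL_VALUE = Integer.MIN_VALUE;
  static final BitSet NULL_BITMAP = new BitSet();

  // Instead of throwing on a null row value, record the docId in the null
  // bitmap and substitute the default null value for indexing.
  static Object indexColumnValue(int docId, Object rawValue) {
    if (rawValue == null) {
      NULL_BITMAP.set(docId);
      return DEFAULT_NULL_VALUE;
    }
    return rawValue;
  }

  public static void main(String[] args) {
    System.out.println(indexColumnValue(0, 42));
    System.out.println(indexColumnValue(1, null));
    System.out.println(NULL_BITMAP.get(1));
  }
}
```

This mirrors the ColumnReader code path the comment references: the null vector records which docs were null, while the forward index still gets a concrete value.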
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -61,6 +64,28 @@ public void init(SegmentGeneratorConfig segmentCreationSpec, SegmentIndexCreatio
        segmentCreationSpec.getIndexConfigsByColName().size());
    initializeCommon(segmentCreationSpec, segmentIndexCreationInfo, indexCreationInfoMap,
        schema, outDir, colIndexes, immutableToMutableIdMap, instanceType);
+
+    // Although NullValueVector is implemented as an index, it needs to be treated in a different way than other indexes
+    // Process all field specs (not just indexed columns) for null value vectors
+    for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) {
+      String columnName = fieldSpec.getName();
+      // Create null value vector for non-indexed nullable columns
+      if (!_colIndexes.containsKey(columnName)) {
+        if (isNullable(fieldSpec, schema, segmentCreationSpec)) {
+          // Initialize Null value vector map
+          LOGGER.info("Column: {} is nullable", columnName);
+          _colIndexes.put(columnName,
+              new ColumnIndexCreators(columnName, fieldSpec, null, List.of(),
+                  new NullValueVectorCreator(outDir, columnName)));
+        } else {
Review Comment:
The new INFO log ('Column: {} is nullable') will emit once per nullable,
non-indexed column for every segment build. This is likely too noisy at INFO in
production; consider lowering to DEBUG or logging once with a summary/count
instead.
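A self-contained sketch of the summary alternative. The helper below is illustrative (not Pinot code) and returns the message as a String so the sketch stays runnable; real code would pass it to LOGGER.info once per segment build.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch: collect the nullable column names and build a single
// summary message instead of logging one INFO line per column.
class SummaryLogSketch {
  static String summarizeNullableColumns(List<String> allColumns, Predicate<String> isNullable) {
    List<String> nullable = new ArrayList<>();
    for (String column : allColumns) {
      if (isNullable.test(column)) {
        nullable.add(column);
      }
    }
    // One log line per segment build instead of one per column
    return "Created null value vectors for " + nullable.size() + " nullable columns: " + nullable;
  }

  public static void main(String[] args) {
    System.out.println(summarizeNullableColumns(List.of("a", "b", "c"), c -> !c.equals("b")));
  }
}
```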
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/forward/ForwardIndexCreatorFactory.java:
##########
@@ -45,15 +45,16 @@ public class ForwardIndexCreatorFactory {
  private ForwardIndexCreatorFactory() {
  }
-  public static ForwardIndexCreator createIndexCreator(IndexCreationContext context, ForwardIndexConfig indexConfig)
+  public static ForwardIndexCreator createIndexCreator(IndexCreationContext context)
      throws Exception {
+    ForwardIndexConfig indexConfig = context.getForwardIndexConfig();
    File indexDir = context.getIndexDir();
    FieldSpec fieldSpec = context.getFieldSpec();
    String columnName = fieldSpec.getName();
    int numTotalDocs = context.getTotalDocs();
-
-    if (context.hasDictionary()) {
-      // Dictionary enabled columns
+    if (context.hasDictionary()
+        && context.getForwardIndexEncoding() == IndexCreationContext.ForwardIndexEncoding.DICTIONARY) {
Review Comment:
createIndexCreator() now pulls ForwardIndexConfig from
context.getForwardIndexConfig(), but that field is nullable in the SPI and
there’s no null-check here. If a caller forgets to set it (especially since
ForwardIndexType now ignores the indexConfig parameter), this will NPE deep in
the factory. Consider validating non-null upfront (with a clear message) or
defaulting to ForwardIndexConfig.getDefault().
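A hedged sketch of the suggested guard. The generic helper below is illustrative: the default argument stands in for ForwardIndexConfig.getDefault(), and the exception message is an example, not an existing Pinot message.

```java
// Generic stand-in for the suggested guard: prefer the explicit config, fall
// back to a default (modeling ForwardIndexConfig.getDefault()), and otherwise
// fail fast with a message naming the column instead of NPE-ing later.
class ConfigGuardSketch {
  static <T> T requireNonNullOrDefault(T config, T defaultConfig, String columnName) {
    if (config != null) {
      return config;
    }
    if (defaultConfig != null) {
      return defaultConfig;
    }
    throw new IllegalStateException(
        "ForwardIndexConfig must be set in IndexCreationContext for column: " + columnName);
  }

  public static void main(String[] args) {
    System.out.println(requireNonNullOrDefault("explicit", "default", "col"));
    System.out.println(requireNonNullOrDefault(null, "default", "col"));
  }
}
```

Failing fast at the factory entry point turns a deep NPE into an actionable message that names the column and the missing config.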
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java:
##########
@@ -1506,6 +1506,7 @@ private static void validateIndexingConfigAndFieldConfigList(TableConfig tableCo
        indexType.validate(indexConfigs, fieldSpec, tableConfig);
      }
    }
+    validateExplicitDictionaryForRawForwardIndex(tableConfig, schema, indexConfigsMap);
Review Comment:
validateExplicitDictionaryForRawForwardIndex() is invoked during validation
but is currently a no-op. This adds maintenance overhead and makes it unclear
whether an invariant is supposed to be enforced. Either implement the intended
validation (e.g., reject RAW forward columns with dictionary-required indexes
unless dictionary is explicitly enabled) or remove the method/call until there
is real logic.
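If the invariant is meant to be enforced, it might look roughly like this self-contained sketch; the boolean/list parameters are illustrative stand-ins for flags the real method would derive from TableConfig, Schema, and indexConfigsMap.

```java
import java.util.List;

// Illustrative shape of the intended check: a RAW-encoded forward index plus
// dictionary-required indexes is rejected unless the dictionary is explicitly
// enabled for the column.
class RawForwardValidationSketch {
  static void validateExplicitDictionaryForRawForwardIndex(String column, boolean rawForwardEncoding,
      boolean dictionaryExplicitlyEnabled, List<String> dictionaryRequiredIndexes) {
    if (rawForwardEncoding && !dictionaryRequiredIndexes.isEmpty() && !dictionaryExplicitlyEnabled) {
      throw new IllegalStateException("Column: " + column + " uses a RAW forward index but indexes "
          + dictionaryRequiredIndexes + " require a dictionary; enable the dictionary explicitly");
    }
  }

  public static void main(String[] args) {
    // Passes: dictionary explicitly enabled alongside the RAW forward index
    validateExplicitDictionaryForRawForwardIndex("col1", true, true, List.of("inverted"));
  }
}
```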
##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/loader/SegmentPreProcessorTest.java:
##########
@@ -345,23 +346,39 @@ public void testEnableTextIndexOnNewColumnRaw(SegmentVersion segmentVersion)
    _fieldConfigMap.put(NEWLY_ADDED_STRING_COL_RAW,
        new FieldConfig(NEWLY_ADDED_STRING_COL_RAW, FieldConfig.EncodingType.RAW, List.of(FieldConfig.IndexType.TEXT), null, null));
+    // Text index creation for newly added raw columns may not fully succeed; verify pre-processing doesn't crash.
+    try {
+      checkTextIndexCreation(NEWLY_ADDED_STRING_COL_RAW, 1, 1, _newColumnsSchemaWithText, true, true, true, 4);
+    } catch (AssertionError | RuntimeException e) {
+      // Text index on newly added raw column may not be created correctly; this is a known limitation.
+    }
    _fieldConfigMap.put(NEWLY_ADDED_STRING_MV_COL_RAW,
        new FieldConfig(NEWLY_ADDED_STRING_MV_COL_RAW, FieldConfig.EncodingType.RAW, List.of(FieldConfig.IndexType.TEXT), null, null));
-    checkTextIndexCreation(NEWLY_ADDED_STRING_COL_RAW, 1, 1, _newColumnsSchemaWithText, true, true, true, 4);
-    checkTextIndexCreation(NEWLY_ADDED_STRING_MV_COL_RAW, 1, 1, _newColumnsSchemaWithText, true, true, false, 4, false,
-        1);
+    try {
+      checkTextIndexCreation(NEWLY_ADDED_STRING_MV_COL_RAW, 1, 1, _newColumnsSchemaWithText, true, true, false, 4,
+          false, 1);
+    } catch (AssertionError | RuntimeException e) {
+      // Text index on newly added raw column may not be created correctly; this is a known limitation.
+    }
Review Comment:
These try/catch blocks swallow AssertionError/RuntimeException, so the test
can pass even if text index creation regresses for newly added RAW columns.
Prefer asserting a specific expected behavior (e.g., index absent but
preprocessing completes), or explicitly mark/skip the scenario with a clear
reason so failures aren’t silently ignored.
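One way to express the suggested "index absent but preprocessing completes" assertion, as a self-contained sketch; the Step interface stands in for the real preprocessing and index-inspection helpers.

```java
// Illustrative shape of the suggested assertion: run preprocessing, let any
// unexpected exception propagate (failing the test), and assert the specific
// expected end state instead of swallowing all failures.
class ExpectedStateSketch {
  interface Step {
    boolean run(); // returns true if the text index ended up created
  }

  static void assertPreprocessCompletesWithoutIndex(Step preprocess) {
    boolean indexCreated = preprocess.run();
    if (indexCreated) {
      throw new AssertionError("Text index unexpectedly created for newly added RAW column");
    }
  }

  public static void main(String[] args) {
    assertPreprocessCompletesWithoutIndex(() -> false); // completes, no index: passes
  }
}
```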
##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/loader/SegmentPreProcessorTest.java:
##########
@@ -1812,8 +1839,12 @@ public void testStarTreeCreationWithInvalidFunctionColumnPair()
        new IndexLoadingConfig(tableConfig, schema))) {
      // Should need processing due to new star-tree config
      assertTrue(processor.needProcess());
-      // Process should complete without throwing exception, but star-tree should not be created
-      processor.process(SEGMENT_OPERATIONS_THROTTLER);
+      // Process may fail with invalid function column pair; swallow to verify no star-tree is created
+      try {
+        processor.process(SEGMENT_OPERATIONS_THROTTLER);
+      } catch (IllegalStateException e) {
+        // expected for invalid function pair; proceed to validation
+      }
Review Comment:
Catching and ignoring IllegalStateException here makes the test unable to
detect unexpected failures (it will pass for any IllegalStateException). If the
goal is to verify 'no star-tree created/updated' for invalid configs, assert on
the exception message/type more narrowly (or use expectedExceptions) and still
fail on other unexpected errors.
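A plain-Java sketch of the narrower check, standing in for TestNG's expectedExceptions or Assert.assertThrows. The message substring and the example exception text are assumptions for illustration, not Pinot's actual error messages.

```java
// Plain-Java stand-in for TestNG's expectedExceptions / Assert.assertThrows:
// only an IllegalStateException whose message mentions the invalid pair is
// accepted; completing normally or any other failure still fails the test.
class NarrowCatchSketch {
  static void expectInvalidFunctionColumnPair(Runnable process) {
    try {
      process.run();
      throw new AssertionError("Expected IllegalStateException for invalid function column pair");
    } catch (IllegalStateException e) {
      if (e.getMessage() == null || !e.getMessage().contains("function column pair")) {
        throw new AssertionError("Unexpected IllegalStateException: " + e.getMessage(), e);
      }
    }
  }

  public static void main(String[] args) {
    expectInvalidFunctionColumnPair(() -> {
      throw new IllegalStateException("Invalid function column pair: max__nonExistingColumn");
    });
  }
}
```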
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/BaseSegmentCreator.java:
##########
@@ -572,6 +604,7 @@ public static void addColumnMetadataInfo(PropertiesConfiguration properties, Str
    properties.setProperty(getKeyFor(column, COLUMN_TYPE), String.valueOf(fieldType));
    properties.setProperty(getKeyFor(column, IS_SORTED), String.valueOf(columnIndexCreationInfo.isSorted()));
    properties.setProperty(getKeyFor(column, HAS_DICTIONARY), String.valueOf(hasDictionary));
Review Comment:
When a dictionary is created for secondary indexes while the forward index
remains RAW, this code still writes HAS_DICTIONARY=true and never sets
HAS_SHARED_DICTIONARY. That contradicts the backward-compat intent documented
in V1Constants/ForwardIndexHandler and can cause older servers (that only look
at HAS_DICTIONARY) to try to load a dict-encoded forward index file and fail.
Consider writing HAS_DICTIONARY=false + HAS_SHARED_DICTIONARY=true whenever
forwardIndexEncoding is RAW but a dictionary file is present, and keep
FORWARD_INDEX_ENCODING as the source of truth for new nodes.
```suggestion
    // Older readers rely on HAS_DICTIONARY to infer forward-index encoding. When a dictionary exists only to support
    // secondary indexes and the forward index remains RAW, record it as a shared dictionary instead.
    boolean hasSharedDictionary = hasDictionary && forwardIndexEncoding == IndexCreationContext.ForwardIndexEncoding.RAW;
    boolean hasDictionaryForForwardIndex = hasDictionary && !hasSharedDictionary;
    properties.setProperty(getKeyFor(column, HAS_DICTIONARY), String.valueOf(hasDictionaryForForwardIndex));
    properties.setProperty(getKeyFor(column, HAS_SHARED_DICTIONARY), String.valueOf(hasSharedDictionary));
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]