Jackie-Jiang opened a new pull request, #18470:
URL: https://github.com/apache/pinot/pull/18470

   ## Summary
   
   - Extract a common **`ColumnShape`** interface from `ColumnStatistics`, 
`ColumnMetadata`, and `IndexCreationContext` so column-shape attributes 
(cardinality, element lengths, isAscii, maxRowLengthInBytes, partition info, 
etc.) flow through a single accessor surface. Add **`EmptyColumnShape`** / 
**`EmptyColumnMetadata`** for zero-row segments.
   - Rework **`IndexCreationContext.Builder`** to require `(File indexDir, 
TableConfig tableConfig, ColumnStatistics | ColumnMetadata)` at construction. 
`tableNameWithType` and `continueOnError` are derived from the `TableConfig`; a 
`String` / `ColumnShape` fallback constructor exists for callers without a 
`TableConfig`. Dead `forwardIndexDisabled` plumbing is removed and verbose 
Javadoc is trimmed.
   - **`ColumnMetadataImpl`** gains a `_maxRowLengthInBytes` field derived in 
`Builder.build()` from canonicalized shape fields. The derived value is correct 
for SV, fixed-width MV, and uniform-length var-width MV columns; it is 
`UNAVAILABLE` for varying-length var-width MV columns. `extractFieldSpec` / 
`extractPartitionFunction` / `extractPartitions` are exposed as public static 
helpers.
   - **`SegmentMetadataImpl`** branches on `_totalDocs == 0` and uses 
`EmptyColumnMetadata.fromPropertiesConfiguration` for empty segments, skipping 
V3 index-map loading.
   - **`ForwardIndexHandler.rewriteDictToRawForwardIndex`** relies on 
`columnMetadata.getMaxRowLengthInBytes()` and scans the column only when the 
metadata returns `UNAVAILABLE`.
   - Bug fix: add the missing `_totalDocs++` in 
`NoDictColumnStatisticsCollector.collect(Object)`; its absence was causing 
`BufferOverflowException` at segment build.
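
   The `_maxRowLengthInBytes` derivation and the scan fallback described in the bullets above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: the `Shape` class, the `UNAVAILABLE` value of `-1`, and both method names are hypothetical stand-ins, and the arithmetic assumes a row's length is simply the sum of its element lengths, with no per-row header or length-prefix overhead.

```java
import java.util.List;

/** Hypothetical sketch; not the Pinot classes or signatures touched by this PR. */
final class MaxRowLengthSketch {
  /** Assumed sentinel meaning "cannot be derived from metadata alone". */
  static final int UNAVAILABLE = -1;

  /** Hypothetical stand-in for the shape attributes a column exposes. */
  static final class Shape {
    final boolean singleValue;
    final boolean fixedWidth;
    final int minElementLength; // bytes
    final int maxElementLength; // bytes
    final int maxNumberOfMultiValues;

    Shape(boolean singleValue, boolean fixedWidth, int minElementLength,
        int maxElementLength, int maxNumberOfMultiValues) {
      this.singleValue = singleValue;
      this.fixedWidth = fixedWidth;
      this.minElementLength = minElementLength;
      this.maxElementLength = maxElementLength;
      this.maxNumberOfMultiValues = maxNumberOfMultiValues;
    }
  }

  /** Derives the max row length from shape fields, or UNAVAILABLE when it cannot be known. */
  static int deriveMaxRowLengthInBytes(Shape shape) {
    if (shape.singleValue) {
      // One element per row, so the longest element is the longest row.
      return shape.maxElementLength;
    }
    if (shape.fixedWidth || shape.minElementLength == shape.maxElementLength) {
      // Every element has the same length, so the widest row is the one with the most values.
      return shape.maxNumberOfMultiValues * shape.maxElementLength;
    }
    // Varying-length var-width MV: the longest row need not hold the longest elements.
    return UNAVAILABLE;
  }

  /** Uses the derived value when available; otherwise falls back to scanning the rows. */
  static int resolveMaxRowLengthInBytes(Shape shape, List<byte[][]> rows) {
    int fromMetadata = deriveMaxRowLengthInBytes(shape);
    if (fromMetadata != UNAVAILABLE) {
      return fromMetadata;
    }
    int max = 0;
    for (byte[][] row : rows) {
      int rowLength = 0;
      for (byte[] element : row) {
        rowLength += element.length;
      }
      max = Math.max(max, rowLength);
    }
    return max;
  }
}
```

   In this sketch the scan runs only in the varying-length case, which mirrors the intent of having the rewrite path consult metadata first and read the column data only when the metadata cannot answer.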
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


