Copilot commented on code in PR #18092:
URL: https://github.com/apache/pinot/pull/18092#discussion_r3034444741
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/forward/ForwardIndexType.java:
##########
@@ -124,6 +130,18 @@ private void validateForwardIndexEnabled(ForwardIndexConfig forwardIndexConfig,
             "Compression codec: %s is not applicable to raw column: %s", compressionCodec, column);
       }
     }
+
+    // Validate transformCodec constraints
+    org.apache.pinot.segment.spi.compression.TransformCodec transformCodec = forwardIndexConfig.getTransformCodec();
+    if (transformCodec != null && transformCodec != org.apache.pinot.segment.spi.compression.TransformCodec.NONE) {
Review Comment:
Avoid using fully qualified references to
org.apache.pinot.segment.spi.compression.TransformCodec inline. Import
TransformCodec at the top of the file and refer to it directly (and keep
FieldConfig.TransformCodec referenced via its enclosing type if needed) to
match the codebase’s import style and improve readability.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkForwardIndexWriter.java:
##########
@@ -88,13 +90,40 @@ public abstract class BaseChunkForwardIndexWriter implements Closeable {
   protected BaseChunkForwardIndexWriter(File file, ChunkCompressionType compressionType, int totalDocs,
       int numDocsPerChunk, long chunkSize, int sizeOfEntry, int version, boolean fixed)
       throws IOException {
-    Preconditions.checkArgument(version == 2 || version == 3 || (fixed && (version == 4 || version == 5)),
+    this(file, compressionType, null, sizeOfEntry, totalDocs, numDocsPerChunk, chunkSize, sizeOfEntry, version, fixed);
+  }
+
+  /**
+   * Constructor with optional transform codec support.
+   *
+   * @param file Data file to write into
+   * @param compressionType Type of compression
+   * @param transformCodec Optional transform codec to apply before compression
+   * @param valueSizeInBytes Size of each typed value (4 for INT, 8 for LONG); used by transform
+   * @param totalDocs Total docs to write
+   * @param numDocsPerChunk Number of docs per data chunk
+   * @param chunkSize Size of chunk
+   * @param sizeOfEntry Size of entry (in bytes), max size for variable byte implementation.
+   * @param version version of File
+   * @param fixed if the data type is fixed width (required for version validation)
+   * @throws IOException if the file isn't found or can't be mapped
+   */
+  protected BaseChunkForwardIndexWriter(File file, ChunkCompressionType compressionType,
+      @Nullable TransformCodec transformCodec, int valueSizeInBytes, int totalDocs,
+      int numDocsPerChunk, long chunkSize, int sizeOfEntry, int version, boolean fixed)
+      throws IOException {
+    boolean hasTransform = transformCodec != null && transformCodec != TransformCodec.NONE;
+    Preconditions.checkArgument(version == 2 || version == 3 || (fixed && (version == 4 || version == 5 || version == 6)),
         "Illegal version: %s for %s bytes values", version, fixed ? "fixed" : "variable");
+    if (hasTransform) {
+      Preconditions.checkArgument(version == 6, "TransformCodec requires writer version 6, got: %s", version);
+    }
     Preconditions.checkArgument(chunkSize <= Integer.MAX_VALUE, "Chunk size limited to 2GB");
     _chunkSize = (int) chunkSize;
-    _chunkCompressor = ChunkCompressorFactory.getCompressor(compressionType);
+    _chunkCompressor = ChunkCompressorFactory.getCompressor(compressionType, transformCodec, valueSizeInBytes);
Review Comment:
When hasTransform is true, the code relies on valueSizeInBytes to decide INT
vs LONG behavior in the transform layer. Add a strict validation here (e.g.,
require valueSizeInBytes to be exactly 4 or 8, and ideally that the transform
is only used for INT/LONG forward indexes) to fail fast instead of silently
producing incorrect on-disk data if this constructor is used incorrectly.
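A minimal sketch of the fail-fast check being requested. `TransformCodec` is stubbed locally and Guava's `Preconditions` is replaced with a plain `IllegalArgumentException` so the snippet stands alone; the helper name `checkTransformValueSize` is hypothetical, and in the PR this logic would sit inside the new constructor next to the version check:

```java
// Sketch only: TransformCodec is stubbed here; in Pinot it lives in
// org.apache.pinot.segment.spi.compression, and the real check would use
// Guava's Preconditions inside the constructor shown in the diff.
public class TransformCodecValidationSketch {
  public enum TransformCodec { NONE, DELTA, DOUBLE_DELTA }

  // Hypothetical helper: reject value sizes other than 4 (INT) or 8 (LONG)
  // whenever a real transform codec is in play.
  public static void checkTransformValueSize(TransformCodec transformCodec, int valueSizeInBytes) {
    boolean hasTransform = transformCodec != null && transformCodec != TransformCodec.NONE;
    if (hasTransform && valueSizeInBytes != Integer.BYTES && valueSizeInBytes != Long.BYTES) {
      throw new IllegalArgumentException("Transform codec: " + transformCodec
          + " requires a 4-byte (INT) or 8-byte (LONG) value size, got: " + valueSizeInBytes);
    }
  }

  public static void main(String[] args) {
    checkTransformValueSize(TransformCodec.DELTA, Integer.BYTES);  // accepted
    checkTransformValueSize(TransformCodec.NONE, 2);               // accepted: no transform applied
    try {
      checkTransformValueSize(TransformCodec.DOUBLE_DELTA, 2);     // must fail fast
      System.out.println("accepted");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The point of failing in the constructor rather than in the transform layer is that a misconfigured writer never produces a file at all, instead of producing one that decodes to wrong values.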
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkForwardIndexWriter.java:
##########
@@ -88,13 +90,40 @@ public abstract class BaseChunkForwardIndexWriter implements Closeable {
   protected BaseChunkForwardIndexWriter(File file, ChunkCompressionType compressionType, int totalDocs,
       int numDocsPerChunk, long chunkSize, int sizeOfEntry, int version, boolean fixed)
       throws IOException {
-    Preconditions.checkArgument(version == 2 || version == 3 || (fixed && (version == 4 || version == 5)),
+    this(file, compressionType, null, sizeOfEntry, totalDocs, numDocsPerChunk, chunkSize, sizeOfEntry, version, fixed);
+  }
+
+  /**
+   * Constructor with optional transform codec support.
+   *
+   * @param file Data file to write into
+   * @param compressionType Type of compression
+   * @param transformCodec Optional transform codec to apply before compression
+   * @param valueSizeInBytes Size of each typed value (4 for INT, 8 for LONG); used by transform
+   * @param totalDocs Total docs to write
+   * @param numDocsPerChunk Number of docs per data chunk
+   * @param chunkSize Size of chunk
+   * @param sizeOfEntry Size of entry (in bytes), max size for variable byte implementation.
+   * @param version version of File
+   * @param fixed if the data type is fixed width (required for version validation)
+   * @throws IOException if the file isn't found or can't be mapped
+   */
+  protected BaseChunkForwardIndexWriter(File file, ChunkCompressionType compressionType,
+      @Nullable TransformCodec transformCodec, int valueSizeInBytes, int totalDocs,
+      int numDocsPerChunk, long chunkSize, int sizeOfEntry, int version, boolean fixed)
+      throws IOException {
+    boolean hasTransform = transformCodec != null && transformCodec != TransformCodec.NONE;
+    Preconditions.checkArgument(version == 2 || version == 3 || (fixed && (version == 4 || version == 5 || version == 6)),
         "Illegal version: %s for %s bytes values", version, fixed ? "fixed" : "variable");
+    if (hasTransform) {
+      Preconditions.checkArgument(version == 6, "TransformCodec requires writer version 6, got: %s", version);
+    }
     Preconditions.checkArgument(chunkSize <= Integer.MAX_VALUE, "Chunk size limited to 2GB");
     _chunkSize = (int) chunkSize;
-    _chunkCompressor = ChunkCompressorFactory.getCompressor(compressionType);
+    _chunkCompressor = ChunkCompressorFactory.getCompressor(compressionType, transformCodec, valueSizeInBytes);
     _headerEntryChunkOffsetSize = version == 2 ? Integer.BYTES : Long.BYTES;
-    _dataOffset = writeHeader(compressionType, totalDocs, numDocsPerChunk, sizeOfEntry, version);
+    _dataOffset = writeHeader(compressionType, transformCodec, totalDocs, numDocsPerChunk, sizeOfEntry, version);
     _chunkBuffer = ByteBuffer.allocateDirect(_chunkSize);
Review Comment:
New writer version 6 support (including the transformCodec header field)
isn’t covered by existing forward-index roundtrip tests (e.g.,
FixedByteChunkSVForwardIndexTest currently only exercises versions 2/3/4).
Please add/extend unit tests to write/read version 6 indexes with
DELTA/DOUBLE_DELTA transforms (at least with PASS_THROUGH and one real
compressor) to validate header parsing and end-to-end correctness.
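As a stopgap illustration of what the roundtrip assertion should cover, here is a self-contained sketch of a delta encode/decode cycle. The DELTA semantics shown (difference from the previous value, starting from 0) are an assumption rather than Pinot's actual codec, and the class name is hypothetical; the real test would exercise the version-6 writer/reader pair end to end:

```java
// Self-contained sketch; the DELTA semantics (difference from the previous
// value, first value kept as-is) are assumed, not taken from Pinot's codec.
import java.util.Arrays;

public class DeltaRoundTripSketch {
  static long[] deltaEncode(long[] values) {
    long[] out = new long[values.length];
    long prev = 0;
    for (int i = 0; i < values.length; i++) {
      out[i] = values[i] - prev;  // store the difference from the previous value
      prev = values[i];
    }
    return out;
  }

  static long[] deltaDecode(long[] deltas) {
    long[] out = new long[deltas.length];
    long prev = 0;
    for (int i = 0; i < deltas.length; i++) {
      prev += deltas[i];  // accumulate deltas back into absolute values
      out[i] = prev;
    }
    return out;
  }

  public static void main(String[] args) {
    long[] docs = {100, 101, 105, 105, 90, -3};
    long[] decoded = deltaDecode(deltaEncode(docs));
    System.out.println(Arrays.equals(docs, decoded) ? "roundtrip ok" : "roundtrip broken");
  }
}
```

A unit test for the PR would assert the same property, but through the on-disk path: write values at version 6 with each transform, reopen the file, and compare every doc against the input.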
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/forward/BaseChunkForwardIndexReader.java:
##########
@@ -84,14 +87,26 @@ protected BaseChunkForwardIndexReader(PinotDataBuffer dataBuffer, DataType store
     headerOffset += Integer.BYTES;
     _compressionType = ChunkCompressionType.valueOf(_dataBuffer.getInt(headerOffset));
-    _chunkDecompressor = ChunkCompressorFactory.getDecompressor(_compressionType);
-    _isCompressed = !_compressionType.equals(ChunkCompressionType.PASS_THROUGH);
-
     headerOffset += Integer.BYTES;
+
+    // Version 6+ includes a transform codec field
+    TransformCodec transformCodec = null;
+    if (version >= 6) {
+      transformCodec = TransformCodec.valueOf(_dataBuffer.getInt(headerOffset));
+      headerOffset += Integer.BYTES;
+    }
+    _transformCodec = transformCodec;
+
+    int valueSizeInBytes = storedType.isFixedWidth() ? storedType.size() : 0;
Review Comment:
When version >= 6 and a non-NONE transformCodec is present, the reader
currently applies the transform for any fixed-width stored type (including
FLOAT/DOUBLE). Add a validation here to require storedType to be INT or LONG
(and valueSizeInBytes 4/8) when transformCodec != NONE, so corrupted/miswritten
segments fail fast instead of returning silently incorrect values.
```suggestion
    int valueSizeInBytes = storedType.isFixedWidth() ? storedType.size() : 0;
    if (transformCodec != null && transformCodec != TransformCodec.NONE) {
      Preconditions.checkState((storedType == DataType.INT && valueSizeInBytes == Integer.BYTES)
              || (storedType == DataType.LONG && valueSizeInBytes == Long.BYTES),
          "Transform codec: %s is only supported for INT/LONG stored types with 4/8 byte widths, got storedType: %s,"
              + " valueSizeInBytes: %s", transformCodec, storedType, valueSizeInBytes);
    }
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]