Copilot commented on code in PR #18092:
URL: https://github.com/apache/pinot/pull/18092#discussion_r3034444741
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/forward/ForwardIndexType.java:
##########
@@ -124,6 +130,18 @@ private void validateForwardIndexEnabled(ForwardIndexConfig forwardIndexConfig,
             "Compression codec: %s is not applicable to raw column: %s", compressionCodec, column);
       }
     }
+
+    // Validate transformCodec constraints
+    org.apache.pinot.segment.spi.compression.TransformCodec transformCodec = forwardIndexConfig.getTransformCodec();
+    if (transformCodec != null && transformCodec != org.apache.pinot.segment.spi.compression.TransformCodec.NONE) {
Review Comment:
Avoid using fully qualified references to
org.apache.pinot.segment.spi.compression.TransformCodec inline. Import
TransformCodec at the top of the file and refer to it directly (and keep
FieldConfig.TransformCodec referenced via its enclosing type if needed) to
match the codebase’s import style and improve readability.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkForwardIndexWriter.java:
##########
@@ -88,13 +90,40 @@ public abstract class BaseChunkForwardIndexWriter implements Closeable {
   protected BaseChunkForwardIndexWriter(File file, ChunkCompressionType compressionType, int totalDocs,
       int numDocsPerChunk, long chunkSize, int sizeOfEntry, int version, boolean fixed)
       throws IOException {
-    Preconditions.checkArgument(version == 2 || version == 3 || (fixed && (version == 4 || version == 5)),
+    this(file, compressionType, null, sizeOfEntry, totalDocs, numDocsPerChunk, chunkSize, sizeOfEntry, version, fixed);
+  }
+
+  /**
+   * Constructor with optional transform codec support.
+   *
+   * @param file Data file to write into
+   * @param compressionType Type of compression
+   * @param transformCodec Optional transform codec to apply before compression
+   * @param valueSizeInBytes Size of each typed value (4 for INT, 8 for LONG); used by transform
+   * @param totalDocs Total docs to write
+   * @param numDocsPerChunk Number of docs per data chunk
+   * @param chunkSize Size of chunk
+   * @param sizeOfEntry Size of entry (in bytes), max size for variable byte implementation.
+   * @param version version of File
+   * @param fixed if the data type is fixed width (required for version validation)
+   * @throws IOException if the file isn't found or can't be mapped
+   */
+  protected BaseChunkForwardIndexWriter(File file, ChunkCompressionType compressionType,
+      @Nullable TransformCodec transformCodec, int valueSizeInBytes, int totalDocs,
+      int numDocsPerChunk, long chunkSize, int sizeOfEntry, int version, boolean fixed)
+      throws IOException {
+    boolean hasTransform = transformCodec != null && transformCodec != TransformCodec.NONE;
+    Preconditions.checkArgument(version == 2 || version == 3 || (fixed && (version == 4 || version == 5 || version == 6)),
         "Illegal version: %s for %s bytes values", version, fixed ? "fixed" : "variable");
+    if (hasTransform) {
+      Preconditions.checkArgument(version == 6, "TransformCodec requires writer version 6, got: %s", version);
+    }
     Preconditions.checkArgument(chunkSize <= Integer.MAX_VALUE, "Chunk size limited to 2GB");
     _chunkSize = (int) chunkSize;
-    _chunkCompressor = ChunkCompressorFactory.getCompressor(compressionType);
+    _chunkCompressor = ChunkCompressorFactory.getCompressor(compressionType, transformCodec, valueSizeInBytes);
Review Comment:
When hasTransform is true, the code relies on valueSizeInBytes to decide INT
vs LONG behavior in the transform layer. Add a strict validation here (e.g.,
require valueSizeInBytes to be exactly 4 or 8, and ideally that the transform
is only used for INT/LONG forward indexes) to fail fast instead of silently
producing incorrect on-disk data if this constructor is used incorrectly.
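A minimal sketch of the fail-fast check being requested. `TransformCodec` is stubbed locally and Guava's `Preconditions` is replaced with a plain `IllegalArgumentException` so the snippet stands alone; the helper name `checkTransformValueSize` is hypothetical, and in the PR this logic would sit inside the new constructor next to the version check:

```java
// Sketch only: TransformCodec is stubbed here; in Pinot it lives in
// org.apache.pinot.segment.spi.compression, and the real check would use
// Guava's Preconditions inside the constructor shown in the diff.
public class TransformCodecValidationSketch {
  public enum TransformCodec { NONE, DELTA, DOUBLE_DELTA }

  // Hypothetical helper: reject value sizes other than 4 (INT) or 8 (LONG)
  // whenever a real transform codec is in play.
  public static void checkTransformValueSize(TransformCodec transformCodec, int valueSizeInBytes) {
    boolean hasTransform = transformCodec != null && transformCodec != TransformCodec.NONE;
    if (hasTransform && valueSizeInBytes != Integer.BYTES && valueSizeInBytes != Long.BYTES) {
      throw new IllegalArgumentException("Transform codec: " + transformCodec
          + " requires a 4-byte (INT) or 8-byte (LONG) value size, got: " + valueSizeInBytes);
    }
  }

  public static void main(String[] args) {
    checkTransformValueSize(TransformCodec.DELTA, Integer.BYTES);  // accepted
    checkTransformValueSize(TransformCodec.NONE, 2);               // accepted: no transform applied
    try {
      checkTransformValueSize(TransformCodec.DOUBLE_DELTA, 2);     // must fail fast
      System.out.println("accepted");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The point of failing in the constructor rather than in the transform layer is that a misconfigured writer never produces a file at all, instead of producing one that decodes to wrong values.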
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkForwardIndexWriter.java:
##########
@@ -88,13 +90,40 @@ public abstract class BaseChunkForwardIndexWriter implements Closeable {
   protected BaseChunkForwardIndexWriter(File file, ChunkCompressionType compressionType, int totalDocs,
       int numDocsPerChunk, long chunkSize, int sizeOfEntry, int version, boolean fixed)
       throws IOException {
-    Preconditions.checkArgument(version == 2 || version == 3 || (fixed && (version == 4 || version == 5)),
+    this(file, compressionType, null, sizeOfEntry, totalDocs, numDocsPerChunk, chunkSize, sizeOfEntry, version, fixed);
+  }
+
+  /**
+   * Constructor with optional transform codec support.
+   *
+   * @param file Data file to write into
+   * @param compressionType Type of compression
+   * @param transformCodec Optional transform codec to apply before compression
+   * @param valueSizeInBytes Size of each typed value (4 for INT, 8 for LONG); used by transform
+   * @param totalDocs Total docs to write
+   * @param numDocsPerChunk Number of docs per data chunk
+   * @param chunkSize Size of chunk
+   * @param sizeOfEntry Size of entry (in bytes), max size for variable byte implementation.
+   * @param version version of File
+   * @param fixed if the data type is fixed width (required for version validation)
+   * @throws IOException if the file isn't found or can't be mapped
+   */
+  protected BaseChunkForwardIndexWriter(File file, ChunkCompressionType compressionType,
+      @Nullable TransformCodec transformCodec, int valueSizeInBytes, int totalDocs,
+      int numDocsPerChunk, long chunkSize, int sizeOfEntry, int version, boolean fixed)
+      throws IOException {
+    boolean hasTransform = transformCodec != null && transformCodec != TransformCodec.NONE;
+    Preconditions.checkArgument(version == 2 || version == 3 || (fixed && (version == 4 || version == 5 || version == 6)),
         "Illegal version: %s for %s bytes values", version, fixed ? "fixed" : "variable");
+    if (hasTransform) {
+      Preconditions.checkArgument(version == 6, "TransformCodec requires writer version 6, got: %s", version);
+    }
     Preconditions.checkArgument(chunkSize <= Integer.MAX_VALUE, "Chunk size limited to 2GB");
     _chunkSize = (int) chunkSize;
-    _chunkCompressor = ChunkCompressorFactory.getCompressor(compressionType);
+    _chunkCompressor = ChunkCompressorFactory.getCompressor(compressionType, transformCodec, valueSizeInBytes);
     _headerEntryChunkOffsetSize = version == 2 ? Integer.BYTES : Long.BYTES;
-    _dataOffset = writeHeader(compressionType, totalDocs, numDocsPerChunk, sizeOfEntry, version);
+    _dataOffset = writeHeader(compressionType, transformCodec, totalDocs, numDocsPerChunk, sizeOfEntry, version);
     _chunkBuffer = ByteBuffer.allocateDirect(_chunkSize);
Review Comment:
New writer version 6 support (including the transformCodec header field)
isn’t covered by existing forward-index roundtrip tests (e.g.,
FixedByteChunkSVForwardIndexTest currently only exercises versions 2/3/4).
Please add/extend unit tests to write/read version 6 indexes with
DELTA/DOUBLE_DELTA transforms (at least with PASS_THROUGH and one real
compressor) to validate header parsing and end-to-end correctness.
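As a stopgap illustration of what the roundtrip assertion should cover, here is a self-contained sketch of a delta encode/decode cycle. The DELTA semantics shown (difference from the previous value, starting from 0) are an assumption rather than Pinot's actual codec, and the class name is hypothetical; the real test would exercise the version-6 writer/reader pair end to end:

```java
// Self-contained sketch; the DELTA semantics (difference from the previous
// value, first value kept as-is) are assumed, not taken from Pinot's codec.
import java.util.Arrays;

public class DeltaRoundTripSketch {
  static long[] deltaEncode(long[] values) {
    long[] out = new long[values.length];
    long prev = 0;
    for (int i = 0; i < values.length; i++) {
      out[i] = values[i] - prev;  // store the difference from the previous value
      prev = values[i];
    }
    return out;
  }

  static long[] deltaDecode(long[] deltas) {
    long[] out = new long[deltas.length];
    long prev = 0;
    for (int i = 0; i < deltas.length; i++) {
      prev += deltas[i];  // accumulate deltas back into absolute values
      out[i] = prev;
    }
    return out;
  }

  public static void main(String[] args) {
    long[] docs = {100, 101, 105, 105, 90, -3};
    long[] decoded = deltaDecode(deltaEncode(docs));
    System.out.println(Arrays.equals(docs, decoded) ? "roundtrip ok" : "roundtrip broken");
  }
}
```

A unit test for the PR would assert the same property, but through the on-disk path: write values at version 6 with each transform, reopen the file, and compare every doc against the input.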
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/forward/BaseChunkForwardIndexReader.java:
##########
@@ -84,14 +87,26 @@ protected BaseChunkForwardIndexReader(PinotDataBuffer dataBuffer, DataType store
     headerOffset += Integer.BYTES;
     _compressionType = ChunkCompressionType.valueOf(_dataBuffer.getInt(headerOffset));
-    _chunkDecompressor = ChunkCompressorFactory.getDecompressor(_compressionType);
-    _isCompressed = !_compressionType.equals(ChunkCompressionType.PASS_THROUGH);
-
     headerOffset += Integer.BYTES;
+
+    // Version 6+ includes a transform codec field
+    TransformCodec transformCodec = null;
+    if (version >= 6) {
+      transformCodec = TransformCodec.valueOf(_dataBuffer.getInt(headerOffset));
+      headerOffset += Integer.BYTES;
+    }
+    _transformCodec = transformCodec;
+
+    int valueSizeInBytes = storedType.isFixedWidth() ? storedType.size() : 0;
Review Comment:
When version >= 6 and a non-NONE transformCodec is present, the reader
currently applies the transform for any fixed-width stored type (including
FLOAT/DOUBLE). Add a validation here to require storedType to be INT or LONG
(and valueSizeInBytes 4/8) when transformCodec != NONE, so corrupted/miswritten
segments fail fast instead of returning silently incorrect values.
```suggestion
    int valueSizeInBytes = storedType.isFixedWidth() ? storedType.size() : 0;
    if (transformCodec != null && transformCodec != TransformCodec.NONE) {
      Preconditions.checkState((storedType == DataType.INT && valueSizeInBytes == Integer.BYTES)
              || (storedType == DataType.LONG && valueSizeInBytes == Long.BYTES),
          "Transform codec: %s is only supported for INT/LONG stored types with 4/8 byte widths, got storedType: %s,"
              + " valueSizeInBytes: %s", transformCodec, storedType, valueSizeInBytes);
    }
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]