xiangfu0 commented on code in PR #18092:
URL: https://github.com/apache/pinot/pull/18092#discussion_r3036489412


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkForwardIndexWriter.java:
##########
@@ -88,15 +91,45 @@ public abstract class BaseChunkForwardIndexWriter 
implements Closeable {
   protected BaseChunkForwardIndexWriter(File file, ChunkCompressionType 
compressionType, int totalDocs,
       int numDocsPerChunk, long chunkSize, int sizeOfEntry, int version, 
boolean fixed)
       throws IOException {
-    Preconditions.checkArgument(version == 2 || version == 3 || (fixed && 
(version == 4 || version == 5)),
+    this(file, compressionType, null, sizeOfEntry, totalDocs, numDocsPerChunk, 
chunkSize, sizeOfEntry, version, fixed);
+  }
+
+  /**
+   * Constructor with optional codec pipeline support.
+   *
+   * @param file Data file to write into
+   * @param compressionType Type of compression (used when pipeline is null)
+   * @param codecPipeline Optional codec pipeline; when non-null, requires 
version 7
+   * @param valueSizeInBytes Size of each typed value (4 for INT, 8 for LONG); 
used by pipeline transforms
+   * @param totalDocs Total docs to write
+   * @param numDocsPerChunk Number of docs per data chunk
+   * @param chunkSize Size of chunk
+   * @param sizeOfEntry Size of entry (in bytes)
+   * @param version version of File
+   * @param fixed if the data type is fixed width
+   */
+  protected BaseChunkForwardIndexWriter(File file, ChunkCompressionType 
compressionType,
+      @Nullable ChunkCodecPipeline codecPipeline, int valueSizeInBytes, int 
totalDocs,
+      int numDocsPerChunk, long chunkSize, int sizeOfEntry, int version, 
boolean fixed)
+      throws IOException {
+    boolean hasPipeline = codecPipeline != null;
+    Preconditions.checkArgument(
+        version == 2 || version == 3 || (fixed && (version == 4 || version == 
5 || version == 7)),
         "Illegal version: %s for %s bytes values", version, fixed ? "fixed" : 
"variable");
+    if (hasPipeline) {
+      Preconditions.checkArgument(version == 7, "codecPipeline requires writer 
version 7, got: %s", version);
+    }

Review Comment:
   Already enforced — the constructor has bidirectional checks:
   ```java
   if (hasPipeline) {
     Preconditions.checkArgument(version == 7, "codecPipeline requires writer 
version 7");
   }
   if (version == 7) {
     Preconditions.checkArgument(hasPipeline, "Writer version 7 requires a 
non-null codecPipeline");
   }
   ```
   So version 7 without a pipeline is rejected at construction time.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/forward/ForwardIndexType.java:
##########
@@ -109,7 +110,26 @@ private void 
validateForwardIndexEnabled(ForwardIndexConfig forwardIndexConfig,
       FieldSpec fieldSpec) {
     String column = fieldSpec.getName();
     CompressionCodec compressionCodec = 
forwardIndexConfig.getCompressionCodec();
+    ChunkCodecPipeline codecPipeline = forwardIndexConfig.getCodecPipeline();
     DictionaryIndexConfig dictionaryConfig = 
indexConfigs.getConfig(StandardIndexes.dictionary());
+
+    // Validate codec pipeline
+    if (codecPipeline != null) {
+      Preconditions.checkState(!dictionaryConfig.isEnabled(),
+          "codecPipeline is not applicable to dictionary encoded column: %s", 
column);
+      Preconditions.checkState(fieldSpec.isSingleValueField(),
+          "codecPipeline is only supported on single-value columns, not 
applicable to column: %s", column);
+
+      // Validate transform codecs require fixed-width INT/LONG
+      if (codecPipeline.hasTransforms()) {
+        FieldSpec.DataType storedType = 
fieldSpec.getDataType().getStoredType();
+        Preconditions.checkState(
+            storedType == FieldSpec.DataType.INT || storedType == 
FieldSpec.DataType.LONG,
+            "codecPipeline with transforms requires INT or LONG column, got %s 
for column: %s",
+            storedType, column);
+      }
+    }

Review Comment:
   By design, compression-only pipelines (e.g. `["LZ4"]`) work on all column 
types including STRING/BYTES — they go through the same code path as the legacy 
`compressionCodec`. Only pipelines with transforms require fixed-width types. 
Since `compressionCodec` is always auto-converted to an equivalent pipeline 
internally, restricting compression-only pipelines to fixed-width would break 
backward compatibility.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to