xiangfu0 commented on code in PR #18092:
URL: https://github.com/apache/pinot/pull/18092#discussion_r3036489141
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkForwardIndexWriter.java:
##########
@@ -88,13 +90,40 @@ public abstract class BaseChunkForwardIndexWriter
implements Closeable {
protected BaseChunkForwardIndexWriter(File file, ChunkCompressionType
compressionType, int totalDocs,
int numDocsPerChunk, long chunkSize, int sizeOfEntry, int version,
boolean fixed)
throws IOException {
- Preconditions.checkArgument(version == 2 || version == 3 || (fixed &&
(version == 4 || version == 5)),
+ this(file, compressionType, null, sizeOfEntry, totalDocs, numDocsPerChunk,
chunkSize, sizeOfEntry, version, fixed);
+ }
+
+ /**
+ * Constructor with optional transform codec support.
+ *
+ * @param file Data file to write into
+ * @param compressionType Type of compression
+ * @param transformCodec Optional transform codec to apply before compression
+ * @param valueSizeInBytes Size of each typed value (4 for INT, 8 for LONG);
used by transform
+ * @param totalDocs Total docs to write
+ * @param numDocsPerChunk Number of docs per data chunk
+ * @param chunkSize Size of chunk
+ * @param sizeOfEntry Size of entry (in bytes), max size for variable byte
implementation.
+ * @param version version of File
+ * @param fixed if the data type is fixed width (required for version
validation)
+ * @throws IOException if the file isn't found or can't be mapped
+ */
+ protected BaseChunkForwardIndexWriter(File file, ChunkCompressionType
compressionType,
+ @Nullable TransformCodec transformCodec, int valueSizeInBytes, int
totalDocs,
+ int numDocsPerChunk, long chunkSize, int sizeOfEntry, int version,
boolean fixed)
+ throws IOException {
+ boolean hasTransform = transformCodec != null && transformCodec !=
TransformCodec.NONE;
+ Preconditions.checkArgument(
+ version == 2 || version == 3 || (fixed && (version == 4 || version ==
5 || version == 6)),
"Illegal version: %s for %s bytes values", version, fixed ? "fixed" :
"variable");
+ if (hasTransform) {
+ Preconditions.checkArgument(version == 6, "TransformCodec requires
writer version 6, got: %s", version);
+ }
Preconditions.checkArgument(chunkSize <= Integer.MAX_VALUE, "Chunk size
limited to 2GB");
_chunkSize = (int) chunkSize;
- _chunkCompressor = ChunkCompressorFactory.getCompressor(compressionType);
+ _chunkCompressor = ChunkCompressorFactory.getCompressor(compressionType,
transformCodec, valueSizeInBytes);
_headerEntryChunkOffsetSize = version == 2 ? Integer.BYTES : Long.BYTES;
- _dataOffset = writeHeader(compressionType, totalDocs, numDocsPerChunk,
sizeOfEntry, version);
+ _dataOffset = writeHeader(compressionType, transformCodec, totalDocs,
numDocsPerChunk, sizeOfEntry, version);
_chunkBuffer = ByteBuffer.allocateDirect(_chunkSize);
Review Comment:
The writer version was changed from 6 to 7. Full end-to-end roundtrip tests
exist in `ChunkCodecPipelineTest` — tests like `testDeltaPipelineEndToEnd`,
`testDoubleDeltaPipelineEndToEnd`, and `testXorPipelineEndToEnd*` write v7
indexes with transforms and read them back, validating header parsing and data
correctness.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/forward/BaseChunkForwardIndexReader.java:
##########
@@ -84,14 +87,26 @@ protected BaseChunkForwardIndexReader(PinotDataBuffer
dataBuffer, DataType store
headerOffset += Integer.BYTES;
_compressionType =
ChunkCompressionType.valueOf(_dataBuffer.getInt(headerOffset));
- _chunkDecompressor =
ChunkCompressorFactory.getDecompressor(_compressionType);
- _isCompressed =
!_compressionType.equals(ChunkCompressionType.PASS_THROUGH);
-
headerOffset += Integer.BYTES;
+
+ // Version 6+ includes a transform codec field
+ TransformCodec transformCodec = null;
+ if (version >= 6) {
+ transformCodec =
TransformCodec.valueOf(_dataBuffer.getInt(headerOffset));
+ headerOffset += Integer.BYTES;
+ }
+ _transformCodec = transformCodec;
+
+ int valueSizeInBytes = storedType.isFixedWidth() ? storedType.size() : 0;
Review Comment:
Intentionally allowing all fixed-width types (INT, LONG, FLOAT, DOUBLE) —
the XOR transform is designed for FLOAT/DOUBLE columns. The validation now
checks `storedType.isFixedWidth()` which correctly covers INT/LONG/FLOAT/DOUBLE.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]