xiangfu0 opened a new pull request, #18092:
URL: https://github.com/apache/pinot/pull/18092
## Summary
- Add an optional `transformCodec` field (`DELTA`, `DOUBLE_DELTA`) to `FieldConfig` for RAW forward indexes
- The transform is applied before `compressionCodec`, enabling combinations like `DELTA + ZSTANDARD` that are intended to compress monotonic numeric data (timestamps, counters, sequential IDs) better
- V1 scope: single-value INT/LONG columns only, with a writer version 6 header for clean backward compatibility
## Design
The transform is implemented as a composing compressor/decompressor that
wraps the existing compression pipeline:
```
Write: values → DeltaTransform.encode() → LZ4/ZSTD/SNAPPY compress → disk
Read: disk → decompress → DeltaTransform.decode() → original values
```
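The transform step in the pipeline above can be sketched on a plain `long[]`. This is a minimal illustration, not the PR's `DeltaTransform` or `DoubleDeltaTransform`: the class name `TransformSketch`, its method names, and the array-based layout are all assumptions made for the example, and the real implementations may lay out the encoded chunk differently.

```java
/** Hypothetical sketch of in-place delta and double-delta transforms (not the PR's code). */
final class TransformSketch {
  private TransformSketch() {}

  /** DELTA: keeps values[0], replaces every later value with its difference from the predecessor. */
  static void deltaEncode(long[] values) {
    diffFrom(values, 1);
  }

  static void deltaDecode(long[] values) {
    prefixSumFrom(values, 1);
  }

  /** DOUBLE_DELTA: keeps values[0] and the first delta, then stores the delta of deltas. */
  static void doubleDeltaEncode(long[] values) {
    diffFrom(values, 1);
    diffFrom(values, 2);
  }

  static void doubleDeltaDecode(long[] values) {
    // Undo the passes in reverse order: rebuild the deltas first, then the values.
    prefixSumFrom(values, 2);
    prefixSumFrom(values, 1);
  }

  // Walks backwards so each subtraction still sees the untouched predecessor.
  private static void diffFrom(long[] values, int start) {
    for (int i = values.length - 1; i >= start; i--) {
      values[i] -= values[i - 1];
    }
  }

  // Walks forwards so each addition sees the already-restored predecessor.
  private static void prefixSumFrom(long[] values, int start) {
    for (int i = start; i < values.length; i++) {
      values[i] += values[i - 1];
    }
  }
}
```

For the timestamps `{1000, 1010, 1020, 1035}`, `deltaEncode` yields `{1000, 10, 10, 15}` and `doubleDeltaEncode` yields `{1000, 10, 0, 5}`. The small, repetitive residuals are what the downstream compressor is meant to exploit. Because Java `long` arithmetic wraps on overflow, the round trip stays lossless even when a difference overflows.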
User-facing config:
```json
{
  "fieldConfigList": [
    {
      "name": "eventTimeMillis",
      "encodingType": "RAW",
      "transformCodec": "DELTA",
      "compressionCodec": "ZSTANDARD"
    }
  ]
}
```
### Key files
- **New SPI types**: `TransformCodec` enum, `ChunkTransform` interface
- **Transform impls**: `DeltaTransform`, `DoubleDeltaTransform` (pure
in-place, no LZ4 coupling)
- **Composing wrappers**: `TransformCompressor`, `TransformDecompressor`
- **Writer v6 header**: adds `transformCodecType` field after
`compressionType`
- **Validation**: rejects dict-encoded, MV, non-INT/LONG columns at config
validation time
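
The validation rule in the last bullet can be sketched as a standalone check. Everything here is hypothetical: the enums, the `validate` signature, and the error messages are stand-ins for illustration and do not mirror Pinot's actual `FieldConfig` or table-config validation code.

```java
/** Hypothetical sketch of the transformCodec validation rules (not Pinot's validator). */
final class TransformCodecValidationSketch {
  private TransformCodecValidationSketch() {}

  // Simplified stand-ins for the real config types; assumptions for this example only.
  enum Encoding { RAW, DICTIONARY }
  enum StoredType { INT, LONG, FLOAT, DOUBLE, STRING, BYTES }
  enum Transform { NONE, DELTA, DOUBLE_DELTA }

  /** Throws IllegalArgumentException if the column cannot use the requested transform. */
  static void validate(String column, Encoding encoding, boolean singleValue,
      StoredType type, Transform transform) {
    if (transform == Transform.NONE) {
      return; // No transform requested, nothing to check.
    }
    if (encoding != Encoding.RAW) {
      throw new IllegalArgumentException(
          "transformCodec requires RAW encoding for column: " + column);
    }
    if (!singleValue) {
      throw new IllegalArgumentException(
          "transformCodec is not supported on multi-value column: " + column);
    }
    if (type != StoredType.INT && type != StoredType.LONG) {
      throw new IllegalArgumentException(
          "transformCodec supports only INT/LONG, got " + type + " for column: " + column);
    }
  }
}
```

Rejecting the config up front keeps unsupported combinations from reaching segment generation, which matches the V1 scope stated in the summary.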
### Backward compatibility
- Old readers encounter version 6 and fail cleanly (unknown version)
- New readers treat version ≤5 segments as having no transform (implicit `NONE`)
- The existing `CompressionCodec.DELTA`/`DELTADELTA` values are untouched
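
The version gating described above can be sketched with a deliberately simplified header. The layout used here (a version int, then `compressionType`, then `transformCodecType` only for version 6) is an assumption for illustration; the real chunk header carries additional fields, and `HeaderVersionSketch`, `readHeader`, and the integer codes are hypothetical names.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of version-gated header parsing (simplified, not Pinot's reader). */
final class HeaderVersionSketch {
  private HeaderVersionSketch() {}

  static final int TRANSFORM_VERSION = 6; // First version that stores transformCodecType.
  static final int TRANSFORM_NONE = 0;    // Assumed integer code for "no transform".

  /**
   * Reads the simplified header and returns {version, compressionType, transformCodecType}.
   * maxSupportedVersion models the reader: 5 for an old reader, 6 for a new one.
   */
  static int[] readHeader(ByteBuffer header, int maxSupportedVersion) {
    int version = header.getInt();
    if (version > maxSupportedVersion) {
      // An old reader stops here instead of misinterpreting the extra field.
      throw new IllegalStateException("Unsupported writer version: " + version);
    }
    int compressionType = header.getInt();
    // Headers older than version 6 have no transform field, so NONE is implied.
    int transformCodecType =
        version >= TRANSFORM_VERSION ? header.getInt() : TRANSFORM_NONE;
    return new int[] {version, compressionType, transformCodecType};
  }
}
```

Bumping the version rather than overloading an existing field means a reader either understands the transform field or refuses the segment; it never silently returns delta-encoded values as if they were raw.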
## Test plan
- [x] 23 unit tests for DeltaTransform and DoubleDeltaTransform (edge cases,
overflow, random data)
- [x] 74 existing FixedByteChunkSVForwardIndexTest tests pass (no
regressions)
- [x] Compilation verified across pinot-spi, pinot-segment-spi,
pinot-segment-local, pinot-core
- [ ] Integration test with offline segment generation + query correctness
- [ ] Realtime sealed segment test
- [ ] Benchmark evidence for monotonic data compression improvement
🤖 Generated with [Claude Code](https://claude.com/claude-code)