LuciferYang opened a new pull request, #1119:
URL: https://github.com/apache/arrow-java/pull/1119
## What
Two fixes in the compression codec:
1. **`AbstractCompressionCodec.compress()`: uncompressed-length prefix could
be written with a wrong value**
The old code read `uncompressedBuffer.writerIndex()` *after*
`doCompress()` to populate the 8-byte prefix. But `uncompressedBuffer` is a
shared reference to the vector's internal buffer. If `writerIndex` changed
between `doCompress()` and the subsequent read -- e.g. due to
`VectorSchemaRoot` reuse (`clear`/`allocateNew`) -- the prefix would get a
wrong value such as 0.
Fix: capture `writerIndex()` once at the top of `compress()` and reuse it
for the empty-buffer check, size comparison, and prefix write.
2. **`ZstdCompressionCodec.doCompress()`: `dstCapacity` overstated by 8
bytes**
`Zstd.compressUnsafe(dst, dstSize, ...)` expects `dstSize` to be the space
available starting at `dst`. The code offset `dst` by 8 bytes to skip the prefix
but still passed `8 + maxSize` instead of `maxSize`. In practice `compressBound()`
headroom prevented an actual out-of-bounds write, but the parameter was
semantically wrong.
Fix: pass `maxSize` as `dstSize` instead of `8 + maxSize`.
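
The two fixes can be illustrated with a small toy model. This is only a sketch: `PrefixSketch`, `SharedBuf`, and every method below are hypothetical names made up for the illustration, not Arrow or zstd-jni APIs, and the `Runnable` stands in for whatever changes the writer index between compression and the prefix write.

```java
/** Toy model of the two fixes; none of these names are Arrow or zstd-jni APIs. */
public class PrefixSketch {
    static final int PREFIX_BYTES = 8;

    /** Hypothetical stand-in for a shared buffer whose owner can reset its writer index. */
    static final class SharedBuf {
        private long writerIndex;

        SharedBuf(long writerIndex) {
            this.writerIndex = writerIndex;
        }

        long writerIndex() {
            return writerIndex;
        }

        void clear() {
            writerIndex = 0; // models the owner resetting the buffer
        }
    }

    /** Old ordering: reads the index only after the compression step has run. */
    static long prefixReadLate(SharedBuf buf, Runnable doCompress) {
        doCompress.run();
        return buf.writerIndex();
    }

    /** Fixed ordering: captures the index once, up front, and reuses that value. */
    static long prefixCapturedEarly(SharedBuf buf, Runnable doCompress) {
        long uncompressedLength = buf.writerIndex();
        doCompress.run();
        return uncompressedLength;
    }

    /** Space left for a compressor that starts writing PREFIX_BYTES into the allocation. */
    static long usableCapacity(long allocatedSize) {
        return allocatedSize - PREFIX_BYTES;
    }

    public static void main(String[] args) {
        SharedBuf a = new SharedBuf(27752);
        SharedBuf b = new SharedBuf(27752);
        System.out.println(prefixReadLate(a, a::clear));      // prints 0
        System.out.println(prefixCapturedEarly(b, b::clear)); // prints 27752

        long maxSize = 1000; // arbitrary illustrative bound
        long allocated = PREFIX_BYTES + maxSize;
        System.out.println(usableCapacity(allocated) == maxSize); // prints true
    }
}
```

The capacity check at the end shows why the destination size must exclude the prefix: the allocation is `8 + maxSize` bytes, but only `maxSize` of them lie at or after the offset pointer.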
## Tests
- `testMultiBatchZstdStreamWithWideSchemaAndAllNulls` -- 100 fields x 10
batches x 500 rows, `VectorSchemaRoot` reuse with all-null timestamp columns in
every 3rd batch, full streaming round-trip with per-cell verification.
- `testAllNullFixedWidthVectorZstdRoundTrip` -- 3469-row all-null
`TimestampMilliVector`, buffer-level compress/decompress, asserts decompressed
`writerIndex` matches the original.
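
The buffer-level round trip can be sketched without Arrow as follows. Several things here are assumptions for illustration: `java.util.zip.Deflater`/`Inflater` are swapped in for zstd, plain `byte[]` replaces `ArrowBuf`, the prefix is written as a little-endian 64-bit value, and the input is 3469 zeroed 8-byte slots as a stand-in for an all-null timestamp data buffer. `RoundTripSketch` and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Length-prefixed compress/decompress round trip; Deflater is a stand-in for zstd. */
public class RoundTripSketch {
    static final int PREFIX_BYTES = 8;

    /** Returns [8-byte little-endian uncompressed length][compressed bytes]. */
    static byte[] compress(byte[] input) {
        long uncompressedLength = input.length; // captured once, before compressing
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] prefix = ByteBuffer.allocate(PREFIX_BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(uncompressedLength)
                .array();
        out.write(prefix, 0, PREFIX_BYTES);

        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Reads the prefix, then inflates exactly that many bytes or fails. */
    static byte[] decompress(byte[] framed) {
        long uncompressedLength = ByteBuffer.wrap(framed, 0, PREFIX_BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getLong();
        byte[] output = new byte[Math.toIntExact(uncompressedLength)];
        Inflater inflater = new Inflater();
        inflater.setInput(framed, PREFIX_BYTES, framed.length - PREFIX_BYTES);
        int written = 0;
        try {
            while (written < output.length) {
                int n = inflater.inflate(output, written, output.length - written);
                if (n == 0) {
                    break; // no progress: payload is shorter than the prefix claims
                }
                written += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed payload", e);
        } finally {
            inflater.end();
        }
        if (written != output.length) {
            throw new IllegalStateException("length prefix does not match payload");
        }
        return output;
    }

    public static void main(String[] args) {
        byte[] original = new byte[3469 * 8]; // all zeros, assumed 8 bytes per value
        byte[] restored = decompress(compress(original));
        System.out.println(restored.length == original.length); // prints true
        System.out.println(Arrays.equals(original, restored));  // prints true
    }
}
```

Note that a prefix of 0 would make this `decompress` return an empty array without raising an error, which is why comparing the decompressed length against the original is the meaningful assertion.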
## Closes
Closes #1116