LuciferYang opened a new pull request, #1119:
URL: https://github.com/apache/arrow-java/pull/1119

   ## What
   
   Two fixes in the compression codec:
   
   1. **`AbstractCompressionCodec.compress()`: uncompressed-length prefix could 
be written with a wrong value**
   
      The old code read `uncompressedBuffer.writerIndex()` *after* 
`doCompress()` to populate the 8-byte prefix. But `uncompressedBuffer` is a 
shared reference to the vector's internal buffer. If `writerIndex` changed 
between `doCompress()` and the subsequent read -- e.g. due to 
`VectorSchemaRoot` reuse (`clear`/`allocateNew`) -- the prefix would get a 
wrong value such as 0.
   
      Fix: capture `writerIndex()` once at the top of `compress()` and reuse it 
for the empty-buffer check, size comparison, and prefix write.
   
   2. **`ZstdCompressionCodec.doCompress()`: `dstCapacity` overstated by 8 
bytes**
   
      `Zstd.compressUnsafe(dst, dstSize, ...)` expects `dstSize` to be the 
space available starting at `dst`. The code offset `dst` by 8 bytes to skip the 
prefix but passed `8 + maxSize` instead of `maxSize`. In practice 
`compressBound()` headroom prevented an actual out-of-bounds write, but the 
argument overstated the writable space by 8 bytes.
   
      Fix: pass `maxSize` as `dstSize`, not `8 + maxSize`.
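
The stale-read problem in fix 1 can be sketched in isolation. This is a minimal illustration, not the codec's code: `SharedBufferStub` and `PrefixCaptureSketch` are hypothetical names, and the `Runnable` stands in for `doCompress()` plus any buffer reset that happens before the prefix is written.

```java
/** Hypothetical stand-in for a shared buffer; it only tracks a mutable writer index. */
final class SharedBufferStub {
    private long writerIndex;

    SharedBufferStub(long writerIndex) {
        this.writerIndex = writerIndex;
    }

    long writerIndex() {
        return writerIndex;
    }

    /** Models a reuse path (clear/allocateNew) that resets the writer index. */
    void reset() {
        writerIndex = 0;
    }
}

final class PrefixCaptureSketch {
    /** Old shape: the length is read only after the compression step has run. */
    static long prefixReadLate(SharedBufferStub buf, Runnable compressStep) {
        compressStep.run();
        return buf.writerIndex(); // may observe a value changed in the meantime
    }

    /** Fixed shape: the length is captured once, up front, and reused afterwards. */
    static long prefixReadOnce(SharedBufferStub buf, Runnable compressStep) {
        long uncompressedLength = buf.writerIndex();
        if (uncompressedLength == 0) {
            return 0; // the empty-buffer check uses the captured value too
        }
        compressStep.run();
        return uncompressedLength; // unaffected by later changes to the buffer
    }

    public static void main(String[] args) {
        SharedBufferStub late = new SharedBufferStub(27752);
        SharedBufferStub once = new SharedBufferStub(27752);
        System.out.println(prefixReadLate(late, late::reset)); // prints 0
        System.out.println(prefixReadOnce(once, once::reset)); // prints 27752
    }
}
```

With the late read, the value that would go into the prefix follows whatever the shared buffer looks like at that moment; with the early capture it is fixed before anything else can touch the buffer.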
   
   ## Tests
   
   - `testMultiBatchZstdStreamWithWideSchemaAndAllNulls` -- 100 fields x 10 
batches x 500 rows, `VectorSchemaRoot` reuse with all-null timestamp columns in 
every 3rd batch, full streaming round-trip with per-cell verification.
   - `testAllNullFixedWidthVectorZstdRoundTrip` -- 3469-row all-null 
`TimestampMilliVector`, buffer-level compress/decompress, asserts decompressed 
`writerIndex` matches the original.
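
The property the buffer-level test checks can be sketched with the JDK alone. This is only an approximation under stated assumptions: `java.util.zip.Deflater` is swapped in for zstd, `PrefixedRoundTripSketch` is a hypothetical class, the size bound is an assumed generous formula rather than a real `compressBound()`, and the prefix is assumed to be a little-endian 64-bit length. The input is an all-zero array sized like 3469 eight-byte values.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class PrefixedRoundTripSketch {
    static final int PREFIX_BYTES = 8;

    /** Frame layout: 8-byte uncompressed length, then the compressed body. */
    static byte[] compress(byte[] input) {
        int uncompressedLength = input.length; // captured once, before compressing
        // Assumed generous bound for this sketch; not a real compressBound().
        int maxSize = uncompressedLength + uncompressedLength / 8 + 64;
        byte[] dst = new byte[PREFIX_BYTES + maxSize];

        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        // Output starts 8 bytes in, so the space available from there is
        // maxSize, not PREFIX_BYTES + maxSize.
        int written = deflater.deflate(dst, PREFIX_BYTES, maxSize);
        boolean done = deflater.finished();
        deflater.end();
        if (!done) {
            throw new IllegalStateException("destination too small");
        }

        ByteBuffer.wrap(dst).order(ByteOrder.LITTLE_ENDIAN).putLong(0, uncompressedLength);
        return Arrays.copyOf(dst, PREFIX_BYTES + written);
    }

    /** Reads the declared length from the prefix and inflates exactly that many bytes. */
    static byte[] decompress(byte[] frame) {
        long declared = ByteBuffer.wrap(frame).order(ByteOrder.LITTLE_ENDIAN).getLong(0);
        byte[] out = new byte[Math.toIntExact(declared)];
        Inflater inflater = new Inflater();
        inflater.setInput(frame, PREFIX_BYTES, frame.length - PREFIX_BYTES);
        int n = 0;
        try {
            while (n < out.length && !inflater.finished()) {
                int r = inflater.inflate(out, n, out.length - n);
                if (r == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // no further progress is possible
                }
                n += r;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt frame", e);
        } finally {
            inflater.end();
        }
        if (n != out.length) {
            throw new IllegalStateException("prefix does not match decompressed size");
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] input = new byte[3469 * 8]; // all zeros
        byte[] back = decompress(compress(input));
        System.out.println(back.length == input.length && Arrays.equals(input, back));
    }
}
```

A prefix of 0 written for a non-empty buffer would make `decompress` return an empty array here, which is the kind of mismatch a length assertion on the decompressed buffer would catch.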
   
   ## Closes
   
   Closes #1116
   

