arouel opened a new pull request, #3543:
URL: https://github.com/apache/parquet-java/pull/3543
### Rationale for this change
`LocalInputFile`'s `read(ByteBuffer)` and `readFully(ByteBuffer)` are broken
for two independent reasons:
1. They pass `buf.position() + buf.arrayOffset()` as the source offset to
   `ByteBuffer.put(byte[] src, int offset, int length)`. The source is a
   freshly allocated local `byte[]` whose indices have no relationship to
   `buf.position()` or `buf.arrayOffset()`, so the copy reads from the wrong
   offset of the source array whenever the destination buffer's position (or
   its array offset) is non-zero.
2. They call `buf.arrayOffset()`, which throws
   `UnsupportedOperationException` on direct buffers, memory-mapped buffers, and
   read-only views (for read-only views, its subclass `ReadOnlyBufferException`).
   `ParquetFileReader.readFooter` passes exactly such buffers, so any consumer
   that wraps a `Path` with `new LocalInputFile(path)` can hit this bug.
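The `arrayOffset()` failure in item 2 is easy to reproduce in isolation. The class below is a hypothetical standalone check (`ArrayOffsetDemo` and `arrayOffsetThrows` are illustrative names, not part of parquet-java) that probes a heap buffer, a direct buffer (memory-mapped buffers are direct buffers too), and a read-only heap view:

```java
import java.nio.ByteBuffer;

class ArrayOffsetDemo {
    /** True if buf.arrayOffset() throws instead of returning an offset. */
    static boolean arrayOffsetThrows(ByteBuffer buf) {
        try {
            buf.arrayOffset();
            return false;
        } catch (UnsupportedOperationException e) {
            // Direct buffers throw UnsupportedOperationException; read-only heap
            // views throw ReadOnlyBufferException, which is a subclass of it.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(arrayOffsetThrows(ByteBuffer.allocate(8)));                    // false
        System.out.println(arrayOffsetThrows(ByteBuffer.allocateDirect(8)));              // true
        System.out.println(arrayOffsetThrows(ByteBuffer.allocate(8).asReadOnlyBuffer())); // true
    }
}
```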
`read(ByteBuffer)` has an additional defect: it advances the destination's
position by `buf.remaining()` regardless of how many bytes `read(byte[])`
actually returned, corrupting the buffer on short reads and at EOF.
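The short-read defect can be modeled with an in-memory stream. This is a simplified sketch, not the actual `LocalInputFile` source: `ShortReadDemo`, `buggyRead`, `fixedRead` and `run` are hypothetical names, and a `ByteArrayInputStream` holding three bytes stands in for the file. `buggyRead` copies the whole temporary array as described above; `fixedRead` follows the shape of the fix in this PR.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

class ShortReadDemo {
    // Model of the defect: copies the whole temp array even when read(byte[])
    // returned fewer bytes (or -1 at EOF), so position always jumps by remaining().
    static int buggyRead(InputStream in, ByteBuffer buf) throws IOException {
        byte[] tmp = new byte[buf.remaining()];
        int n = in.read(tmp);
        buf.put(tmp);
        return n;
    }

    // Shape of the fix: copy only the bytes actually read.
    static int fixedRead(InputStream in, ByteBuffer buf) throws IOException {
        byte[] tmp = new byte[buf.remaining()];
        int n = in.read(tmp);
        if (n > 0) {
            buf.put(tmp, 0, n); // advances position by n only
        }
        return n; // -1 at EOF, and position is left untouched
    }

    /** Reads a 3-byte stream into an 8-byte direct buffer; returns {result, position}. */
    static int[] run(boolean fixed, boolean atEof) {
        try {
            InputStream in = new ByteArrayInputStream(new byte[] {10, 20, 30});
            if (atEof) {
                in.skip(3); // exhaust the stream first
            }
            ByteBuffer buf = ByteBuffer.allocateDirect(8);
            int n = fixed ? fixedRead(in, buf) : buggyRead(in, buf);
            return new int[] {n, buf.position()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(false, false))); // [3, 8]
        System.out.println(Arrays.toString(run(true, false)));  // [3, 3]
        System.out.println(Arrays.toString(run(false, true)));  // [-1, 8]
        System.out.println(Arrays.toString(run(true, true)));   // [-1, 0]
    }
}
```

The EOF case is the one called out above: the buggy model reports `-1` yet still moves the position to the limit, while the fixed version leaves it at 0.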
### What changes are included in this PR?
- `parquet-common/src/main/java/org/apache/parquet/io/LocalInputFile.java`:
  - `readFully(ByteBuffer)` now uses `buf.put(buffer)`, a straight
    array-to-buffer copy that works for every writable `ByteBuffer` shape and
    never calls `arrayOffset()`.
  - `read(ByteBuffer)` copies only the bytes actually read
    (`buf.put(buffer, 0, read)` when `read > 0`) and returns the underlying read
    result, so `-1` at EOF propagates correctly and the destination's position
    is left untouched at EOF.
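A minimal sketch of the `readFully(ByteBuffer)` fix, assuming a `DataInputStream` over in-memory bytes in place of the file-backed stream that `LocalInputFile` actually uses. `ReadFullySketch` and `fillFrom` are hypothetical helpers; the point is that `buf.put(tmp)` writes at the buffer's current position for heap, sliced, and direct buffers without ever calling `arrayOffset()`.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

class ReadFullySketch {
    // Fill a temp array completely, then bulk-copy it into buf.
    // ByteBuffer.put(byte[]) works for heap, sliced and direct buffers alike.
    static void readFully(DataInputStream in, ByteBuffer buf) throws IOException {
        byte[] tmp = new byte[buf.remaining()];
        in.readFully(tmp); // EOFException if the stream ends early
        buf.put(tmp);      // advances buf.position() by tmp.length
    }

    /** Fills buf starting at startPos and returns the bytes that landed there. */
    static byte[] fillFrom(ByteBuffer buf, int startPos) {
        try {
            byte[] data = {1, 2, 3, 4, 5, 6, 7, 8};
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            buf.position(startPos);
            readFully(in, buf);
            ByteBuffer view = buf.duplicate(); // inspect without moving buf
            view.position(startPos);
            byte[] out = new byte[buf.position() - startPos];
            view.get(out);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(8);
        ByteBuffer slice = ByteBuffer.wrap(new byte[16], 4, 8).slice(); // arrayOffset() == 4
        ByteBuffer direct = ByteBuffer.allocateDirect(8);                // arrayOffset() throws
        for (ByteBuffer b : new ByteBuffer[] {heap, slice, direct}) {
            System.out.println(Arrays.toString(fillFrom(b, 2))); // [1, 2, 3, 4, 5, 6]
        }
    }
}
```

Because `put(byte[])` advances the position by exactly `tmp.length`, the caller's position bookkeeping stays correct without any manual `position(...)` arithmetic.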
### Are these changes tested?
Yes. `TestLocalInputOutput` gains eight regression tests that fail against
the previous implementation and pass with the fix.
### Are there any user-facing changes?
No API signatures changed.
Closes #3542