arouel opened a new issue, #3542:
URL: https://github.com/apache/parquet-java/issues/3542
### Describe the bug, including details regarding any error messages, version, and platform.
`LocalInputFile.readFully(ByteBuffer)` and `LocalInputFile.read(ByteBuffer)`
in `parquet-common` are broken for any `ByteBuffer` that either (a) does not
expose an accessible backing array or (b) has a non-zero `position()` when
passed in. In practice this means any call to `ParquetFileReader.readFooter`
against an `InputFile` obtained from `new LocalInputFile(path)` can fail,
because Parquet itself passes buffer shapes that trigger the bug.
### Root cause
Both methods end with:
`buf.put(buffer, buf.position() + buf.arrayOffset(), buf.remaining());`
Two independent defects:
1. Wrong argument semantics. `ByteBuffer.put(byte[] src, int offset, int
length)` treats offset as an offset into the source array. The source here is
the freshly-allocated local buffer, whose indices have nothing to do with
`buf.position()` or `buf.arrayOffset()`. It happens to work when both are zero;
any other state either reads from the wrong offset or throws
`IndexOutOfBoundsException`.
2. `arrayOffset()` is not universally defined. Direct buffers (including
memory-mapped buffers) throw `UnsupportedOperationException` from
`arrayOffset()`, and read-only views throw its subclass
`ReadOnlyBufferException`, so the call fails before the put is even attempted.
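Both defects can be reproduced with plain `java.nio` buffers, independent of Parquet. The following is a minimal sketch, not the project's code: `PutOffsetDemo`, `buggyCopy`, `fixedCopy`, and `outcome` are hypothetical names, and `buggyCopy` only mirrors the `put` call quoted above under the assumption that the local array is sized to `buf.remaining()`.

```java
import java.nio.ByteBuffer;

public class PutOffsetDemo {
    // Mirrors the put call quoted in the issue: the offset argument is
    // derived from the destination, but put() applies it to the source array.
    static void buggyCopy(byte[] src, ByteBuffer dst) {
        dst.put(src, dst.position() + dst.arrayOffset(), dst.remaining());
    }

    // Offset 0 is the start of the source array; put() writes at
    // dst.position() on its own and never needs the backing array.
    static void fixedCopy(byte[] src, ByteBuffer dst) {
        dst.put(src, 0, dst.remaining());
    }

    // Runs buggyCopy and reports "ok" or the simple name of the exception thrown.
    static String outcome(ByteBuffer dst) {
        byte[] src = new byte[dst.remaining()]; // assumption: array sized to remaining()
        try {
            buggyCopy(src, dst);
            return "ok";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(8);
        heap.position(2);
        System.out.println("heap at position 0: " + outcome(ByteBuffer.allocate(8)));
        System.out.println("heap at position 2: " + outcome(heap));
        System.out.println("direct:             " + outcome(ByteBuffer.allocateDirect(8)));

        ByteBuffer direct = ByteBuffer.allocateDirect(8);
        direct.position(2);
        fixedCopy(new byte[] {1, 2, 3, 4, 5, 6}, direct);
        System.out.println("fixed copy left position at " + direct.position());
    }
}
```

Under that sizing assumption, the heap buffer at position 2 fails with an out-of-bounds error, because offset 2 plus length 6 runs past the end of a 6-byte source array, while the direct buffer fails earlier inside `arrayOffset()`. Passing a source offset of 0 avoids both problems.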
`read(ByteBuffer)` has an additional bug: it copies `buf.remaining()` bytes
into the destination regardless of how many bytes `read(byte[])` actually
returned, corrupting the buffer on short reads and advancing `position` past
the EOF boundary.
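A short-read-safe copy only transfers the number of bytes the underlying read actually returned. The sketch below illustrates that idea against a generic `InputStream`; `ShortReadDemo` and its two helpers are hypothetical and are not the `LocalInputFile` implementation or a proposed patch.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class ShortReadDemo {
    // Reads up to buf.remaining() bytes and copies only the bytes that were
    // actually read. Returns the count, or -1 at end of stream.
    static int read(InputStream in, ByteBuffer buf) throws IOException {
        byte[] tmp = new byte[buf.remaining()];
        int n = in.read(tmp);
        if (n > 0) {
            buf.put(tmp, 0, n); // source offset 0, length = bytes read
        }
        return n;
    }

    // Loops until the buffer is full; fails loudly if the stream ends first.
    static void readFully(InputStream in, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            if (read(in, buf) < 0) {
                throw new EOFException(
                    "Stream ended with " + buf.remaining() + " bytes still expected");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer dst = ByteBuffer.allocate(8);
        dst.position(2);
        int n = read(new ByteArrayInputStream(new byte[] {9, 8, 7}), dst);
        System.out.println("read " + n + " bytes, position now " + dst.position());
    }
}
```

Because the copy length is the returned count rather than `buf.remaining()`, a short read leaves `position` exactly after the last valid byte, and the untouched tail of the destination is never overwritten.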
### Stack trace
```
java.lang.UnsupportedOperationException
    at java.base/java.nio.ByteBuffer.arrayOffset(ByteBuffer.java:1558)
    at org.apache.parquet.io.LocalInputFile$1.readFully(LocalInputFile.java:93)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:642)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:578)
    at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:971)
    at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:961)
```
### Minimal reproducer
```java
Path path = /* any existing Parquet file */;
try (SeekableInputStream s = new LocalInputFile(path).newStream()) {
  s.readFully(ByteBuffer.allocateDirect(8)); // throws UnsupportedOperationException
}
```
```java
try (SeekableInputStream s = new LocalInputFile(path).newStream()) {
  ByteBuffer heap = ByteBuffer.allocate(8);
  heap.put(new byte[] {0, 0}); // position=2
  s.readFully(heap); // reads from wrong offset in source array
}
```
### Version
- parquet-common 1.17.0
- Introduced by PARQUET-1822 (commit 7c4cb42a, "Avoid requiring Hadoop
installation for reading/writing", #1111), which added `LocalInputFile`.
### Component(s)
Core