laskoviymishka commented on code in PR #16167:
URL: https://github.com/apache/iceberg/pull/16167#discussion_r3249629785
##########
core/src/main/java/org/apache/iceberg/io/MultiBufferInputStream.java:
##########
@@ -263,7 +263,7 @@ public int read(byte[] bytes) {
@Override
public int read() throws IOException {
if (current == null) {
- throw new EOFException();
+ return -1;
Review Comment:
The contract fix is right, but flagging this for visibility: I found four
existing call sites that consume the result of `read()` without a `-1` guard
and currently depend on the throw to fail loudly. After this change they fail
silently — or hang, in one case:
- `ValuesAsBytesReader.getByte()` (parquet) — casts to byte, `(byte) -1 =
0xFF` → 8 spurious true bits in boolean decoding
- `BaseVectorizedParquetValuesReader.readUnsignedVarInt()` (arrow) — `-1 &
0x80 != 0` always true → **infinite loop**
- `BaseVectorizedParquetValuesReader.readIntLittleEndian()` and
`readIntLittleEndianPaddedOnBitWidth()` (arrow) — return garbage int instead of
throwing
These callers shouldn't reach EOF on well-formed input (Parquet decoders
know exact byte counts), so this is defense-in-depth, not a happy-path
regression. But the failure mode changes from loud to silent/hanging.
Leaving the call to you — see options A/B/C in the top-level body. If you go
with A (fix at callers), happy to look at the follow-up changes. Whatever you
decide, could you note it in the PR description so it's visible to other
reviewers?
##########
core/src/test/java/org/apache/iceberg/io/TestByteBufferInputStreams.java:
##########
@@ -35,6 +35,14 @@ public abstract class TestByteBufferInputStreams {
protected abstract void checkOriginalData();
+ private static void assertAtEOF(ByteBufferInputStream stream) throws
IOException {
Review Comment:
Nice helper — the idempotency check is exactly the regression catch we want.
One small addition to consider: assert `read(byte[])` returns `-1` too, so the
helper pins both overloads at a single site. The multi-byte path already
returns `-1` today, but bundling it means a future change to either overload
trips the same assertion.
##########
core/src/main/java/org/apache/iceberg/io/SingleBufferInputStream.java:
##########
@@ -58,7 +58,7 @@ public long getPos() {
@Override
public int read() throws IOException {
if (!buffer.hasRemaining()) {
- throw new EOFException();
+ return -1;
Review Comment:
Minor: the `import java.io.EOFException;` at the top of this file is likely
unused now (no `throw new EOFException()` remains). Same check for
`MultiBufferInputStream.java` and for `import java.io.EOFException` in
`TestByteBufferInputStreams.java` — the
`assertThatThrownBy(...EOFException.class)` assertion is gone. Please drop the
unused imports.
##########
core/src/test/java/org/apache/iceberg/io/TestByteBufferInputStreams.java:
##########
@@ -532,4 +540,11 @@ public void testMarkDoubleReset() throws Exception {
.isInstanceOf(IOException.class)
.hasMessageStartingWith("No mark defined");
}
+
+ @Test
+ public void testEmptyStream() throws Exception {
+ assertAtEOF(ByteBufferInputStream.wrap(ByteBuffer.allocate(0)));
+ assertAtEOF(ByteBufferInputStream.wrap(ByteBuffer.allocate(0),
ByteBuffer.allocate(0)));
Review Comment:
Good coverage of the three construction shapes. Worth one more: a
`MultiBufferInputStream` built from a non-empty buffer list, then fully drained
— that exercises the `nextBuffer()` → `return -1` path at line 275, which is a
structurally distinct branch from the `current == null` path at line 266.
`testReadByte()` covers it for one shape, but a tiny dedicated
drained-multi-buffer test pins the second branch explicitly.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]