laskoviymishka commented on code in PR #16167:
URL: https://github.com/apache/iceberg/pull/16167#discussion_r3249629785


##########
core/src/main/java/org/apache/iceberg/io/MultiBufferInputStream.java:
##########
@@ -263,7 +263,7 @@ public int read(byte[] bytes) {
   @Override
   public int read() throws IOException {
     if (current == null) {
-      throw new EOFException();
+      return -1;

Review Comment:
   The contract fix is right, but flagging this for visibility: I found four 
existing call sites that consume the result of `read()` without a `-1` guard 
and currently depend on the throw to fail loudly. After this change they fail 
silently — or hang, in one case:
   
   - `ValuesAsBytesReader.getByte()` (parquet) — casts to byte, `(byte) -1 = 
0xFF` → 8 spurious true bits in boolean decoding
   - `BaseVectorizedParquetValuesReader.readUnsignedVarInt()` (arrow) — `-1 & 
0x80 != 0` always true → **infinite loop**
   - `BaseVectorizedParquetValuesReader.readIntLittleEndian()` and 
`readIntLittleEndianPaddedOnBitWidth()` (arrow) — return garbage int instead of 
throwing
   
   These callers shouldn't reach EOF on well-formed input (Parquet decoders 
know exact byte counts), so this is defense-in-depth, not a happy-path 
regression. But the failure mode changes from loud to silent/hanging.
   
   Leaving the call to you — see options A/B/C in the top-level body. If you go 
with A (fix at callers), happy to look at the follow-up changes. Whatever you 
decide, could you note it in the PR description so it's visible to other 
reviewers?



##########
core/src/test/java/org/apache/iceberg/io/TestByteBufferInputStreams.java:
##########
@@ -35,6 +35,14 @@ public abstract class TestByteBufferInputStreams {
 
   protected abstract void checkOriginalData();
 
+  private static void assertAtEOF(ByteBufferInputStream stream) throws 
IOException {

Review Comment:
   Nice helper — the idempotency check is exactly the regression catch we want. 
One small addition to consider: assert `read(byte[])` returns `-1` too, so the 
helper pins both overloads at a single site. The multi-byte path already 
returns `-1` today, but bundling it means a future change to either overload 
trips the same assertion.



##########
core/src/main/java/org/apache/iceberg/io/SingleBufferInputStream.java:
##########
@@ -58,7 +58,7 @@ public long getPos() {
   @Override
   public int read() throws IOException {
     if (!buffer.hasRemaining()) {
-      throw new EOFException();
+      return -1;

Review Comment:
   Minor: the `import java.io.EOFException;` at the top of this file is likely 
unused now (no `throw new EOFException()` remains). Same check for 
`MultiBufferInputStream.java` and for `import java.io.EOFException` in 
`TestByteBufferInputStreams.java` — the 
`assertThatThrownBy(...EOFException.class)` assertion is gone. Please drop the 
unused imports.



##########
core/src/test/java/org/apache/iceberg/io/TestByteBufferInputStreams.java:
##########
@@ -532,4 +540,11 @@ public void testMarkDoubleReset() throws Exception {
         .isInstanceOf(IOException.class)
         .hasMessageStartingWith("No mark defined");
   }
+
+  @Test
+  public void testEmptyStream() throws Exception {
+    assertAtEOF(ByteBufferInputStream.wrap(ByteBuffer.allocate(0)));
+    assertAtEOF(ByteBufferInputStream.wrap(ByteBuffer.allocate(0), 
ByteBuffer.allocate(0)));

Review Comment:
   Good coverage of the three construction shapes. Worth one more: a 
`MultiBufferInputStream` built from a non-empty buffer list, then fully drained 
— that exercises the `nextBuffer()` → `return -1` path at line 275, which is a 
structurally distinct branch from the `current == null` path at line 266. 
`testReadByte()` covers it for one shape, but a tiny dedicated 
drained-multi-buffer test pins the second branch explicitly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to