jbbqqf opened a new pull request, #16266:
URL: https://github.com/apache/iceberg/pull/16266

   ## Summary
   
   `OSSInputStream` (aliyun) and `EcsSeekableInputStream` (dell) had the same 
EOF-handling bug fixed in #16055 for `GCSInputStream` / `S3InputStream` / 
`ADLSInputStream`: when the underlying `read()` returns `-1`, the wrapper still 
advanced `pos` (and `next`, in OSS's case) and incremented the `readBytes` / 
`readOperations` metric counters by `-1`.
   
   This PR applies the same one-shape fix the maintainers picked in #16055 to 
the two modules left out of that PR, plus regression tests covering both 
overloads.
   
   Resolves #16062 for `aliyun` and `dell`. The remaining pieces (`GCS`, `S3`, 
`ADLS`) are still tracked by the in-flight #16055.
   
   ## Context
   
   The original issue body for #16062 explicitly enumerates the four affected 
files; #16055 covers `GCS`/`S3`/`ADLS` and the issue comment confirms 
`OSSInputStream` and `EcsSeekableInputStream` are still pending.
   
   In Iceberg's hot path, files are read via known-offset range reads rather 
than sequential reads-to-EOF, so the bug is *latent* in production today. It 
still matters because:
   
   1. Any future caller that does loop until `read() == -1` would silently 
corrupt `pos`/`next` and the metric counters.
   2. The behavior diverges from `InputStream`'s contract — every other Iceberg 
`*InputStream` already returns `-1` cleanly after #16055.
   3. The metric drift is observable today via `FileIOMetricsContext` snapshots 
that overshoot the file size.
   
   ## Changes
   
   - `aliyun/src/main/java/org/apache/iceberg/aliyun/oss/OSSInputStream.java`
     - `read()` and `read(byte[], int, int)` now check the underlying return 
value and, if `-1`, return immediately without mutating `pos` / `next` or 
incrementing `readBytes` / `readOperations`. Same pattern as #16055's 
`S3InputStream` change.
     - Inline comments explain the why and reference both #16055 (the canonical 
fix) and #16062 (this issue), so a reviewer reading the diff cold doesn't have 
to re-derive the reasoning.
   - 
`dell/src/main/java/org/apache/iceberg/dell/ecs/EcsSeekableInputStream.java` — 
same shape.
   - 
`aliyun/src/test/java/org/apache/iceberg/aliyun/oss/TestOSSInputStream.java` — 
adds `testReadSingleByteAtEof` and `testReadBufferAtEof`. Both run against the 
in-process `AliyunOSSMockExtension` (no live OSS access required).
   - 
`dell/src/test/java/org/apache/iceberg/dell/ecs/TestEcsSeekableInputStream.java`
 — same two tests against the existing `EcsS3MockRule`.
   
   All four new tests fail on `origin/main` and pass with this change.
   
   ## Reproduce BEFORE/AFTER yourself (copy-paste)
   
   ```bash
   # --- one-time setup ---
   git clone https://github.com/apache/iceberg.git /tmp/repro && cd /tmp/repro
   
   # --- BEFORE (origin/main) ---
   git checkout origin/main
   # Pull in only the new test methods so the regression is visible against
   # the unmodified production code in main:
   git fetch https://github.com/jbbqqf/iceberg.git 
feat/16062-fix-eof-handling-oss-ecs-streams
   git checkout FETCH_HEAD -- \
     aliyun/src/test/java/org/apache/iceberg/aliyun/oss/TestOSSInputStream.java 
\
     
dell/src/test/java/org/apache/iceberg/dell/ecs/TestEcsSeekableInputStream.java
   ./gradlew :iceberg-dell:test --tests 
"org.apache.iceberg.dell.ecs.TestEcsSeekableInputStream"
   ./gradlew :iceberg-aliyun:test --tests 
"org.apache.iceberg.aliyun.oss.TestOSSInputStream"
   # Expected: 2 failures in each test class —
   #   testReadSingleByteAtEof  (pos / readBytes shifted by -1 at EOF)
   #   testReadBufferAtEof      (same, for the buffered overload)
   
   # --- AFTER (this PR) ---
   git checkout FETCH_HEAD
   ./gradlew :iceberg-dell:test --tests 
"org.apache.iceberg.dell.ecs.TestEcsSeekableInputStream"
   ./gradlew :iceberg-aliyun:test --tests 
"org.apache.iceberg.aliyun.oss.TestOSSInputStream"
   # Expected: all tests green in both modules.
   ```
   
   The only thing that changes between BEFORE and AFTER is the checked-out ref 
of the production InputStream classes; the test files are the same in both runs.
   
   ## What I ran locally
   
   - `./gradlew :iceberg-dell:test --tests "...TestEcsSeekableInputStream"` on 
`origin/main` with the new tests imported → **6 tests, 2 failures** (regression 
visible).
   - Same on this branch → **6/6 pass**.
   - `./gradlew :iceberg-aliyun:test --tests "...TestOSSInputStream"` on 
`origin/main` with the new tests imported → **5 tests, 2 failures**.
   - Same on this branch → **5/5 pass**.
   - `./gradlew :iceberg-aliyun:spotlessCheck :iceberg-dell:spotlessCheck` → 
**PASS**.
   
   ## Edge cases tested
   
   | # | Module | Scenario | Expected | Verified by |
   |---|--------|----------|----------|-------------|
   | 1 | dell  | Single-byte `read()` past the last byte of a 4-byte object | 
returns `-1`; `getPos()` stays at file size; idempotent on a second call | 
`TestEcsSeekableInputStream#testReadSingleByteAtEof` |
   | 2 | dell  | Buffered `read(byte[], 0, len)` past EOF | returns `-1`; 
`getPos()` stays at file size | 
`TestEcsSeekableInputStream#testReadBufferAtEof` |
   | 3 | aliyun | Single-byte `read()` past EOF on an 8-byte object | returns 
`-1`; `getPos()` stays; idempotent | 
`TestOSSInputStream#testReadSingleByteAtEof` |
   | 4 | aliyun | Buffered `read(byte[], 0, dataSize+1)` followed by another 
EOF buffered read | first read returns `dataSize`, second returns `-1`; 
`getPos()` stays at `dataSize` | `TestOSSInputStream#testReadBufferAtEof` |
   
   ## Risk / blast radius
   
   - Two-line guard in two `*InputStream` classes, mirroring the GCS/S3/ADLS 
shape that was already chosen by maintainers in #16055.
   - No public API surface modified.
   - Behavioral change is strictly *more* contract-compliant: 
`InputStream.read()` is documented to return `-1` at EOF without side effects, 
and that is now what these wrappers do.
   - Metrics: `readBytes`/`readOperations` no longer drift on EOF reads. 
Pre-fix snapshots overcounted reads; post-fix they don't. This is observable 
but should be considered a fix, not a regression.
   
   ## Release note
   
   ```release-note
   Fix OSS (aliyun) and ECS (dell) InputStreams to return -1 at EOF without 
advancing the position counter or skewing read metrics, matching the 
GCS/S3/ADLS fix in #16055.
   ```
   
   ---
   
   *PR drafted with assistance from Claude Code. The change was reviewed 
manually against `apache/iceberg`'s source on `main` and against the 
corresponding fix shape in #16055. The reproducer block above was used during 
development; it is the same one a reviewer can paste verbatim.*
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to