The GitHub Actions job "Required Checks" on 
texera.git/fix/5143-iceberg-closeable-leak has failed.
Run started by GitHub user mengw15 (triggered by mengw15).

Head commit for run:
b5135b1802fcb9a067a191f38b8db8eeec6689c5 / mengw15 
<[email protected]>
fix: close CloseableIterable owners in Iceberg read paths

`IcebergUtil.readDataFileAsIterator` and five sibling sites in
`IcebergDocument` shared the same anti-pattern:
`closeableIterable.iterator().asScala` returned a bare Scala iterator
with no reference to the parent `CloseableIterable`. Under `S3FileIO`
each call leaked one `S3InputStream` (kept open until GC) plus one
slot of the AWS SDK's default 50-slot Apache HTTP connection pool;
after ~50 reads any new S3 read blocked indefinitely on
`acquireConnection` until JVM restart.
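
The anti-pattern described above can be reproduced in miniature without Iceberg or S3. The following is a minimal sketch under stated assumptions: `FakeCloseableIterable` and `LeakDemo.readLeaky` are hypothetical stand-ins (not Texera or Iceberg code) that only record whether `close()` was ever called.

```scala
import scala.jdk.CollectionConverters._

// Hypothetical stand-in for Iceberg's CloseableIterable[T]; it only records
// whether close() was called, it does not hold a real S3 stream.
final class FakeCloseableIterable[T](items: List[T])
    extends java.lang.Iterable[T] with java.io.Closeable {
  var closed = false
  override def iterator(): java.util.Iterator[T] = items.asJava.iterator()
  override def close(): Unit = closed = true
}

object LeakDemo {
  // The anti-pattern: the returned Scala iterator keeps no reference to
  // `parent`, so the caller has no way to close it after reading.
  def readLeaky[T](parent: FakeCloseableIterable[T]): Iterator[T] =
    parent.iterator().asScala
}
```

Draining the iterator returned by `readLeaky` leaves `closed` at `false`; in the real read path that unclosed parent is what would keep a stream and a pooled connection occupied.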

Introduce `CloseableScalaIterator[T]` (`Iterator[T] with AutoCloseable`,
idempotent `close()`) that wraps a `CloseableIterable[T]` and
propagates `close()` to the parent. Change `readDataFileAsIterator`
to return this wrapper. Update the `IcebergDocument` read iterator
to track the close handle in a sibling `AutoCloseable` field (needed
because `Iterator.drop(n)` returns a bare iterator that loses the
wrapper type) and close it on file-switch, on exhaustion, and on
caller-imposed `until` cap. Wrap the four eagerly-consumed
`planFiles()` call sites (`getCount`, `seekToUsableFile`,
`getTableStatistics`, `asInputStream`) in `Using.resource` so the
metadata-side `CloseableIterable<FileScanTask>` is released promptly.
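
A wrapper of the kind described above might look roughly like the sketch below. It is not the code from the commit: `CountingIterable` is a hypothetical stand-in for `CloseableIterable[T]`, the constructor signature is assumed, and making `hasNext` return `false` after `close()` is an illustrative choice rather than confirmed behavior. `WrapperDemo.readAll` shows the `Using.resource` shape for an eagerly consumed source.

```scala
import scala.jdk.CollectionConverters._
import scala.util.Using

// Hypothetical stand-in for Iceberg's CloseableIterable[T]; counts close() calls.
final class CountingIterable[T](items: List[T])
    extends java.lang.Iterable[T] with java.io.Closeable {
  var closeCalls = 0
  override def iterator(): java.util.Iterator[T] = items.asJava.iterator()
  override def close(): Unit = closeCalls += 1
}

// Sketch of an Iterator[T] with AutoCloseable that forwards close() to its parent.
final class CloseableScalaIterator[T](parent: java.lang.Iterable[T] with java.io.Closeable)
    extends Iterator[T] with AutoCloseable {
  private val underlying: Iterator[T] = parent.iterator().asScala
  private var closed = false

  // Assumption: a closed iterator reports no further elements.
  override def hasNext: Boolean = !closed && underlying.hasNext
  override def next(): T = underlying.next()

  // Idempotent: only the first call reaches the parent.
  override def close(): Unit =
    if (!closed) {
      closed = true
      parent.close()
    }
}

object WrapperDemo {
  // Eager consumption: Using.resource closes the wrapper (and so the parent)
  // once the body finishes, even if the body throws.
  def readAll[T](source: CountingIterable[T]): List[T] =
    Using.resource(new CloseableScalaIterator[T](source))(_.toList)
}
```

`Using.resource` fits the eagerly consumed sites because the whole result is materialized before the resource is released; a lazily consumed iterator cannot be wrapped this way without closing it too early.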

Known limitation out of scope here: if a caller of
`IcebergDocument.get()` / `getRange()` / `getAfter()` stops iterating
before `hasNext` returns `false`, the LAST file's wrapper still
leaks until GC. Fixing that requires changing the public `Iterator[T]`
return type on `VirtualDocument` to `Iterator[T] with AutoCloseable`
and updating every caller — best done as a separate refactor.
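
The limitation above, and the reason for the sibling close handle, can be illustrated with a small sketch. `TrackedIterator` and `SkippingReader` are hypothetical names for illustration only; the real `IcebergDocument` iterator also handles multiple files and the `until` cap, which this sketch omits.

```scala
// Hypothetical resource-backed iterator that records whether close() ran.
final class TrackedIterator[T](items: List[T]) extends Iterator[T] with AutoCloseable {
  private val underlying = items.iterator
  var closed = false
  override def hasNext: Boolean = underlying.hasNext
  override def next(): T = underlying.next()
  override def close(): Unit = closed = true
}

// Hypothetical reader: drop(n) is typed as plain Iterator[T], so the close
// handle is kept in a sibling field and invoked once hasNext turns false.
final class SkippingReader[T](source: TrackedIterator[T], skip: Int) extends Iterator[T] {
  private val handle: AutoCloseable = source
  private val records: Iterator[T] = source.drop(skip)

  override def hasNext: Boolean = {
    val more = records.hasNext
    if (!more) handle.close() // release on exhaustion
    more
  }
  override def next(): T = records.next()
}
```

A caller that drains `SkippingReader` triggers `close()` through the final `hasNext`; a caller that takes one element and walks away never does, which is the leak the commit message leaves for a follow-up refactor.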

Closes #5143.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Report URL: https://github.com/apache/texera/actions/runs/26275014407

With regards,
GitHub Actions via GitBox
