wombatu-kun opened a new pull request, #16642: URL: https://github.com/apache/iceberg/pull/16642
## Problem

Follow-up to #16641, which fixed this class of bug for the Parquet write path (reported in #16640). When the Hadoop FileSystem cache is disabled (for example `fs.abfs.impl.disable.cache=true`), a `FileSystem` resolved for a write has no shared strong referrer and can be garbage-collected mid-write. On Azure, `AzureBlobFileSystem.finalize()` then shuts down the thread pool that the open `AbfsOutputStream` depends on, and the write fails with `Could not submit task to executor ... ThreadPoolExecutor [Terminated]`.

## Root cause

`AvroFileAppender` keeps only the output stream, not the `OutputFile`. The data and delete writers that wrap it (`DataWriter`, `PositionDeleteWriter`, `EqualityDeleteWriter`) keep the appender and a location string, but not the `OutputFile` either. So for an Avro data or delete file written with the cache disabled, nothing keeps the write's `FileSystem` reachable, and it can be collected while the file is still being written.

Manifests are not affected: `ManifestWriter` and `ManifestListWriter` retain the `OutputFile` themselves, so the `FileSystem` stays reachable through them. ORC is also unaffected because `OrcFileAppender` already retains its `OutputFile`.

## Change

Retain the `OutputFile` on `AvroFileAppender` so its `FileSystem` stays reachable for the appender's lifetime, mirroring `OrcFileAppender`. The retained file is also used to include the file location in the write-error message.

## Tests

Added `TestAvroWriteFileSystemReachability`, an end-to-end test that writes an Avro position-delete file through `PositionDeleteWriter` (which drops the `OutputFile`) with the Hadoop FileSystem cache disabled, against a local FileSystem that mimics `AzureBlobFileSystem`: its `finalize()` terminates a thread pool the open stream depends on, and the stream references the pool rather than the FileSystem.
Without the production change the write FileSystem is collected mid-write and `close()` fails with `Could not submit task to executor: thread pool was terminated`; with the change the FileSystem stays reachable and the write completes. The test fails without the fix and passes with it.

Related to #16640 and #16641.
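
For readers who want the reference graph spelled out, here is a minimal, self-contained sketch of the failure mode and of the retention pattern described above. It is not Iceberg or Hadoop code: `FakeFileSystem`, `FakeStream`, `FakeOutputFile`, `RetainingAppender`, and `simulateFinalize()` are hypothetical names invented for illustration, and the `IllegalStateException` wrapper is an arbitrary choice. Because garbage-collection timing is not deterministic, the sketch calls `simulateFinalize()` explicitly instead of waiting for a real `finalize()`.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

final class ReachabilitySketch {

  /** Hypothetical stand-in for a FileSystem that owns a thread pool. */
  static final class FakeFileSystem {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    FakeStream create() {
      return new FakeStream(pool); // the stream receives the pool, not the FileSystem
    }

    /** Stands in for finalize(); called explicitly so the sketch is deterministic. */
    void simulateFinalize() {
      pool.shutdownNow();
    }
  }

  /** Hypothetical stream: depends on the pool but holds no reference to the FileSystem. */
  static final class FakeStream {
    private final ExecutorService pool;

    FakeStream(ExecutorService pool) {
      this.pool = pool;
    }

    void flush() {
      try {
        // submit() throws RejectedExecutionException once the pool has been shut down
        pool.submit(() -> { }).get();
      } catch (InterruptedException | ExecutionException e) {
        throw new IllegalStateException(e);
      }
    }
  }

  /** Hypothetical output file: the only object here that references the FileSystem. */
  static final class FakeOutputFile {
    private final FakeFileSystem fs;
    private final String location;

    FakeOutputFile(FakeFileSystem fs, String location) {
      this.fs = fs;
      this.location = location;
    }

    FakeFileSystem fileSystem() { return fs; }
    String location() { return location; }
    FakeStream create() { return fs.create(); }
  }

  /** Appender in the spirit of the change: retains the file as well as the stream. */
  static final class RetainingAppender {
    private final FakeOutputFile file; // appender -> file -> FileSystem stays strongly reachable
    private final FakeStream stream;

    RetainingAppender(FakeOutputFile file) {
      this.file = file;
      this.stream = file.create();
    }

    FakeOutputFile file() { return file; }

    void close() {
      try {
        stream.flush();
      } catch (RuntimeException e) {
        // the retained file also lets the error message name the location
        throw new IllegalStateException("Failed to write " + file.location(), e);
      }
    }
  }

  /** Returns true if a flush issued after the simulated finalize is rejected. */
  static boolean flushFailsAfterFinalize() {
    FakeFileSystem fs = new FakeFileSystem();
    FakeStream stream = fs.create();
    fs.simulateFinalize(); // what a collected FileSystem would do to the pool
    try {
      stream.flush();
      return false;
    } catch (RejectedExecutionException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("flush rejected after finalize: " + flushFailsAfterFinalize());

    FakeFileSystem fs = new FakeFileSystem();
    RetainingAppender appender =
        new RetainingAppender(new FakeOutputFile(fs, "file:/tmp/deletes.avro"));
    appender.close(); // the pool is still running, so the flush goes through
    System.out.println("appender retains FileSystem: " + (appender.file().fileSystem() == fs));
    fs.simulateFinalize(); // stop the pool thread so the JVM can exit
  }
}
```

The design point is that the stream alone does not keep its FileSystem alive, so some object that lives as long as the write must hold the chain appender → file → FileSystem. Retaining the file on the appender is one way to do that without changing the wrapping writers.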
