wombatu-kun opened a new pull request, #16642:
URL: https://github.com/apache/iceberg/pull/16642

   ## Problem
   
   Follow-up to #16641, which fixed this class of bug for the Parquet write 
path (reported in #16640). When the Hadoop FileSystem cache is disabled (for 
example `fs.abfs.impl.disable.cache=true`), a `FileSystem` resolved for a write 
has no shared strong referrer and can be garbage-collected mid-write. On Azure, 
`AzureBlobFileSystem.finalize()` then shuts down the thread pool that the open 
`AbfsOutputStream` depends on, and the write fails with `Could not submit task 
to executor ... ThreadPoolExecutor [Terminated]`.
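
   The failure mode can be reproduced in isolation with plain JDK classes. The sketch below is only an illustration: `PoolOwningFileSystem` and `PoolBackedStream` are hypothetical stand-ins (not Hadoop or Iceberg classes), and the pool is shut down explicitly to simulate what `finalize()` would do after collection.

```java
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

final class TerminatedPoolSketch {

  /** Hypothetical stand-in for a file system that owns an upload thread pool. */
  static final class PoolOwningFileSystem {
    final ExecutorService pool = Executors.newFixedThreadPool(1);

    /** Stands in for the cleanup that finalize() would run after collection. */
    void shutdownPool() {
      pool.shutdownNow();
      try {
        pool.awaitTermination(5, TimeUnit.SECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  /** Hypothetical stream that references the pool but not the owning file system. */
  static final class PoolBackedStream {
    private final ExecutorService pool;

    PoolBackedStream(ExecutorService pool) {
      this.pool = pool;
    }

    void write(byte[] data) throws IOException {
      try {
        // Each write is handed to the pool; a shut-down pool rejects the task.
        pool.submit(() -> data.length).get();
      } catch (RejectedExecutionException e) {
        throw new IOException("Could not submit task to executor " + pool, e);
      } catch (InterruptedException | ExecutionException e) {
        throw new IOException("Write task failed", e);
      }
    }
  }

  /** Returns the message of the write error raised after the pool is shut down. */
  static String failureAfterShutdown() {
    PoolOwningFileSystem fs = new PoolOwningFileSystem();
    PoolBackedStream stream = new PoolBackedStream(fs.pool);
    try {
      stream.write(new byte[] {1}); // pool is alive: the write succeeds
      fs.shutdownPool();            // simulate finalize() running mid-write
      stream.write(new byte[] {2}); // pool is shut down: the task is rejected
      return "no failure";
    } catch (IOException e) {
      return e.getMessage();
    } finally {
      fs.pool.shutdownNow(); // make sure no worker thread outlives the sketch
    }
  }

  public static void main(String[] args) {
    System.out.println(failureAfterShutdown());
  }
}
```

   The stream itself is still open and valid when the second write fails; only the pool it borrowed from the file system is gone.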
   
   ## Root cause
   
   `AvroFileAppender` keeps only the output stream, not the `OutputFile`. The 
data and delete writers that wrap it (`DataWriter`, `PositionDeleteWriter`, 
`EqualityDeleteWriter`) keep the appender and a location string, and they do 
not keep the `OutputFile` either. So when an Avro data or delete file is written 
with the cache disabled, nothing keeps the write's `FileSystem` reachable, and 
it can be collected while the file is still being written.
   
   Manifests are not affected: `ManifestWriter` and `ManifestListWriter` retain 
the `OutputFile` themselves, so the `FileSystem` stays reachable through them. 
ORC is also unaffected because `OrcFileAppender` already retains its 
`OutputFile`.
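
   The reachability chain described here can be sketched with a `WeakReference` probe. All `Sketch*` types below are hypothetical stand-ins, not Iceberg or Hadoop classes, and `System.gc()` is only a hint: the unretained case is usually collected on HotSpot but that is not guaranteed, whereas the retained case cannot be collected while the appender is strongly reachable.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

final class ReachabilitySketch {

  /** Hypothetical stand-in for the per-write file system instance. */
  static final class SketchFileSystem {}

  /** Hypothetical stream; deliberately holds no reference to the file system. */
  static final class SketchStream {}

  /** Hypothetical output file; it is the only object that references the file system. */
  static final class SketchOutputFile {
    final SketchFileSystem fs;

    SketchOutputFile(SketchFileSystem fs) {
      this.fs = fs;
    }

    SketchStream create() {
      return new SketchStream();
    }
  }

  /** Hypothetical appender; a null file models an appender that keeps only the stream. */
  static final class SketchAppender {
    final SketchStream stream;
    final SketchOutputFile file;

    SketchAppender(SketchStream stream, SketchOutputFile file) {
      this.stream = stream;
      this.file = file;
    }
  }

  /** Pairs an open appender with a weak probe on the file system it was created from. */
  static final class Opened {
    final SketchAppender appender;
    final WeakReference<SketchFileSystem> fsProbe;

    Opened(SketchAppender appender, WeakReference<SketchFileSystem> fsProbe) {
      this.appender = appender;
      this.fsProbe = fsProbe;
    }
  }

  /** Opens an appender in a separate frame so no local variable pins the file system. */
  static Opened open(boolean retainFile) {
    SketchFileSystem fs = new SketchFileSystem();
    SketchOutputFile file = new SketchOutputFile(fs);
    SketchAppender appender = new SketchAppender(file.create(), retainFile ? file : null);
    return new Opened(appender, new WeakReference<>(fs));
  }

  /** Reports whether the file system is still reachable after several GC requests. */
  static boolean fileSystemReachableAfterGc(boolean retainFile) {
    Opened opened = open(retainFile);
    for (int i = 0; i < 10 && opened.fsProbe.get() != null; i++) {
      System.gc(); // a request, not a guarantee
    }
    boolean reachable = opened.fsProbe.get() != null;
    Reference.reachabilityFence(opened.appender); // the appender stays live across the GCs
    return reachable;
  }

  public static void main(String[] args) {
    System.out.println("stream only: " + fileSystemReachableAfterGc(false));
    System.out.println("file retained: " + fileSystemReachableAfterGc(true));
  }
}
```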
   
   ## Change
   
   Retain the `OutputFile` on `AvroFileAppender` so its `FileSystem` stays 
reachable for the appender's lifetime, mirroring `OrcFileAppender`. The 
retained file is also used to include the file location in the write-error 
message.
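
   As a rough illustration of the shape of this change (not the actual patch), an appender that retains its output file might look like the sketch below. `SketchOutputFile` and `SketchAppender` are hypothetical names, the error text is an assumption, and the JDK's `UncheckedIOException` is used only to keep the example self-contained.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class RetainingAppenderSketch {

  /** Hypothetical, reduced view of an output file: a stream factory plus a location. */
  interface SketchOutputFile {
    OutputStream create();

    String location();
  }

  /** Hypothetical appender that retains the file it writes to for its whole lifetime. */
  static final class SketchAppender implements AutoCloseable {
    private final SketchOutputFile file; // retained: keeps the file's owner reachable
    private final OutputStream stream;

    SketchAppender(SketchOutputFile file) {
      this.file = file;
      this.stream = file.create();
    }

    void add(byte[] datum) {
      try {
        stream.write(datum);
      } catch (IOException e) {
        // The retained file also supplies the location for the error message.
        throw new UncheckedIOException("Failed to write to " + file.location(), e);
      }
    }

    @Override
    public void close() {
      try {
        stream.close();
      } catch (IOException e) {
        throw new UncheckedIOException("Failed to close " + file.location(), e);
      }
    }
  }

  /** Builds a file whose stream always fails, to exercise the error path. */
  static SketchOutputFile failingFile(String location) {
    return new SketchOutputFile() {
      @Override
      public OutputStream create() {
        return new OutputStream() {
          @Override
          public void write(int b) throws IOException {
            throw new IOException("simulated stream failure");
          }
        };
      }

      @Override
      public String location() {
        return location;
      }
    };
  }

  /** Returns the message raised when a write to the given location fails. */
  static String writeErrorMessage(String location) {
    try (SketchAppender appender = new SketchAppender(failingFile(location))) {
      appender.add(new byte[] {1});
      return "no failure";
    } catch (UncheckedIOException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(writeErrorMessage("file:/tmp/deletes.avro"));
  }
}
```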
   
   ## Tests
   
   Added `TestAvroWriteFileSystemReachability`, an end-to-end test that writes 
an Avro position-delete file through `PositionDeleteWriter` (which does not 
retain the `OutputFile`) with the Hadoop FileSystem cache disabled. The test 
runs against a local FileSystem that mimics `AzureBlobFileSystem`: its 
`finalize()` terminates a thread pool that the open stream depends on, and the 
stream references the pool rather than the FileSystem. Without the production 
change, the write FileSystem is collected mid-write and `close()` fails with 
`Could not submit task to executor: thread pool was terminated`. With the 
change, the FileSystem stays reachable and the write completes.
   
   Related to #16640 and #16641.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

