kinolaev opened a new pull request, #15712:
URL: https://github.com/apache/iceberg/pull/15712
In Spark, data file loading holds a connection while lazy delete file loading
tries to acquire another. When the number of simultaneous connections is limited
(for example by `http-client.apache.max-connections`), this leads to a deadlock as
soon as all connections are held by data file loading. To avoid the deadlock,
this PR preloads delete files in the SparkDeleteFilter constructor.
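The hold-and-acquire pattern can be sketched with a `Semaphore` standing in for the connection pool (a hypothetical illustration, not Iceberg code): the data file reader holds the only permit, so the lazy delete file load cannot get a second one and times out, while acquiring the delete file connection before the data read succeeds.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolDeadlockSketch {
    // Returns whether the lazy delete-file load could obtain a connection
    // while the data-file connection is still held.
    static boolean lazyLoadWhileDataHeld(Semaphore pool) throws InterruptedException {
        pool.acquire(); // data file stream opened: the only connection is held
        try {
            // Lazy delete-file load now needs a second connection and
            // times out -- the analogue of ConnectionPoolTimeoutException.
            return pool.tryAcquire(100, TimeUnit.MILLISECONDS);
        } finally {
            pool.release(); // data file stream closed
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Semaphore pool = new Semaphore(1); // like max-connections = 1
        System.out.println("acquired during data read: "
                + lazyLoadWhileDataHeld(pool)); // false: would deadlock
        // Preloading (this PR's approach): take the delete-file connection
        // before the data file is opened, so the two never contend at once.
        System.out.println("acquired when preloading first: "
                + pool.tryAcquire()); // true
        pool.release();
    }
}
```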
The problem can be reproduced using spark-sql with S3FileIO and
`spark.sql.catalog.iceberg.http-client.apache.max-connections=1`:
```sql
create table sparkdeletefilter(id bigint)
tblproperties('write.delete.mode'='merge-on-read');
-- create a data file
insert into sparkdeletefilter select id from range(2);
-- create a delete file
delete from sparkdeletefilter where id in (select id from range(0, 2, 2));
-- Reader opens the data file first, keeps it open and
-- fails to load the delete file with ConnectionPoolTimeoutException
select count(id) from sparkdeletefilter;
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]