kinolaev opened a new pull request, #15712:
URL: https://github.com/apache/iceberg/pull/15712
In Spark, data file loading holds a connection while lazy delete file loading
tries to acquire another. When the number of simultaneous connections is limited
(for example by `http-client.apache.max-connections`), this leads to a deadlock as
soon as all connections are held by data file loading. To avoid the deadlock,
this PR preloads delete files in the SparkDeleteFilter constructor.
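The hold-and-acquire pattern can be sketched with a `Semaphore` standing in for the connection pool (a hypothetical illustration, not Iceberg code): the data file reader holds the only permit, so the lazy delete file load cannot get a second one and times out, while acquiring the delete file connection before the data read succeeds.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolDeadlockSketch {
    // Returns whether the lazy delete-file load could obtain a connection
    // while the data-file connection is still held.
    static boolean lazyLoadWhileDataHeld(Semaphore pool) throws InterruptedException {
        pool.acquire(); // data file stream opened: the only connection is held
        try {
            // Lazy delete-file load now needs a second connection and
            // times out -- the analogue of ConnectionPoolTimeoutException.
            return pool.tryAcquire(100, TimeUnit.MILLISECONDS);
        } finally {
            pool.release(); // data file stream closed
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Semaphore pool = new Semaphore(1); // like max-connections = 1
        System.out.println("acquired during data read: "
                + lazyLoadWhileDataHeld(pool)); // false: would deadlock
        // Preloading (this PR's approach): take the delete-file connection
        // before the data file is opened, so the two never contend at once.
        System.out.println("acquired when preloading first: "
                + pool.tryAcquire()); // true
        pool.release();
    }
}
```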
The problem can be reproduced using spark-sql with S3FileIO and
`spark.sql.catalog.iceberg.http-client.apache.max-connections=1`:
```sql
create table sparkdeletefilter(id bigint)
tblproperties('write.delete.mode'='merge-on-read');
-- create a data file
insert into sparkdeletefilter select id from range(2);
-- create a delete file
delete from sparkdeletefilter where id in (select id from range(0, 2, 2));
-- Reader opens the data file first, keeps it open and
-- fails to load the delete file with ConnectionPoolTimeoutException
select count(id) from sparkdeletefilter;
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]