kinolaev opened a new pull request, #15713:
URL: https://github.com/apache/iceberg/pull/15713
This PR prevents deadlocks while filtering manifest entries. The problem is that
`ManifestFilterManager.filterManifest` in some cases reads each manifest twice:
first in `manifestHasDeletedFiles` and then in `filterManifestWithDeletedFiles`.
If `manifestHasDeletedFiles` returns before reaching the end of the entries
iterable, the underlying connection stays open until the `ManifestReader` is
closed. Because `filterManifest` is called for all manifests in parallel, a
limit on simultaneous connections (for example,
`http-client.apache.max-connections`) can lead to a deadlock: every connection
is held by an unfinished `manifestHasDeletedFiles` read, so the subsequent
`filterManifestWithDeletedFiles` reads cannot obtain one.
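The mechanism can be illustrated with a minimal Java sketch under a simplified model, assuming a `java.util.concurrent.Semaphore` with one permit stands in for an HTTP connection pool capped at `http-client.apache.max-connections=1`, and a hypothetical `EntriesIterable` stands in for a manifest's entries iterable that holds its connection until it is fully consumed or closed. `ManifestReadSketch`, `EntriesIterable`, `hasEntryLeaky`, `hasEntryClosing` and `secondPassSucceeds` are illustrative names, not Iceberg APIs, and closing the iterable with try-with-resources is only one way to release the connection early, not necessarily the change made in this PR.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Semaphore;

// Hypothetical model, not Iceberg code: the Semaphore stands in for a connection pool
// capped by http-client.apache.max-connections, EntriesIterable for manifest entries.
final class ManifestReadSketch {

  // Holds one connection from iterator() until the entries are exhausted or close() runs.
  static final class EntriesIterable implements Iterable<String>, AutoCloseable {
    private final Semaphore pool;
    private final List<String> entries;
    private boolean holding = false;

    EntriesIterable(Semaphore pool, List<String> entries) {
      this.pool = pool;
      this.entries = entries;
    }

    @Override
    public Iterator<String> iterator() {
      // Opening the stream checks out a connection; fail fast, like a pool timeout.
      if (!pool.tryAcquire()) {
        throw new IllegalStateException("connection pool timeout");
      }
      holding = true;
      Iterator<String> delegate = entries.iterator();
      return new Iterator<>() {
        @Override
        public boolean hasNext() {
          boolean more = delegate.hasNext();
          if (!more) {
            EntriesIterable.this.close(); // a fully consumed stream returns its connection
          }
          return more;
        }

        @Override
        public String next() {
          return delegate.next();
        }
      };
    }

    @Override
    public void close() {
      if (holding) {
        holding = false;
        pool.release();
      }
    }
  }

  // Early return mid-iteration: nothing closes the iterable, so the connection stays out.
  static boolean hasEntryLeaky(EntriesIterable entries, String target) {
    for (String entry : entries) {
      if (entry.equals(target)) {
        return true;
      }
    }
    return false;
  }

  // Same check; try-with-resources closes the iterable on every exit path.
  static boolean hasEntryClosing(EntriesIterable entries, String target) {
    try (EntriesIterable closing = entries) {
      for (String entry : closing) {
        if (entry.equals(target)) {
          return true;
        }
      }
      return false;
    }
  }

  // Second full read of the same manifest; false if no connection could be obtained.
  static boolean secondPassSucceeds(Semaphore pool, List<String> entries) {
    try (EntriesIterable reread = new EntriesIterable(pool, entries)) {
      for (String ignored : reread) {
        // consume every entry
      }
      return true;
    } catch (IllegalStateException timeout) {
      return false;
    }
  }

  public static void main(String[] args) {
    List<String> manifest = List.of("delete-1", "delete-2", "delete-3");

    Semaphore leaky = new Semaphore(1); // models max-connections = 1
    hasEntryLeaky(new EntriesIterable(leaky, manifest), "delete-1");
    System.out.println("leaky second pass: " + secondPassSucceeds(leaky, manifest));

    Semaphore closing = new Semaphore(1);
    hasEntryClosing(new EntriesIterable(closing, manifest), "delete-1");
    System.out.println("closing second pass: " + secondPassSucceeds(closing, manifest));
  }
}
```

Running `main` prints `leaky second pass: false` and then `closing second pass: true`. After the early return in `hasEntryLeaky`, the only permit is still checked out, so the second read of the same manifest cannot get a connection; with N permits and N manifests filtered in parallel, the same thing happens once every permit is held by an early-returning first pass.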
The problem can be reproduced using spark-sql with S3FileIO and
`http-client.apache.max-connections=1`:
```sql
create table manifestfiltermanager(id bigint)
partitioned by (truncate(1, id))
tblproperties('write.delete.mode'='merge-on-read');
-- create a data manifest with 100 entries
insert into manifestfiltermanager select id from range(100);
-- create a delete manifest with 50 entries
delete from manifestfiltermanager where id in (select id from range(0, 100, 2));
-- make the delete files dangling
-- (fails without https://github.com/apache/iceberg/pull/15712)
call system.rewrite_data_files('manifestfiltermanager', options => map('rewrite-all', 'true'));
-- ManifestFilterManager.manifestHasDeletedFiles reads the first block of the
-- delete manifest and returns true before reaching the end of the liveEntries()
-- iterable, without closing it.
-- ManifestFilterManager.filterManifestWithDeletedFiles then fails with
-- ConnectionPoolTimeoutException
call system.rewrite_data_files('manifestfiltermanager', options => map('rewrite-all', 'true'));
```
--
This is an automated message from the Apache Git Service.