kinolaev commented on PR #15712:
URL: https://github.com/apache/iceberg/pull/15712#issuecomment-4114594486
I can reproduce the issue in practice. I suspect the data files are loaded in parallel, and at least in the case of S3FileIO the connection pool is shared, so two connections are not enough. I just added two more inserts and deletes:
```sql
create table sparkdeletefilter(id bigint)
tblproperties('write.delete.mode'='merge-on-read');
insert into sparkdeletefilter select id from range(0, 2);
insert into sparkdeletefilter select id from range(2, 4);
insert into sparkdeletefilter select id from range(4, 6);
delete from sparkdeletefilter where id in (select id from range(0, 2, 2));
delete from sparkdeletefilter where id in (select id from range(2, 4, 2));
delete from sparkdeletefilter where id in (select id from range(4, 6, 2));
select count(id) from sparkdeletefilter;
```
and with `http-client.apache.max-connections=2` I get a ConnectionPoolTimeoutException about half the time. The more operations I add, the more often it fails, so it looks like a race condition. With the PR applied there is no exception.
I don't know exactly how many data and delete files it takes to tie up 50 connections, but I'd guess it's fewer than 1000.
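The failure mode above can be sketched with a toy model (this is not Iceberg code; the class and method names are made up for illustration). Each parallel reader holds one pooled "connection" open (the data-file stream) while trying to borrow a second one (the delete-file stream). Once every reader holds a connection, the pool is exhausted and every second borrow times out, which mimics the ConnectionPoolTimeoutException seen with `http-client.apache.max-connections=2`:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of connection-pool exhaustion, not actual S3FileIO code.
public class PoolExhaustionDemo {

    // Returns how many readers timed out waiting for a second connection.
    static int simulate(int poolSize, int readers) throws InterruptedException {
        Semaphore pool = new Semaphore(poolSize);          // shared connection pool
        CountDownLatch allHolding = new CountDownLatch(readers);
        AtomicInteger timeouts = new AtomicInteger();
        ExecutorService exec = Executors.newFixedThreadPool(readers);

        for (int i = 0; i < readers; i++) {
            exec.submit(() -> {
                try {
                    pool.acquire();                        // hold the data-file stream
                    allHolding.countDown();
                    allHolding.await();                    // all readers now hold one
                    // Try to open the delete-file stream from the same pool.
                    if (pool.tryAcquire(200, TimeUnit.MILLISECONDS)) {
                        pool.release();                    // got one; give it back
                    } else {
                        timeouts.incrementAndGet();        // pool exhausted: timeout
                    }
                    pool.release();                        // close the data-file stream
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        exec.shutdown();
        exec.awaitTermination(5, TimeUnit.SECONDS);
        return timeouts.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // With a pool of 2 and 2 parallel readers, both second borrows time out.
        System.out.println("timeouts: " + simulate(2, 2));
    }
}
```

In this model, releasing the first connection before requesting the second (which is roughly what the PR achieves by not holding streams open concurrently) makes the timeouts disappear.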
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]