kinolaev commented on PR #15712: URL: https://github.com/apache/iceberg/pull/15712#issuecomment-4127660087
@danielcweeks, I've double-checked the production logs and I was wrong: in production I had no problem with the connection pool being exhausted by data file connections. The timeouts were always caused by `ManifestFilterManager` (https://github.com/apache/iceberg/pull/15713). The problem this PR addresses I only encountered locally, when I tried to reproduce the `ManifestFilterManager` issue by reducing `max-connections`.

https://github.com/apache/iceberg/pull/15713 was also caused by an invalid configuration: thread count * 2 > connection pool size. I run Spark on Kubernetes, and although I set `spark.executor.cores`, that value is not used for the executor's `resources.limits.cpu`. That is why I had too many threads for the `ManifestFilterManager.filterManifests` call: the worker pool size was based on the node's CPU count instead of the container's CPU count (https://github.com/apache/iceberg/blob/apache-iceberg-1.10.1/core/src/main/java/org/apache/iceberg/SystemConfigs.java#L33-L54).

Without any patches, 50 connections should be enough for 4 CPU cores: 4 tasks in parallel, 4 threads in the worker pool, and 16 threads in the delete worker pool. In the worst case each thread opens 2 simultaneous connections, which is 48 connections in total.

I still think that loading delete files before a data file is the right thing to do. But, yes, it doesn't fix anything or significantly reduce resource usage in a properly configured setup.
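For reference, the worst-case arithmetic above can be sketched as follows. This is only an illustration of the estimate in this comment, not Iceberg code; the class and method names are made up, and the per-core multipliers (1 task per core, 1 worker thread per core, 4 delete worker threads per core, 2 connections per thread) are the assumptions stated above.

```java
// Hypothetical sketch of the worst-case connection estimate described above.
public class ConnectionEstimate {

    static int worstCaseConnections(int cores) {
        int sparkTasks = cores;            // assumed: one concurrent task per core
        int workerPoolThreads = cores;     // assumed: worker pool sized to CPU count
        int deletePoolThreads = 4 * cores; // assumed: delete worker pool is 4x cores
        int connectionsPerThread = 2;      // worst case: two simultaneous connections
        int totalThreads = sparkTasks + workerPoolThreads + deletePoolThreads;
        return totalThreads * connectionsPerThread;
    }

    public static void main(String[] args) {
        // 4 cores -> 24 threads -> 48 connections, under a 50-connection pool.
        System.out.println(worstCaseConnections(4)); // prints 48
    }
}
```

So with a correctly sized container (4 cores actually seen by the JVM), the default 50-connection pool just covers the worst case.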
