kinolaev commented on PR #15712: URL: https://github.com/apache/iceberg/pull/15712#issuecomment-4127660087
@danielcweeks, I've double-checked the production logs and I was wrong: in production I had no problem with the connection pool being exhausted by data file connections. The timeouts were always caused by `ManifestFilterManager` (https://github.com/apache/iceberg/pull/15713). The problem this PR addresses I only encountered locally, when I tried to reproduce the `ManifestFilterManager` issue by reducing `max-connections`.

https://github.com/apache/iceberg/pull/15713 was also caused by an invalid configuration: thread count * 2 > connection pool size. I run Spark on Kubernetes, and although I set `spark.executor.cores`, that value is not used for the executor's `resources.limits.cpu`. That is why I had too many threads for the `ManifestFilterManager.filterManifests` call: the worker pool size was based on the node's CPU count instead of the container's CPU count (https://github.com/apache/iceberg/blob/apache-iceberg-1.10.1/core/src/main/java/org/apache/iceberg/SystemConfigs.java#L33-L54).

Without any patches, 50 connections should be enough for 4 CPU cores: 4 tasks in parallel, 4 threads in the worker pool, and 16 threads in the delete worker pool. In the worst case each thread opens 2 simultaneous connections, which is 48 connections in total.

I still think that loading delete files before a data file is the right thing to do. But, yes, it doesn't fix anything or significantly reduce resource usage in a properly configured setup.
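For reference, the worst-case arithmetic above can be sketched as follows. This is only an illustration of the estimate in this comment, not Iceberg code; the class and method names are made up, and the per-core multipliers (1 task per core, 1 worker thread per core, 4 delete worker threads per core, 2 connections per thread) are the assumptions stated above.

```java
// Hypothetical sketch of the worst-case connection estimate described above.
public class ConnectionEstimate {

    static int worstCaseConnections(int cores) {
        int sparkTasks = cores;            // assumed: one concurrent task per core
        int workerPoolThreads = cores;     // assumed: worker pool sized to CPU count
        int deletePoolThreads = 4 * cores; // assumed: delete worker pool is 4x cores
        int connectionsPerThread = 2;      // worst case: two simultaneous connections
        int totalThreads = sparkTasks + workerPoolThreads + deletePoolThreads;
        return totalThreads * connectionsPerThread;
    }

    public static void main(String[] args) {
        // 4 cores -> 24 threads -> 48 connections, under a 50-connection pool.
        System.out.println(worstCaseConnections(4)); // prints 48
    }
}
```

So with a correctly sized container (4 cores actually seen by the JVM), the default 50-connection pool just covers the worst case.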
