lirui-apache commented on PR #4596: URL: https://github.com/apache/iceberg/pull/4596#issuecomment-1138096307
Let me clarify our use case. We have an Iceberg table partitioned by date, and we run a daily ETL job to sync data from an upstream Hive table into it. The ETL job basically just runs an `INSERT INTO` with SparkSQL, which adds a new partition to the Iceberg table. So we end up with one manifest per partition, and each partition has lots of data files, roughly 50k to 130k.

We also have a Trino cluster where users submit ad-hoc queries. This cluster is multi-tenant and not dedicated to the Iceberg table mentioned above. The problem we faced is that querying this huge Iceberg table can easily make the Trino coordinator unstable, or even crash it with OOM.

I'll check whether `commit.manifest.target-size-bytes` can mitigate our case. But IIUC, the Iceberg worker pool is static and shared among all jobs, so we probably won't want to change the pool size.
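To get a feel for what a manifest size target would mean at these file counts, here is a rough back-of-the-envelope sketch. The per-entry size of 200 bytes is purely an assumption for illustration (real entries vary with column stats, partition spec, and compression), and `estimate_manifest_count` is a hypothetical helper, not an Iceberg API. The 8 MB figure is the documented default of `commit.manifest.target-size-bytes`. Whether a single append actually splits its manifest at that target depends on the Iceberg version and the write path, so treat the result as an estimate only.

```python
def estimate_manifest_count(data_files, avg_entry_bytes, target_size_bytes):
    """Estimate how many manifests a partition would need if manifests
    were split once they reach target_size_bytes.

    avg_entry_bytes is an assumed average size of one manifest entry;
    measure it on a real manifest before trusting the numbers.
    """
    if target_size_bytes <= 0 or avg_entry_bytes <= 0:
        raise ValueError("sizes must be positive")
    if data_files <= 0:
        return 0
    total_bytes = data_files * avg_entry_bytes
    # Ceiling division without floating point.
    return -(-total_bytes // target_size_bytes)


ASSUMED_ENTRY_BYTES = 200               # assumption, not a measured value
DEFAULT_TARGET_BYTES = 8 * 1024 * 1024  # default commit.manifest.target-size-bytes

if __name__ == "__main__":
    # File counts taken from the range described in the comment above.
    for files in (50_000, 130_000):
        count = estimate_manifest_count(
            files, ASSUMED_ENTRY_BYTES, DEFAULT_TARGET_BYTES
        )
        print(f"{files} data files: about {count} manifest(s) per partition")
```

Under these assumptions, a lower target would spread each partition's entries over more, smaller manifests, which mainly affects how much one planning task has to hold in memory at a time, not the total number of entries the coordinator reads.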
