lirui-apache commented on PR #4596: URL: https://github.com/apache/iceberg/pull/4596#issuecomment-1136984014
Hey @szehon-ho, your understanding of the issue is correct. We ran some tests that iterate over all manifest entries and compute aggregated stats for each partition, trying queue sizes ranging from 5 to 10000. In our tests the consumer is pretty fast, and even the smallest queue doesn't affect the end-to-end latency of the job. The results might differ in other use cases where the consumer is slower, but my hunch is that the latency of such jobs is bounded by the consumer anyway.

One problem I can think of is planning files for multiple tables concurrently: if one of the consumers is slow, it might block all the threads in the thread pool and prevent other jobs from making progress. We're investigating how to limit the resources used by each job.

In our production environment, we do have a background service that rewrites manifests periodically. But that optimization is asynchronous, so if users query the table before the rewrite is done, the query can still cause an OOM.
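The comment describes a bounded producer/consumer queue: producer threads read manifest entries into a queue of fixed capacity and a consumer drains it, so buffered memory is presumably capped by the queue size rather than by the total number of entries. The Java sketch below illustrates that general pattern with `ArrayBlockingQueue`; it is not the code from this PR. `BoundedQueueSketch`, `countPerPartition`, the `DONE` marker, and modelling each manifest as a list of integer partition ids are all hypothetical. The sketch assumes producers never fail (a failed producer would never emit its end marker, and the consumer would wait forever). It also shows the concern raised above: while the consumer is slow, every producer sits blocked in `put()` and holds a pool thread.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BoundedQueueSketch {
  // Hypothetical end-of-manifest marker; assumes no real partition id equals it.
  private static final int DONE = Integer.MIN_VALUE;

  // Hypothetical helper: each inner list stands in for one manifest and each Integer
  // for the partition id of one entry. Producers run on `pool`; the consumer runs on
  // the calling thread, which must not itself be a pool thread.
  public static Map<Integer, Long> countPerPartition(
      List<List<Integer>> manifests, int capacity, ExecutorService pool)
      throws InterruptedException {
    // At most `capacity` entries are buffered at any time.
    BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);

    for (List<Integer> manifest : manifests) {
      pool.execute(() -> {
        try {
          for (int partition : manifest) {
            // Blocks (while holding a pool thread) whenever the queue is full.
            queue.put(partition);
          }
          queue.put(DONE);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }

    // Consumer: aggregate a per-partition entry count until every producer is done.
    Map<Integer, Long> counts = new HashMap<>();
    int finished = 0;
    while (finished < manifests.size()) {
      int partition = queue.take();
      if (partition == DONE) {
        finished++;
      } else {
        counts.merge(partition, 1L, Long::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      List<List<Integer>> manifests =
          List.of(List.of(1, 1, 2), List.of(2, 3), List.of(1, 3, 3, 3));
      // Capacity 1 forces a hand-off per entry; the result doesn't depend on capacity.
      System.out.println(countPerPartition(manifests, 1, pool)); // {1=3, 2=2, 3=4}
    } finally {
      pool.shutdown();
    }
  }
}
```

Changing the capacity (say 5 versus 10000) only changes how many entries can be buffered at once, not the aggregated result; with a fast consumer the queue rarely fills, so a small capacity costs little.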
