lirui-apache commented on PR #4596:
URL: https://github.com/apache/iceberg/pull/4596#issuecomment-1136984014

   Hey @szehon-ho, your understanding of the issue is correct. We ran some
tests that iterate over all manifest entries and compute aggregated stats for
each partition, trying queue sizes ranging from 5 to 10000. In our tests the
consumer is pretty fast, and even the smallest queue doesn't affect the
end-to-end latency of the job. The result might be different in use cases
where the consumer is slower, but my hunch is that the latency of such jobs is
bounded by the consumer anyway.
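
The setup being tested can be pictured as worker threads that read manifest entries into a queue of fixed capacity while a single consumer drains it and aggregates per-partition stats. Below is a minimal sketch of that pattern, assuming a plain java.util.concurrent ArrayBlockingQueue. It is not Iceberg's actual implementation, and the names Entry, DONE and aggregate are hypothetical. The point it illustrates is that the queue size only caps how many entries are buffered at once. The aggregated result is the same for any size, and overall speed is set by whichever side is slower.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BoundedQueueSketch {
  // Hypothetical stand-in for a manifest entry: a partition key plus a record count.
  public record Entry(String partition, long recordCount) {}

  // Sentinel each producer enqueues after its last entry; compared by identity.
  private static final Entry DONE = new Entry("<done>", -1L);

  /**
   * One producer task per "manifest" pushes entries into a bounded queue; the
   * calling thread consumes them and sums record counts per partition. At most
   * queueSize entries are buffered at any time, however slow the consumer is.
   */
  public static Map<String, Long> aggregate(List<List<Entry>> manifests, int queueSize, int threads)
      throws Exception {
    BlockingQueue<Entry> queue = new ArrayBlockingQueue<>(queueSize);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<Future<?>> producers = new ArrayList<>();
    try {
      for (List<Entry> manifest : manifests) {
        producers.add(pool.submit(() -> {
          for (Entry entry : manifest) {
            queue.put(entry); // blocks while the queue is full (backpressure)
          }
          queue.put(DONE);
          return null;
        }));
      }
      Map<String, Long> stats = new HashMap<>();
      int finished = 0;
      while (finished < manifests.size()) {
        Entry entry = queue.take(); // blocks while the queue is empty
        if (entry == DONE) {
          finished++;
        } else {
          stats.merge(entry.partition(), entry.recordCount(), Long::sum);
        }
      }
      for (Future<?> producer : producers) {
        producer.get(); // rethrows any producer failure
      }
      return stats;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    List<List<Entry>> manifests = List.of(
        List.of(new Entry("p=1", 10), new Entry("p=2", 5)),
        List.of(new Entry("p=1", 7)));
    // A queue of size 1 gives the same totals as a large one; it only bounds memory.
    System.out.println(aggregate(manifests, 1, 2));
  }
}
```

Because the consumer in this sketch is the calling thread rather than a pool thread, a tiny queue cannot deadlock the job. It only makes producers wait, so a fast consumer sees no latency difference between small and large queues.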
   One problem I can think of is planning files for multiple tables
concurrently. If one of the consumers is slow, its producers might block all
the threads in the thread pool and prevent other jobs from making progress.
We're investigating how to limit the resources used by each job.
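
The starvation concern can be reproduced in miniature. In this sketch a fixed-size pool stands in for the shared worker pool. Job A's producers write into a capacity-1 queue that is never drained, which models a very slow consumer. Job B submits a trivial task to the same pool. The class and method names are hypothetical, and the sketch only demonstrates the problem; it does not propose a fix.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolStarvationSketch {
  /**
   * Job A's producers and job B share one pool. Job A's bounded queue
   * (capacity 1) is never drained, standing in for a slow consumer.
   * Returns true if job B's trivial task still completes within waitMillis.
   */
  public static boolean otherJobRuns(int poolSize, int jobAProducers, long waitMillis)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    BlockingQueue<Integer> jobAQueue = new ArrayBlockingQueue<>(1);
    try {
      for (int i = 0; i < jobAProducers; i++) {
        pool.submit(() -> {
          // Two puts into a capacity-1 queue nobody reads: every producer ends up parked in put(),
          // holding its pool thread.
          jobAQueue.put(1);
          jobAQueue.put(2);
          return null;
        });
      }
      Future<?> jobB = pool.submit(() -> { /* job B: trivial work */ });
      try {
        jobB.get(waitMillis, TimeUnit.MILLISECONDS);
        return true;
      } catch (TimeoutException | ExecutionException e) {
        return false;
      }
    } finally {
      pool.shutdownNow(); // interrupts the parked producers so the threads can exit
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("pool=2: " + otherJobRuns(2, 2, 300));
    System.out.println("pool=3: " + otherJobRuns(3, 2, 2000));
  }
}
```

With two pool threads and two job A producers, job B times out because both threads are parked in put(). With a third thread available, job B runs immediately.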
   
   In our production environment, we do have a background service that
rewrites manifests periodically. But that optimization is asynchronous, so if
users query the table before the rewrite is done, the query can still hit an
OOM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
