openinx commented on PR #4596:
URL: https://github.com/apache/iceberg/pull/4596#issuecomment-1144693457

   Actually, I'd prefer to give my +1 to the bounded queue solution, because:
   
   1.  If an existing table already contains too many manifests (and some of 
them have many manifest entries), then the other approaches simply don't work 
(such as merging metadata so that the manifests reach an ideal size, tuning 
the thread count, etc.).  In a real production environment we can do nothing 
unless we increase the heap size of the Spark driver or Trino coordinator.  
But what if we are not allowed to restart the Spark driver or Trino 
coordinator because other query jobs are still being served?
   
   2.  Does the blocking queue approach introduce any substantial performance 
bottleneck?  If we think the default blocking queue size is a bit small, then 
we can increase it to 10000.  I think most cases won't be affected by the 
default blocking queue size, unless we have an extremely large table with a 
very large number of manifest entries.  But in that case it seems easy to hit 
an OOM if we don't limit the queue size.
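
To illustrate the trade-off in point 2, here is a minimal sketch of a bounded producer/consumer hand-off. It is not the code from this PR. The class `BoundedQueueSketch`, the method `drain`, the `DONE` sentinel, and the use of plain integers in place of manifest entries are all assumptions made for illustration. Only `java.util.concurrent.ArrayBlockingQueue` and its `put`/`take` methods are real JDK API.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedQueueSketch {
  // Hypothetical sentinel that tells the consumer the producer is finished.
  private static final Integer DONE = Integer.MIN_VALUE;

  // Streams `total` stand-in entries through a queue that holds at most
  // `capacity` of them. Returns {entries consumed, max queue depth observed}.
  static int[] drain(int total, int capacity) {
    BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);

    // Producer: stands in for the threads that read manifest entries.
    Thread producer = new Thread(() -> {
      try {
        for (int i = 0; i < total; i++) {
          queue.put(i); // blocks while the queue is full, capping memory use
        }
        queue.put(DONE);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    producer.start();

    int consumed = 0;
    int maxDepth = 0;
    try {
      // Consumer: stands in for the planner that iterates over the entries.
      while (true) {
        maxDepth = Math.max(maxDepth, queue.size());
        Integer entry = queue.take(); // blocks while the queue is empty
        if (entry.equals(DONE)) {
          break;
        }
        consumed++;
      }
      producer.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while draining", e);
    }
    return new int[] {consumed, maxDepth};
  }

  public static void main(String[] args) {
    // 10000 is the queue size suggested in the comment above.
    int[] result = drain(1_000_000, 10_000);
    System.out.println("consumed=" + result[0] + " maxDepth=" + result[1]);
  }
}
```

In this sketch the producer only stalls when the consumer falls a full queue behind, so a small table never notices the limit. A very large table pays with some producer blocking, but never holds more than `capacity` entries in the queue at once.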
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

