kbendick commented on PR #4596:
URL: https://github.com/apache/iceberg/pull/4596#issuecomment-1148172108

   I do see the points you raise, and I have admittedly used a BlockingQueue in 
similar situations in large-scale streaming ETL in the past (thinking of 
manifests in an “envelope” sense, that is, manifest lists or even just the 
overall snapshot change set).
   
   Can we introduce a configuration parameter for the blocking queue size? 
Giving the parameter a negative value would disable the queue (just as a 
negative value for `CachingCatalog`'s cache timeout in milliseconds disables 
the cache). This way users can avoid the queue and keep the old behavior, while 
users interested in trying the BlockingQueue approach can do so (it has served 
me well, particularly in streaming scenarios where the pipeline cannot stop).
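
   As a rough illustration of the idea (not code from this PR; 
`QueueSizeSketch`, `collect`, and `queueSize` are hypothetical names, and 
treating zero the same as a negative value is an assumption of this sketch), a 
size parameter could gate the queue roughly like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: none of these names exist in Iceberg.
class QueueSizeSketch {
  // Sentinel object that tells the consumer the producer has finished.
  private static final Object DONE = new Object();

  // Collects all items from source. A queueSize below 1 keeps the old,
  // queue-free behavior; otherwise items pass through a bounded queue.
  // Assumes source yields no null items (BlockingQueue rejects nulls).
  static <T> List<T> collect(Iterable<T> source, int queueSize) {
    List<T> out = new ArrayList<>();
    if (queueSize < 1) {
      // Queue disabled: read everything directly on the calling thread.
      for (T item : source) {
        out.add(item);
      }
      return out;
    }

    BlockingQueue<Object> queue = new ArrayBlockingQueue<>(queueSize);
    Thread producer = new Thread(() -> {
      try {
        for (T item : source) {
          queue.put(item); // blocks while the queue is full (backpressure)
        }
        queue.put(DONE);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    producer.setDaemon(true);
    producer.start();

    try {
      // Drain until the sentinel arrives, then wait for the producer to exit.
      for (Object next = queue.take(); next != DONE; next = queue.take()) {
        @SuppressWarnings("unchecked")
        T item = (T) next;
        out.add(item);
      }
      producer.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while draining queue", e);
    }
    return out;
  }
}
```

   With this shape, `collect(items, -1)` behaves as before, while 
`collect(items, 2)` lets the producer run at most two items ahead of the 
consumer.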
   
   In my opinion, this is also not unlike the `streamResults` parameter for some 
Spark driver-side operations, or akin to whether or not the worker thread pool 
is used. So if we configure it in terms of queue size, that’s a trade-off users 
can more easily tune.
   
   Would this be achievable?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]