kbendick commented on PR #4596: URL: https://github.com/apache/iceberg/pull/4596#issuecomment-1148172108
I do see the points you raise, and I have admittedly used a BlockingQueue in similar situations in large-scale streaming ETL in the past (thinking of manifests in the “envelope” sense, that is, manifest lists and even the overall snapshot change set). Can we introduce a configuration parameter for the blocking queue size? A negative value could disable the queue, just as a negative cache timeout in milliseconds disables the cache in `CachingCatalog`. That way users can avoid the queue and keep the old behavior, while users interested in trying the BlockingQueue approach can do so (it has served me well, particularly in streaming scenarios where the pipeline cannot stop). In my opinion this is not unlike the `streamResults` parameter for some Spark driver-side operations, or the choice of whether the worker thread pool is used. If we configure it in terms of queue size, that is a trade-off users can more easily tune. Would this be achievable?
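To make the proposal concrete, here is a minimal sketch of how a size parameter could switch the queue on or off. The property name `example.blocking-queue-size`, the class `QueueConfigSketch`, and both helper methods are hypothetical and are not part of Iceberg or this PR. The sketch also assumes that zero is treated like a negative value, since `ArrayBlockingQueue` rejects a capacity below 1.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueConfigSketch {
  // Hypothetical property name, used only for illustration.
  static final String QUEUE_SIZE = "example.blocking-queue-size";
  // Assumed default: queue disabled, so existing behavior is unchanged.
  static final int QUEUE_SIZE_DEFAULT = -1;

  // Reads the configured size, falling back to the disabled default.
  static int queueSize(Map<String, String> props) {
    String value = props.get(QUEUE_SIZE);
    return value == null ? QUEUE_SIZE_DEFAULT : Integer.parseInt(value);
  }

  // Returns a bounded queue when the size is positive, otherwise empty,
  // which signals the caller to keep the old non-queue code path.
  static <T> Optional<BlockingQueue<T>> newQueue(Map<String, String> props) {
    int size = queueSize(props);
    if (size <= 0) {
      return Optional.empty();
    }
    BlockingQueue<T> queue = new ArrayBlockingQueue<>(size);
    return Optional.of(queue);
  }
}
```

With this shape, a caller would branch once on `newQueue(props).isPresent()`: the bounded queue gives producers backpressure through `put`, and the capacity is the single knob users tune.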
