mr-brobot commented on PR #8015:
URL: https://github.com/apache/iceberg/pull/8015#issuecomment-1635166281

   @Fokko After some more testing on other platforms, I think multi-processing 
support should be considered separately from simply making PyIceberg work in 
serverless environments. 
   
   1. Making PyIceberg work in serverless environments is simple: replace usage 
of `multiprocessing.pool.ThreadPool` and its related synchronization primitives 
with `concurrent.futures.ThreadPoolExecutor` and the corresponding 
synchronization primitives in `threading`.
   2. Making PyIceberg support multi-processing adds variability in behavior 
across platforms and would require more care when introducing future changes 
(e.g., ensuring that everything parallelized by an executor can be pickled).
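   As a rough illustration of (1), here is a minimal sketch of the swap. The 
names `plan_file` and `plan_all` are hypothetical stand-ins, not PyIceberg 
functions; only `ThreadPoolExecutor` and `threading.Lock` are the real pieces 
being shown.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def plan_file(path: str) -> int:
    """Hypothetical per-file work; stands in for whatever was parallelized."""
    return len(path)


def plan_all(paths, max_workers=4):
    """Run plan_file over paths on a thread pool and keep a shared total."""
    total = 0
    # threading.Lock takes the place of a multiprocessing-based primitive.
    lock = threading.Lock()

    def work(path):
        nonlocal total
        size = plan_file(path)
        with lock:  # guard the shared counter across worker threads
            total += size
        return size

    # ThreadPoolExecutor takes the place of multiprocessing.pool.ThreadPool.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order, like ThreadPool.map.
        results = list(executor.map(work, paths))
    return results, total
```

   Because everything stays in one process, the closure `work` and the lock 
can be shared directly with the workers and nothing needs to be pickled.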
   
   I'm going to close this and reopen a separate PR focused on (1). For (2), I 
will create a separate issue where we can dedicate attention to testing and 
possibly some benchmarks that prove the ROI.
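
   To make the pickling concern in (2) concrete, here is a small sketch of the 
kind of check that testing could include. The helper `is_picklable` is 
hypothetical and not part of PyIceberg; it only uses the standard `pickle` 
module to show which objects could cross a process boundary.

```python
import pickle
import threading


def is_picklable(obj) -> bool:
    """Return True if obj can be serialized with pickle, else False."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A builtin function is pickled by reference, so it can be sent to a worker
# process.
builtin_ok = is_picklable(len)

# A lambda has no importable name, so a process-based executor could not
# receive it as a task.
lambda_ok = is_picklable(lambda x: x)

# A lock is tied to the current process and cannot be pickled either.
lock_ok = is_picklable(threading.Lock())
```

   With a thread-based executor none of these restrictions apply, which is 
part of why (1) is the smaller change.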


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

