kw2542 commented on pull request #15081: URL: https://github.com/apache/beam/pull/15081#issuecomment-878505102
> > I am curious why artifact staging does not work with threads? I wonder if we should fix that instead of introducing yet more complexity to this already complex API.

> > In Python, I thought we used processes instead of threads because of the GIL. But Java has no GIL, so I'm not sure there is an advantage to using processes.

> > Using threads still makes sense for IO bound tasks in Python since Python can parallelize IO effectively. Python's GIL is problematic for CPU bound tasks.

@lukecwik @ibzib Correct me if I am wrong, but my understanding is that we use process mode mainly because it lets us simplify the workflow by reusing the boot executable, which can only be run in a subprocess, not in a thread. The boot executable also starts the actual worker in a subprocess.

It is true that we could implement a new workflow that supports thread mode instead of relying on the boot executable, but that would be significantly more work. Let me know if you think it is worth the effort.

Separately, I am wondering whether we could add a prepare step to external pool mode, so that we would not need to run artifact staging for each start-worker request. WDYT?
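To make the prepare-step idea a bit more concrete, here is a rough sketch of what I have in mind. This is not Beam code: `PreparedWorkerPool`, `prepare`, `start_worker`, and the `staging_key` argument are all hypothetical names used only for illustration, and the real external pool API may need a different shape. The only point is that staging runs once per key and every later start-worker request reuses the result.

```python
import threading
from typing import Callable, Dict


class PreparedWorkerPool:
    """Hypothetical worker pool that stages artifacts once per staging key."""

    def __init__(self, stage_artifacts: Callable[[str], str]):
        # stage_artifacts stands in for the expensive staging step; it takes
        # a staging key and returns the directory holding the artifacts.
        self._stage_artifacts = stage_artifacts
        self._staged: Dict[str, str] = {}
        self._lock = threading.Lock()

    def prepare(self, staging_key: str) -> str:
        # Stage at most once per key, even when called concurrently.
        # Holding the lock while staging keeps the sketch simple, at the
        # cost of serializing staging across different keys.
        with self._lock:
            if staging_key not in self._staged:
                self._staged[staging_key] = self._stage_artifacts(staging_key)
            return self._staged[staging_key]

    def start_worker(self, worker_id: str, staging_key: str) -> Dict[str, str]:
        # Reuse the prepared artifacts instead of staging again per request.
        staged_dir = self.prepare(staging_key)
        return {"worker_id": worker_id, "artifacts": staged_dir}
```

With something like this, a client could call `prepare` explicitly before the first start-worker request, or let the first request trigger it lazily; either way later requests for the same key skip staging.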
