kw2542 commented on pull request #15081:
URL: https://github.com/apache/beam/pull/15081#issuecomment-879281777


   > Each Python SDK process instance is already capable of running multiple work items in parallel. The issue is that the Python GIL limits it to a single CPU core, which is why multiple Python SDK process instances are launched. Whether they are launched by boot.go or something else isn't too important. The prepare step sounds great for the external pool mode as well, since that is also what we want for Docker in Apache Beam.
   >
   > On Mon, Jul 12, 2021 at 11:39 AM Ke Wu <***@***.***> wrote:
   >
   > > I am curious why artifact staging does not work with threads. I wonder if we should fix that instead of introducing yet more complexity to this already complex API. In Python, I thought we used processes instead of threads because of the GIL. But Java has no GIL, so I'm not sure there is an advantage to using processes.
   > >
   > > Using threads still makes sense for IO-bound tasks in Python, since Python can parallelize IO effectively. Python's GIL is problematic for CPU-bound tasks.
   >
   > @lukecwik <https://github.com/lukecwik> @ibzib <https://github.com/ibzib> Correct me if I am wrong, but my understanding is that we use process mode mainly because it simplifies the workflow: we can reuse the boot executable, which can only be executed in a subprocess, not in a thread. In addition, the boot executable itself starts the actual worker in a subprocess. It is true that we could implement a new workflow to support thread mode instead of relying on the boot executable, but it could be significantly more work; let me know if you think it is worth the effort.
   >
   > Also, I am wondering if we could add a prepare step in external pool mode; then we may not need to run artifact staging for each start worker request. WDYT?
   
   Is your suggestion to stick with thread mode in Java and implement prepare/artifact staging separately from the existing boot script?
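The GIL trade-off discussed in the quoted thread (processes for CPU-bound Python work, threads for IO-bound work) can be illustrated with a minimal sketch using the standard library's `concurrent.futures`. This is not Beam's SDK worker or boot code; `cpu_bound_task`, `io_bound_task`, `run_cpu_bound`, and `run_io_bound` are hypothetical stand-ins, and `time.sleep` is assumed as a proxy for a blocking IO call.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_bound_task(n: int) -> int:
    # Hypothetical CPU-bound work: pure-Python arithmetic holds the GIL,
    # so threads in one process cannot run it on more than one core at once.
    return sum(i * i for i in range(n))


def io_bound_task(delay: float) -> float:
    # Hypothetical IO-bound work: time.sleep releases the GIL while waiting,
    # standing in for a blocking network or disk call.
    time.sleep(delay)
    return delay


def run_cpu_bound(inputs, workers=4):
    # Each worker process has its own interpreter and its own GIL,
    # so CPU-bound tasks can spread across multiple cores.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(cpu_bound_task, inputs))


def run_io_bound(delays, workers=4):
    # Threads suffice for IO-bound tasks: while one thread waits,
    # the GIL is free for the others, so the waits overlap.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(io_bound_task, delays))


if __name__ == "__main__":
    print(run_cpu_bound([10_000, 20_000, 30_000]))
    start = time.perf_counter()
    run_io_bound([0.2] * 4)
    # The four waits overlap, so this should take roughly 0.2 s rather than 0.8 s.
    print(f"IO-bound elapsed: {time.perf_counter() - start:.2f}s")
```

The `__main__` guard matters for the process pool: with the `spawn` or `forkserver` start methods, child processes re-import the main module, and the guard keeps them from launching pools of their own.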

