phoerious edited a comment on pull request #16658:
URL: https://github.com/apache/beam/pull/16658#issuecomment-1047144100


   @ryanthompson591 @tvalentyn I updated the PR. The venvs now use random 
names and are bound to the workers, which is the only way to make this safe.
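
   To illustrate the idea of a randomly named venv tied to one worker, here is 
a minimal sketch. It is not the code from this PR: the `WorkerVenv` class, the 
`beam-venv-` prefix, and the `worker_id` argument are hypothetical names chosen 
for the example. It relies only on the standard library (`tempfile.mkdtemp` for 
a collision-free random directory name, `venv.create` to populate it).

```python
import shutil
import tempfile
import venv
from pathlib import Path


class WorkerVenv:
    """Hypothetical helper: one uniquely named venv owned by one worker."""

    def __init__(self, base_dir, worker_id):
        self.base_dir = Path(base_dir)
        self.worker_id = worker_id
        self.path = None

    def create(self, with_pip=False):
        self.base_dir.mkdir(parents=True, exist_ok=True)
        # mkdtemp creates a directory with a random suffix, so two workers
        # can never end up sharing (or deleting) the same environment.
        self.path = Path(
            tempfile.mkdtemp(
                prefix=f"beam-venv-{self.worker_id}-", dir=self.base_dir
            )
        )
        venv.create(self.path, with_pip=with_pip)
        return self.path

    def cleanup(self):
        # Called when the owning worker shuts down; removes only its own venv.
        if self.path is not None:
            shutil.rmtree(self.path, ignore_errors=True)
            self.path = None
```

   Because each worker owns its directory, cleanup on shutdown cannot remove an 
environment that another worker is still using.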
   
   I also fixed how workers are cleaned up. Previously, the worker pool Python 
executable simply SIGKILL'ed them, which prevented any kind of cleanup and also 
left zombie processes inside the containers. I think there are still some cases 
where processes are not cleaned up properly and keep running forever, but most 
of that should be fixed now. This happens particularly when I use a global 
CombineFn, which causes Flink to believe that the last remaining worker is 
still running even though it finished long ago. When that happens, not even 
cancelling the job sends signals to the remaining workers. But that's another 
bug (I reported it on the mailing list before, but never got a response).
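
   For context, the general pattern for avoiding both problems (no chance to 
clean up, and zombies) looks roughly like the sketch below. It is not the exact 
code in this PR: `stop_worker` and `grace_seconds` are hypothetical names, and 
it assumes the worker was started with `subprocess.Popen` on a POSIX system. 
The worker first gets SIGTERM so it can run its cleanup, SIGKILL is only a 
fallback, and the process is always waited on so it gets reaped.

```python
import subprocess


def stop_worker(proc, grace_seconds=5.0):
    """Hypothetical helper: stop a worker gracefully and reap it.

    `proc` is assumed to be a subprocess.Popen object.
    """
    if proc.poll() is None:
        # SIGTERM (on POSIX) lets the worker run its own cleanup handlers.
        proc.terminate()
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            # Fallback only: the worker ignored SIGTERM or is stuck.
            proc.kill()
            # wait() collects the exit status, so no zombie is left behind.
            proc.wait()
    return proc.returncode
```

   The important part is the final `wait()`: a killed child that is never 
waited on stays in the process table as a zombie.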
   
   All of this needs some more testing, but it seems to be running fine on my 
Flink cluster at least.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

