shunping opened a new pull request, #38620:
URL: https://github.com/apache/beam/pull/38620

   Currently, `_SharedCache` registers all live owners as joint owners of every 
cached subprocess key. This works fine when all owners are within a unified 
context (e.g. a test class wrapping multiple runs), but it causes unintended 
resource sharing/leakage when there are concurrent, independent, non-context 
owners (such as a long-lived runner).
   
   Here is a resource leak scenario:
   1. The Prism runner starts and registers a persistent non-context owner 
`owner_1`.
   1. The pipeline requests an external transform, spinning up a short-lived 
Java Expansion Service that registers its own owner `owner_2` and requests its 
startup command key.
   1. In the old implementation, `get()` automatically added all currently 
registered live owners to the cache key's owner set.
   
https://github.com/apache/beam/blob/361c2c473a8eb6ebddfc3211f78553b521fdfa54/sdks/python/apache_beam/utils/subprocess_server.py#L117-L118
   As a result, `owner_1` (the Prism runner) was registered as a joint owner of 
the short-lived Expansion Service's cache key.
   1. When the pipeline finished using the external transform and called 
purge(`owner_2`), the short-lived Expansion Service subprocess was never 
terminated because `owner_1` was still registered as an owner (`entry.owners` 
is not empty).
   
https://github.com/apache/beam/blob/361c2c473a8eb6ebddfc3211f78553b521fdfa54/sdks/python/apache_beam/utils/subprocess_server.py#L104-L105
   1. This resulted in orphaned, zombie Expansion Service processes 
accumulating in the background, binding up localhost ports and exhausting 
resources.
   
   This PR refactors `_SharedCache` in `subprocess_server.py` to explicitly 
distinguish between context owners (e.g., cache_subprocesses which should hold 
ownership over everything created under its block) and non-context owners 
(e.g., individual subprocess server instances like PrismRunner or expansion 
service subprocesses).
   
   Additionally, get() now takes an optional owner arg. Context owners 
automatically own all created keys, while independent non-context owners only 
own the keys they explicitly request.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to