scwhittle commented on issue #28776:
URL: https://github.com/apache/beam/issues/28776#issuecomment-2051652566

   I believe this is a long-standing bug in the Python SDK. Side inputs 
in the global window are cached in 
[PerWindowInvoker](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/common.py#L948)
 without respecting the side input cache token.
   
   The invoker is part of the bundle processor, which is reused across bundles. 
Side input values are otherwise meant to be refreshed either by the 
[reset](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/bundle_processor.py#L507)
 here or by the runner modifying the side input cache token.
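
   To make the failure mode concrete, here is a minimal sketch of a cache that ignores the token. It is not the actual Beam code: `FakeSideInputSource` and `TokenBlindInvoker` are hypothetical names invented for illustration, and the token handling is simplified.

```python
class FakeSideInputSource:
    """Hypothetical stand-in for a runner-provided global-window side input."""

    def __init__(self):
        self.value = "v1"
        self.cache_token = b"token-1"
        self.reads = 0

    def read(self):
        self.reads += 1
        return self.value

    def update(self, value, cache_token):
        # The runner publishes a new value and signals it with a new token.
        self.value = value
        self.cache_token = cache_token


class TokenBlindInvoker:
    """Hypothetical invoker that caches the side input once and never
    consults the cache token again."""

    def __init__(self, source):
        self._source = source
        self._has_cached = False
        self._cached = None

    def process_bundle(self):
        if not self._has_cached:
            self._cached = self._source.read()
            self._has_cached = True
        return self._cached


source = FakeSideInputSource()
invoker = TokenBlindInvoker(source)  # reused across bundles

first = invoker.process_bundle()
source.update("v2", b"token-2")      # new value, new cache token
second = invoker.process_bundle()    # still returns the captured value
```

   In this sketch the second bundle keeps seeing the first value because nothing compares the current token with the one in effect when the value was captured.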
   
   Bundle processors stay cached as long as there is a steady rate of input, 
that is, as long as the last accessed time is less than 60 seconds ago (see 
[here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/sdk_worker.py#L612)),
 so this can lead to extended periods where the captured global side input value 
is used without refresh.
   
   I think that we should remove the caching at the invoker level, as it does 
not respect the cache token and the StateBackedSideInput supports caching 
itself. This may be a performance regression, though, as the state cache is 
currently disabled by default.
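
   For contrast, a cache that respects the token could look roughly like the sketch below. It is not the StateBackedSideInput implementation; `TokenAwareSideInputCache` and the `loader` callback are hypothetical, and treating a missing token as "do not cache" is an assumption made for this example.

```python
class TokenAwareSideInputCache:
    """Hypothetical cache that reuses a value only while the token is unchanged."""

    def __init__(self):
        self._token = None
        self._value = None

    def get(self, cache_token, loader):
        # Assumption: no token means caching is not allowed, so always reload.
        if cache_token is None or cache_token != self._token:
            self._value = loader()
            self._token = cache_token
        return self._value


loads = []


def loader_for(value):
    def load():
        loads.append(value)
        return value
    return load


cache = TokenAwareSideInputCache()
a = cache.get(b"token-1", loader_for("v1"))
b = cache.get(b"token-1", loader_for("unused"))  # same token: cached value
c = cache.get(b"token-2", loader_for("v2"))      # new token: reloaded
```

   Here a token change forces a reload, so a reused bundle processor would pick up the new value on the next bundle.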

