User state is built on read, append and clear operations rather than on a read/write paradigm, so that blind appends (appends that never read the existing contents) are possible.
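To make the read/append/clear model concrete, here is a rough sketch of how an SDK-side bag handle could buffer changes during a bundle and flush them using only those three operations. Everything in it is hypothetical: `FakeStateBackend` stands in for the runner's state service and `CachedBag` for an SDK-side handle. Neither is actual Beam code, and the names are made up for illustration.

```python
from typing import Dict, List, Optional


class FakeStateBackend:
    """Hypothetical stand-in for the runner's state service; records each call."""

    def __init__(self) -> None:
        self._bags: Dict[str, List[str]] = {}
        self.calls: List[str] = []

    def read(self, key: str) -> List[str]:
        self.calls.append("read")
        return list(self._bags.get(key, []))

    def append(self, key: str, values: List[str]) -> None:
        self.calls.append("append")
        self._bags.setdefault(key, []).extend(values)

    def clear(self, key: str) -> None:
        self.calls.append("clear")
        self._bags.pop(key, None)


class CachedBag:
    """Hypothetical SDK-side bag handle that buffers changes for one bundle."""

    def __init__(self, backend: FakeStateBackend, key: str) -> None:
        self._backend = backend
        self._key = key
        self._cleared = False
        # None means "persisted contents not known yet"; fetched lazily on read.
        self._persisted: Optional[List[str]] = None
        self._pending: List[str] = []

    def add(self, value: str) -> None:
        # Blind append: nothing is read from the backend.
        self._pending.append(value)

    def clear(self) -> None:
        # After a clear the persisted contents are known to be empty,
        # so later reads in this bundle need no backend call.
        self._cleared = True
        self._persisted = []
        self._pending = []

    def read(self) -> List[str]:
        if self._persisted is None:
            self._persisted = self._backend.read(self._key)
        return self._persisted + self._pending

    def commit(self) -> None:
        # Flush with clear first, then a single append of the new data.
        if self._cleared:
            self._backend.clear(self._key)
        if self._pending:
            self._backend.append(self._key, self._pending)
        if self._persisted is not None:
            self._persisted = self._persisted + self._pending
        self._pending = []
        self._cleared = False
```

With this shape, a bundle that only adds elements costs one append and no read, and a bundle that replaces the bag costs one clear plus one append.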
The optimization you speak of can be done completely inside the SDK without any additional protocol being required as long as you clear the state first and then append all your new data. The Beam Java SDK does this for all runners when executed portably[1]. You could port the same logic to the Beam Python SDK as well.

[1] https://github.com/apache/beam/blob/41478d00d34598e56471d99d0845ac16efa5b8ef/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/BagUserState.java#L84

On Tue, Jul 16, 2019 at 5:54 AM Robert Bradshaw <rober...@google.com> wrote:

> Python workers also have a per-bundle SDK-side cache. A protocol has
> been proposed, but hasn't yet been implemented in any SDKs or runners.
>
> On Tue, Jul 16, 2019 at 6:02 AM Reuven Lax <re...@google.com> wrote:
> >
> > It's runner dependent. Some runners (e.g. the Dataflow runner) do have
> > such a cache, though I think it currently has a cap for large bags.
> >
> > Reuven
> >
> > On Mon, Jul 15, 2019 at 8:48 PM Rakesh Kumar <rakeshku...@lyft.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> I have been using the Python SDK for the application and am also using
> >> BagState in production. I was wondering whether the state logic has any
> >> write-through cache implemented or not. If we are sending every read and
> >> write request through the network then it comes with a performance cost.
> >> We can avoid the network call for a read operation if we have a
> >> write-through cache.
> >> I have superficially looked into the implementation and I didn't see
> >> any cache implementation.
> >>
> >> Is it possible to have this cache? Would it cause any issue if we have
> >> the caching layer?
> >>