I think Kenn's second option accurately reflects my memory of the original intentions:
1. I remember we we considered either using the Future interface or calling the ReadableState interface a future, and explicitly said "no, future implies asynchrony and that the value returned by `get` won't change over multiple calls, but we want the latest value each time". So, I remember us explicitly considering and rejecting Future, thus the name "ReadableState". 2. The intuition behind the implementation was analogous to a mutable-reference cell in languages like ML / Scheme / etc. The ReadableState is just a pointer to the the reference cell. Calling read returns the value currently in the cell. If we have 100 ReadableStates pointing at the same cell, they all get the same value regardless of when they were created. This avoids needing to duplicate/snapshot values at any point in time. 3. ReadLater was added, as noted by Charles, to suggest prefetching the associated value. This was added after benchmarks showed 10x (if I remember correctly) performance improvements in things like GroupAlsoByWindows by minimizing round-trips asking for more state. The intuition being -- if we need to make an RPC to load one state value, we are better off making an RPC to load all the values we need. Overall, I too lean towards maintaining the second interpretation since it seems to be consistent and I believe we had additional reasons for preferring it over futures. Given the confusion, I think strengthening the class documentation makes sense -- I note the only hint of the current behavior is that ReadableState indicates it gets the *current* value (emphasis mine). We should emphasize that and perhaps even mention that the ReadableState should be understood as just a reference or handle to the underlying state, and thus its value will reflect the latest write. Charles, if it helps, the plan I remember regarding prefetching was something like: interface ReadableMapState<K, V> { ReadableState<V> get(K key); ReadableState<Iterable<V>> getIterable(); ReadableState<Map<K, V>> get(); // ... more things ... } Then prefetching a value is `mapState.get(key).readLater()` and prefetching the entire map is `mapState.get().readLater()`, etc. On Wed, May 23, 2018 at 7:13 PM Charles Chen <c...@google.com> wrote: > Thanks Kenn. I think there are two issues to highlight: (1) the API > should allow for some sort of prefetching / batching / background I/O for > state; and (2) it should be clear what the semantics are for reading (e.g. > so we don't have confusing read after write behavior). > > The approach I'm leaning towards for (1) is to allow a state.prefetch() > method (to prefetch a value, iterable or [entire] map state) and maybe > something like state.prefetch_key(key) to prefetch a specific KV in the > map. Issue (2) seems to be okay in either of Kenn's positions. > > On Wed, May 23, 2018 at 5:33 PM Robert Bradshaw <rober...@google.com> > wrote: > >> Thanks for laying this out so well, Kenn. I'm also leaning towards the >> second option, despite its drawbacks. (In particular, readLater should >> not influence what's returned at read(), it's just a hint.) >> >> On Wed, May 23, 2018 at 4:43 PM Kenneth Knowles <k...@google.com> wrote: >> >>> Great idea to bring it to dev@. I think it is better to focus here than >>> long doc comment threads. >>> >>> I had strong opinions that I think were a bit confused and wrong. Sorry >>> for that. I stated this position: >>> >>> - XYZState class is a handle to a mutable location >>> - its methods like isEmpty() or contents() should return immutable >>> future values (implicitly means their contents are semantically frozen when >>> they are created) >>> - the fact that you created the future is a hint that all necessary >>> fetching/computation should be kicked off >>> - later forced with get() >>> - when it was designed, pure async style was not a viable option >>> >>> I see now that the actual position of some of its original designers is: >>> >>> - XYZState class is a view on a mutable location >>> - its methods return new views on that mutable location >>> - calling readLater() is a hint that some fetching/computation should >>> be kicked off >>> - later read() will combine whatever readLater() did with additional >>> local info to give the current value >>> - async style not applicable nor desirable as per Beam's focus on naive >>> straight-line coding + autoscaling >>> >>> These are both internally consistent I think. In fact, I like the second >>> perspective better than the one I have been promoting. There are some >>> weaknesses: readLater() is pretty tightly coupled to a particular >>> implementation style, and futures are decades old so you can get good APIs >>> and performance without inventing anything. But I still like the non-future >>> version a little better. >>> >>> Kenn >>> >>> On Wed, May 23, 2018 at 4:05 PM Charles Chen <c...@google.com> wrote: >>> >>>> During the design of the Beam Python State API, we noticed some >>>> transactionality inconsistencies in the existing Beam Java State API (these >>>> are the unresolved bugs BEAM-2980 >>>> <https://issues.apache.org/jira/browse/BEAM-2980> and BEAM-2975 >>>> <https://issues.apache.org/jira/browse/BEAM-2975>). We are therefore >>>> having a discussion about this API which can have implications for its >>>> future development in all Beam languages: >>>> https://docs.google.com/document/d/1GadEkAmtbJQjmqiqfSzGw3b66TKerm8tyn6TK4blAys/edit#heading=h.ofyl9jspiz3b >>>> >>>> If you have an opinion on the possible design approaches, it would be >>>> very helpful to bring up in the doc or on this thread. Thanks! >>>> >>>> Best, >>>> Charles >>>> >>>