Breaking change is when a portable runner (e.g. UW) running on a newer version, starts to add an extension to WindowedValue which the old SDK hasn't expected. Old SDK only expects FIRST, ONE_INDEX or TWO_INDICES, without an extension bit.
Radek On Fri, Apr 25, 2025 at 4:33 PM Kenneth Knowles <k...@apache.org> wrote: > I love using one WITH_EXTENSIONS bit and then adding a proto. Especially > for full WindowedValue where there is already ~10-12 bytes overhead, it > doesn't make sense to create our own extensible encoding protocol. > > - I think we can actually do this without a breaking change, too. Which > part were you thinking would be a breaking change? > - Regarding "being a pain" to change, I think adding fields to > WindowedValue and adding fields to the extension *should* be about the same > code difficulty. In both cases the new fields must be optional and have > nulls or defaults. Putting it in a new object makes sense, though, since > it'll be encoded separately with proto, which is way easier and more robust. > > Anyhow +100 to this new directly for the idea. There are even more > extensions that I've been thinking about and this makes it very flexible. > > Kenn > > On Fri, Apr 25, 2025 at 9:17 AM Radek Stankiewicz <radosl...@google.com> > wrote: > >> hi Kenn, >> I would add option 3 >> Option 3: Use current way of using tag, add drain in additional proto >> object >> >> - >> >> The tag & 0b0111 indicates whether it is using FIRST, ONE_INDEX or >> TWO_INDICES encoding and future options if needed. >> - >> >> tag & 0b10000 ("WITH_EXTENSION" ) indicates there is extension. >> - >> >> If extension is set, an additional proto object is passed. >> - >> >> Proto provides natural way of versioning and allows adding more >> fields in the future >> - >> >> No extension bit unspecified: >> - >> >> Drain mode unspecified >> - >> >> Tracing Context is null >> - >> >> Extension bit set: >> - >> >> Drain mode can be set >> - >> >> Tracing context and trace state can be set >> - >> >> Extension bit set, some future fields are added: >> - >> >> etc. >> >> The Extension object within WindowedValue (WV) shouldn’t be accessed >> directly by the user, ideally when new field is added, e.g. drainMode, WV >> should have added getter getDrainMode() and access directly enum from >> extension >> public WindowedValueExtensions.DrainMode getDrainMode(){ >> return extensions.getDrain(); >> } >> >> To set the value drain mode: >> windowedElem= >> windowedElem.withExtension(windowedElem.getExtensions().toBuilder().setDrain(WindowedValueExtensions.DrainMode. >> *DRAIN_MODE_DRAINING*).build()); >> or: >> public WindowedValue<T> withDrainMode(DrainMode dm){ >> return >> this.withExtension(windowedElem.getExtensions().toBuilder().setDrain(dm)); >> } >> >> Adding new fields to WindowedValue is a pain as it requires multiple >> changes all over the place, so doing it once with extensions field should >> simplify future fields. >> >> There may be situations where due to performance reasons storing value in >> extension is not optimal - e.g. tracing context as object is parsed array >> of strings but for interoperability it is usually propagated as two strings >> (traceparent, tracestate). In such case: >> >> - >> >> WV should have added field traceContext of type Context >> - >> >> WV should have added getter getTraceContext() and access field >> directly >> - >> >> WV should have added withTraceContext() method to construct WV with >> previous fields >> - >> >> When sending WV over the wire, the WV coder should encode >> traceContext into extensions object >> - >> >> When receiving WV over the wire, the WV coder should decode extension >> and construct optimal traceContext if traceparent and tracestate exists in >> extension. >> >> >> Summary >> Pros: >> >> - We are no longer limited with Pane bits as a way of extending >> WindowedValue >> - There is no overhead if none of the extended fields is set. >> >> Cons: >> >> - Memory overhead on keeping additional deserialized objects (here, >> trace context). >> - Confusion where is the field (wv.getContext() or >> wv.getExtensions().getTraceParent()) >> - Breaking change >> >> >> >> On Wed, Apr 2, 2025 at 11:17 PM Kenneth Knowles <k...@apache.org> wrote: >> >>> Just to organize my thoughts and to refresh myself on how this would >>> integrate with each SDK, I wrote this short follow-up on the API, PaneInfo >>> encoding, and proto changes. I'm especially interested in feedback from >>> people familiar with various SDKs about mistakes I've made in that SDK and >>> the most idiomatic ways to represent this. >>> >>> https://s.apache.org/beam-drain-mode >>> >>> Kenn >>> >>> On Tue, Mar 18, 2025 at 11:42 AM Kenneth Knowles <k...@apache.org> >>> wrote: >>> >>>> Thanks for all the detailed insightful comments! I've made some edits >>>> to reflect your insights and also settled on what I think is the most >>>> feasible plan, considering effort and backwards-compatibility. >>>> >>>> The work is actually now quite small (yay!). I've added these at the >>>> bottom of the doc: >>>> >>>> - Add "is drain" bit for aggregations >>>> - Add "is drain" bit for timer callbacks >>>> - Fire processing time timers instantly during drain (likely getting >>>> dropped because their window is expired) >>>> - Immediately drop already-expired processing time timers (since they >>>> are doomed to be dropped when they fire) >>>> >>>> This is a set of backwards-compatible changes consistent with "option >>>> 1" change proposal in the doc. >>>> >>>> Let me know if you think this is not enough or too much. >>>> >>>> Kenn >>>> >>>> On Wed, Mar 12, 2025 at 6:33 PM Jan Lukavský <je...@seznam.cz> wrote: >>>> >>>>> Hi Kenn, >>>>> >>>>> thanks for putting this down on paper. This is great initiative as it >>>>> might touch some core parts of the model we are actually somewhat circling >>>>> around. I left some comments and I'm looking forward to the broad >>>>> discussion this definitely deserves. >>>>> >>>>> Jan >>>>> On 3/11/25 15:46, Kenneth Knowles wrote: >>>>> >>>>> Hi all, >>>>> >>>>> I've spent some time on the rather hairy details of timers, batch, >>>>> drain, and timer loops lately. We have some inconsistencies in this area >>>>> that regularly bite users. >>>>> >>>>> I've written up what I *think* are the desired semantics. some notes >>>>> about the current status of implementation, and a few proposals for how we >>>>> can improve things. >>>>> >>>>> https://s.apache.org/beam-timers-and-drain >>>>> >>>>> Please take a look! This stuff is hard and I could really use as many >>>>> smart pairs of eyes on this as possible. >>>>> >>>>> Kenn >>>>> >>>>>