Re: [DESIGN] Timers, Watermark Holds, Loops, Batch and Drain

Radek Stankiewicz via dev Fri, 25 Apr 2025 06:17:49 -0700

hi Kenn,
I would add option 3
Option 3:  Use current way of using tag, add drain in additional proto
object


   -

   The tag & 0b0111 indicates whether it is using FIRST, ONE_INDEX or
   TWO_INDICES encoding and future options if needed.
   -

   tag & 0b10000 ("WITH_EXTENSION" ) indicates there is extension.
   -

   If extension is set, an additional proto object is passed.
   -

   Proto provides natural way of versioning and allows adding more fields
   in the future
   -

      No extension bit unspecified:
      -

         Drain mode unspecified
         -

         Tracing Context is null
         -

      Extension bit set:
      -

         Drain mode can be set
         -

         Tracing context and trace state can be set
         -

      Extension bit set, some future fields are added:
      -

         etc.

The Extension object within WindowedValue (WV) shouldn’t be accessed
directly by the user, ideally when new field is added, e.g. drainMode, WV
should have added getter getDrainMode() and access directly enum from
extension
public WindowedValueExtensions.DrainMode getDrainMode(){
      return extensions.getDrain();
    }

To set the value drain mode:
windowedElem=
windowedElem.withExtension(windowedElem.getExtensions().toBuilder().setDrain(WindowedValueExtensions.DrainMode.
*DRAIN_MODE_DRAINING*).build());
or:
public WindowedValue<T> withDrainMode(DrainMode dm){
      return
this.withExtension(windowedElem.getExtensions().toBuilder().setDrain(dm));
}

Adding new fields to WindowedValue is a pain as it requires multiple
changes all over the place, so doing it once with extensions field should
simplify future fields.

There may be situations where due to performance reasons storing value in
extension is not optimal - e.g. tracing context as object is parsed array
of strings but for interoperability it is usually propagated as two strings
(traceparent, tracestate). In such case:

   -

   WV should have added field traceContext of type Context
   -

   WV should have added getter getTraceContext() and access field directly
   -

   WV should have added withTraceContext() method to construct WV with
   previous fields
   -

   When sending WV over the wire, the WV coder should encode traceContext
   into extensions object
   -

   When receiving WV over the wire, the WV coder should decode extension
   and construct optimal traceContext if traceparent and tracestate exists in
   extension.


Summary
Pros:

   - We are no longer limited with Pane bits as a way of extending
   WindowedValue
   - There is no overhead if none of the extended fields is set.

Cons:

   - Memory overhead on keeping additional deserialized objects (here,
   trace context).
   - Confusion where is the field (wv.getContext()  or
   wv.getExtensions().getTraceParent())
   - Breaking change



On Wed, Apr 2, 2025 at 11:17 PM Kenneth Knowles <k...@apache.org> wrote:

> Just to organize my thoughts and to refresh myself on how this would
> integrate with each SDK, I wrote this short follow-up on the API, PaneInfo
> encoding, and proto changes. I'm especially interested in feedback from
> people familiar with various SDKs about mistakes I've made in that SDK and
> the most idiomatic ways to represent this.
>
>    https://s.apache.org/beam-drain-mode
>
> Kenn
>
> On Tue, Mar 18, 2025 at 11:42 AM Kenneth Knowles <k...@apache.org> wrote:
>
>> Thanks for all the detailed insightful comments! I've made some edits to
>> reflect your insights and also settled on what I think is the most feasible
>> plan, considering effort and backwards-compatibility.
>>
>> The work is actually now quite small (yay!). I've added these at the
>> bottom of the doc:
>>
>>  - Add "is drain" bit for aggregations
>>  - Add "is drain" bit for timer callbacks
>>  - Fire processing time timers instantly during drain (likely getting
>> dropped because their window is expired)
>>  - Immediately drop already-expired processing time timers (since they
>> are doomed to be dropped when they fire)
>>
>> This is a set of backwards-compatible changes consistent with "option 1"
>> change proposal in the doc.
>>
>> Let me know if you think this is not enough or too much.
>>
>> Kenn
>>
>> On Wed, Mar 12, 2025 at 6:33 PM Jan Lukavský <je...@seznam.cz> wrote:
>>
>>> Hi Kenn,
>>>
>>> thanks for putting this down on paper. This is great initiative as it
>>> might touch some core parts of the model we are actually somewhat circling
>>> around. I left some comments and I'm looking forward to the broad
>>> discussion this definitely deserves.
>>>
>>>  Jan
>>> On 3/11/25 15:46, Kenneth Knowles wrote:
>>>
>>> Hi all,
>>>
>>> I've spent some time on the rather hairy details of timers, batch,
>>> drain, and timer loops lately. We have some inconsistencies in this area
>>> that regularly bite users.
>>>
>>> I've written up what I *think* are the desired semantics. some notes
>>> about the current status of implementation, and a few proposals for how we
>>> can improve things.
>>>
>>> https://s.apache.org/beam-timers-and-drain
>>>
>>> Please take a look! This stuff is hard and I could really use as many
>>> smart pairs of eyes on this as possible.
>>>
>>> Kenn
>>>
>>>

Re: [DESIGN] Timers, Watermark Holds, Loops, Batch and Drain

Reply via email to