I think your understanding is correct. Does the CommitOffset transform have
side-effects on your pipeline?

On Tue, Dec 8, 2020 at 3:35 PM Vincent Marquez <[email protected]>
wrote:

>
> *~Vincent*
>
>
> On Tue, Dec 8, 2020 at 3:13 PM Boyuan Zhang <[email protected]> wrote:
>
>> Please note that each record output from ReadFromKafkaDoFn is in a
>> GlobalWindow. The workflow is:
>> ReadFromKafkaDoFn -> Reshuffle -> Window.into(FixedWindows) ->
>> Max.longsPerKey -> CommitDoFn
>>                                                |
>>                                                ---> downstream consumers
>>
>>> but won't there still be 5 commits that happen as fast as possible for
>>> each of the windows that were constructed from the initial fetch?
>>
>> I'm not sure what you mean here. Could you elaborate on your
>> question?
>>
>
> Sure, I'll try to explain. It's very possible I'm just misunderstanding
> windowing here.
>
> Assumption 1: Windowing works on the output timestamp.
> Assumption 2: Max.longsPerKey will fire as fast as it can; in other
> words, there is no throttling.
>
> So, if we have a topic that has the following msgs:
> msg | timestamp (mm:ss)
> ----+------------------
>  A  | 01:00
>  B  | 01:01
>  D  | 06:00
>  E  | 06:04
>  F  | 12:02
>
> and we read them all at once, we will have one window that contains [A,B],
> another that contains [D,E], and a third that contains [F]. Once we have the
> max offset for all three, won't the commits fire back to back without delay?
> That is, won't F be committed as soon as E has finished committing, and E
> immediately after B?
>
>
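
To make the example above concrete, here is a minimal plain-Python sketch
(not Beam code) of how those five messages would fall into fixed windows and
what the per-window max offset would be. The 5-minute window size is an
assumption chosen because it reproduces the [A,B], [D,E], [F] grouping
described above; the offsets 0..4 and all function names are hypothetical. It
only models the grouping, not when a runner would actually fire each pane.

```python
from collections import defaultdict

# Assumed fixed-window size; the thread does not state it.
WINDOW_SECONDS = 5 * 60

# (msg, timestamp "mm:ss", offset). The offsets are hypothetical.
RECORDS = [
    ("A", "01:00", 0),
    ("B", "01:01", 1),
    ("D", "06:00", 2),
    ("E", "06:04", 3),
    ("F", "12:02", 4),
]


def to_seconds(mm_ss):
    """Convert an "mm:ss" string to a number of seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def window_start(ts_seconds, size=WINDOW_SECONDS):
    """Start of the fixed window that contains the given timestamp."""
    return ts_seconds - (ts_seconds % size)


def group_by_window(records):
    """Map each window start (in seconds) to the (msg, offset) pairs in it."""
    windows = defaultdict(list)
    for msg, ts, offset in records:
        windows[window_start(to_seconds(ts))].append((msg, offset))
    return dict(windows)


def max_offset_per_window(records):
    """Largest offset seen in each window: one candidate commit per window."""
    grouped = group_by_window(records)
    return {
        start: max(offset for _, offset in items)
        for start, items in sorted(grouped.items())
    }


if __name__ == "__main__":
    for start, offset in max_offset_per_window(RECORDS).items():
        print(f"window starting at {start}s -> commit offset {offset}")
```

Under these assumptions there is one candidate commit per window (three in
total), which is the situation the question about back-to-back commits
describes.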
