I think your understanding is correct. Does the CommitOffset transform have side-effects on your pipeline?
On Tue, Dec 8, 2020 at 3:35 PM Vincent Marquez <[email protected]> wrote: > > *~Vincent* > > > On Tue, Dec 8, 2020 at 3:13 PM Boyuan Zhang <[email protected]> wrote: > >> Please note that each record output from ReadFromKafkaDoFn is in a >> GlobalWindow. The workflow is: >> ReadFromKafkaDoFn -> Reshuffle -> Window.into(FixedWindows) -> >> Max.longsPerKey -> CommitDoFn >> | >> ---> downstream consumers >> >> but won't there still be 5 commits that happen as fast as possible for >>> each of the windows that were constructed from the initial fetch? >> >> I'm not sure what you mean here. Would you like to elaborate more on your >> questions? >> > > Sure, I'll try to explain, it's very possible I just am misunderstanding > Windowing here. > > Assumption 1: Windowing works on the output timestamp. > Assumption 2: Max.longsPerKey will fire as fast as it can, in other > words, there is no throttling. > > So, if we have a topic that has the following msgs: > msg | timestamp (mm,ss) > ----------------------- > A | 01:00 > B | 01:01 > D | 06:00 > E | 06:04 > F | 12:02 > > and we read them all at once, we will have one window that contains [A,B] > and another one that has [D,E], and a third that has [F]. Once we get the > max offset for all three, won't they fire back to back without delay? So F > will fire as soon as E is finished committing, which fires immediately > after B is committed? > >
