Should WindowFn have a minimal Duration?

Jan Lukavský Thu, 22 Apr 2021 02:48:58 -0700

Hi,

I have come across a "problem" while implementing a toy Pipeline. I would like to split the input PCollection into two parts - droppable data (delayed for more than the allowed lateness from the end of the window) and the rest. I will not go into details, as that is not relevant. The problem is that I need to set up something like a "looping timer" to be able to create state for a window even when there is no data yet (to be able to set up a timer for the end of a window, to be able to recognize droppable data). I would like the solution to be generic, so I would like to "infer" the duration of the looping timer from the input PCollection. What I would need is to know a _minimal guaranteed duration of a window that a WindowFn can generate_. I would then set up the looping timer to tick with an interval of this minimal duration, and that would guarantee the timer will hit all the windows.

I could try to infer this duration from the input windowing in some hackish ways - e.g. using some "instanceof" approach, or by using the WindowFn to generate a set of windows for some fixed timestamp (without a data element) and then inferring the time from maxTimestamp of the returned windows. That would probably break for sliding windows, because the result would be the duration of the slide, not the duration of the window (at least when doing a naive computation).
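To illustrate the sliding-window pitfall described above, here is a minimal sketch using a toy model rather than Beam's actual classes. Everything in it (WindowProbeSketch, ToyWindowFn, fixed, sliding, naiveInferredDuration) is a hypothetical name invented for this example, windows are plain millisecond intervals [start, end), and only the maximum timestamp (end - 1) of each window is visible to the probe. The "naive computation" is assumed to be the difference between the maximum timestamps of two consecutive windows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model (not Beam's API): windows are half-open millisecond intervals [start, end). */
final class WindowProbeSketch {

  /** Hypothetical stand-in for a window assignment function. */
  interface ToyWindowFn {
    /** Returns the max timestamp (end - 1) of every window containing the given timestamp. */
    List<Long> maxTimestampsFor(long timestampMillis);
  }

  /** Fixed windows of the given size, aligned to zero. */
  static ToyWindowFn fixed(long sizeMillis) {
    return ts -> {
      long start = ts - Math.floorMod(ts, sizeMillis);
      List<Long> result = new ArrayList<>();
      result.add(start + sizeMillis - 1);
      return result;
    };
  }

  /** Sliding windows of the given size, with a new window starting every periodMillis. */
  static ToyWindowFn sliding(long sizeMillis, long periodMillis) {
    return ts -> {
      List<Long> result = new ArrayList<>();
      long lastStart = ts - Math.floorMod(ts, periodMillis);
      // Walk back over every window start that still covers ts.
      for (long start = lastStart; start > ts - sizeMillis; start -= periodMillis) {
        result.add(start + sizeMillis - 1);
      }
      return result;
    };
  }

  /**
   * Naive probe: take the earliest-ending window at probeMillis, then the
   * earliest-ending window just after it ends, and report the difference
   * between the two max timestamps.
   */
  static long naiveInferredDuration(ToyWindowFn fn, long probeMillis) {
    long first = Collections.min(fn.maxTimestampsFor(probeMillis));
    long next = Collections.min(fn.maxTimestampsFor(first + 1));
    return next - first;
  }
}
```

In this toy model, naiveInferredDuration(fixed(10), 0) yields 10, the window size, while naiveInferredDuration(sliding(10, 5), 0) yields 5, which is the slide and not the 10 ms window duration.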

It seems to me that all WindowFns have such a minimal Duration - obvious for Fixed Windows, but every other window type seems to have such a property (including Sessions - that is the gap duration). The only problem would be with data-driven windows, but we don't currently have strong support for these.

The question is then - would it make sense to introduce WindowFn.getMinimalWindowDuration() to the model? The default value could be zero, which would mean such a WindowFn would be unsupported in my motivating example.
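A rough sketch of what the proposal could look like, again as a self-contained toy model and not a patch against Beam. Only the method name getMinimalWindowDuration() comes from the question above. The classes ToyWindowFn, ToyFixedWindows, ToySessions and ToyDataDrivenWindows and the helper loopingTimerInterval are hypothetical, and java.time.Duration is used here only to keep the example free of dependencies.

```java
import java.time.Duration;
import java.util.Optional;

/** Toy model of the proposal (hypothetical classes, not Beam's API). */
final class MinimalDurationSketch {

  /** Hypothetical stand-in for WindowFn; only the proposed method is modeled. */
  abstract static class ToyWindowFn {
    /** Proposed method: the default of zero means "no guaranteed minimal duration". */
    Duration getMinimalWindowDuration() {
      return Duration.ZERO;
    }
  }

  /** Fixed windows: the minimal duration is simply the window size. */
  static final class ToyFixedWindows extends ToyWindowFn {
    private final Duration size;

    ToyFixedWindows(Duration size) {
      this.size = size;
    }

    @Override
    Duration getMinimalWindowDuration() {
      return size;
    }
  }

  /** Sessions: the minimal duration is the gap duration. */
  static final class ToySessions extends ToyWindowFn {
    private final Duration gap;

    ToySessions(Duration gap) {
      this.gap = gap;
    }

    @Override
    Duration getMinimalWindowDuration() {
      return gap;
    }
  }

  /** Data-driven windows: keeps the zero default, so it is treated as unsupported. */
  static final class ToyDataDrivenWindows extends ToyWindowFn {}

  /** Tick interval for the looping timer, or empty if the WindowFn gives no guarantee. */
  static Optional<Duration> loopingTimerInterval(ToyWindowFn fn) {
    Duration minimal = fn.getMinimalWindowDuration();
    return minimal.isZero() ? Optional.empty() : Optional.of(minimal);
  }
}
```

A transform built on this could call loopingTimerInterval on the input's windowing and reject the pipeline at construction time when the result is empty, instead of silently missing windows.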

Jan
