Caveat: I want to emphasize that I don't have a specific proposal. I haven't thought through enough details to consider a proposal, or you would have seen it already :-)
On Sep 14, 2016 5:14 AM, "Aljoscha Krettek" <[email protected]> wrote: > > Hi, > I had a chat with Kenn at Flink Forward and he did an off-hand remark about > how it might be better if triggers where not allowed to mark a window as > finished Yes. I've seen many cases of accidental closing and intentional uses of closing that were wrong, but very rarely intentional uses of closing that were good. When we chatted, I think I mentioned this pair of situations, which I restate for the benefit of the thread and the list: 1. If a trigger's predicate is never satisfied, then we will emit buffered results at window expiration against the trigger's wishes. 2. If a trigger "finishes" then we throw away the rest of the data. I don't really like fact #2. It causes silly bugs in user pipelines. I only can think of one good use case, which is to approximate some custom notion of completeness other than the watermark. If you compare, we think of allowed lateness as expressing when to drop data. We don't think of "AfterEndOfWindow" as expressing when to drop data. But a trigger that finishes to express a custom idea of completeness is like the latter. And moving triggers to a DSL instead of a UDF further reduces the applicability. I think a good solution for custom notions of completeness should probably have all pieces: (a) the measure of completeness (b) triggering based on it in whatever way makes sense (c) a way of noting special levels of completeness like the ON_TIME pane does for the watermark, and (d) a policy for eventually considering the output complete enough that we can drop further input. So that is a lot of different things to design carefully. I also want to point out that this is not against phase transitions, such as moving from early firings to late firing when the watermark reaches the end of the window. That is like a "OnceTrigger" but it is helpful IMO to separate in my mind the event that we are interested in from a trigger for controlling output based on that event. > and instead always be "Repeatedly" (if I understood correctly). I don't mean necessarily to have this automatically wrapped, but also to design the trigger DSL so triggers are all/mostly well-behaved. For example, EOW + withEarlyFirings + withLateFirings is well crafted to make only sensible things easy. > Maybe you (Kenn) could go a bit more in depth about what you meant by this > and if we should actually change this in Beam. Would this mean that we then > have the opposite of Repeatedly, i.e. Once, or Only.once(T)? This is exactly a thought I have had sometimes - make it only possible to use a OnceTrigger as the top level expression if it is explicitly requested. This would at least quickly prevent the common pitfall. But the OnceTrigger / Trigger split is not quite right to enforce this restriction. Instead of distinguishing "at most once" from "any number of times", we need to distinguish "finishes" and "never finishes". Or a vocabulary I have started to favor in my head is "Predicate" and "Trigger" where you have OnceTrigger via something like Once.at(Predicate) and otherwise every other trigger you can construct will never lose data. I actually have a branch sitting around with an experiment along these lines, I think... > I also noticed some inconsistencies in when triggers behave as repeated > triggers and once triggers. For example, AfterPane.elementCountAtLeast(5) > only fires once if used alone but it it fires repeatedly if used as the > speculative trigger in > AfterWatermark.pastEndOfWindow().withEarlyFirings(...). (This is true for > all "once" triggers.) This is actually by design. The early & late firings are automatically repeated. But it is a good example to think about: if a user writes .trigger(AfterCount(n)) they probably don't mean only once, but are expecting it to fire whenever the predicate is satisfied. So, using the vocabulary I mentioned, this example seems to encourage making an overload .triggering(Predicate) the same as .triggering(Repeatedly.forever(Predicate)). We can separate the Java overloads/API question from the model, of course. Kenn
