Caveat: I want to emphasize that I don't have a specific proposal. I
haven't thought through enough of the details to make one, or you would
have seen it already :-)

On Sep 14, 2016 5:14 AM, "Aljoscha Krettek" <[email protected]> wrote:
>
> Hi,
> I had a chat with Kenn at Flink Forward and he did an off-hand remark about
> how it might be better if triggers were not allowed to mark a window as
> finished

Yes. I've seen many cases of accidental closing and intentional uses of
closing that were wrong, but very rarely intentional uses of closing that
were good.

When we chatted, I think I mentioned this pair of situations, which I
restate for the benefit of the thread and the list:

1. If a trigger's predicate is never satisfied, then we will emit buffered
results at window expiration against the trigger's wishes.
2. If a trigger "finishes" then we throw away the rest of the data.
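
To make the two situations concrete, here is a toy Java model of a single
window. It is not Beam code: ToyWindow, its fields, and the
discarding-style buffer are all hypothetical names and assumptions chosen
only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one window. Hypothetical; NOT the Beam API. */
final class ToyWindow {
    private final int countThreshold;           // predicate: buffered >= threshold
    private final boolean finishesAfterFiring;  // true models a "once" trigger
    private final List<Integer> buffer = new ArrayList<>();
    private final List<List<Integer>> panes = new ArrayList<>();
    private int dropped = 0;
    private boolean closed = false;

    ToyWindow(int countThreshold, boolean finishesAfterFiring) {
        this.countThreshold = countThreshold;
        this.finishesAfterFiring = finishesAfterFiring;
    }

    void add(int element) {
        if (closed) {
            dropped++;              // situation 2: data after "finish" is thrown away
            return;
        }
        buffer.add(element);
        if (buffer.size() >= countThreshold) {
            emit();
            if (finishesAfterFiring) {
                closed = true;
            }
        }
    }

    /** Window expiration flushes the buffer even if the predicate never held. */
    void expire() {
        if (!closed && !buffer.isEmpty()) {
            emit();                 // situation 1: output against the trigger's wishes
        }
        closed = true;
    }

    // Assumption: discarding-style panes, so the buffer is cleared on each firing.
    private void emit() {
        panes.add(new ArrayList<>(buffer));
        buffer.clear();
    }

    List<List<Integer>> panes() { return panes; }
    int dropped() { return dropped; }
}
```

With a threshold of 100 and only two elements, expire() still emits one
pane. With a threshold of 2 and finishesAfterFiring set, five elements
produce one pane and three silently dropped elements.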

I don't really like fact #2. It causes silly bugs in user pipelines. I can
think of only one good use case, which is to approximate some custom notion
of completeness other than the watermark.

By comparison, we think of allowed lateness as expressing when to drop
data. We don't think of "AfterEndOfWindow" as expressing when to drop data.
But a trigger that finishes to express a custom idea of completeness is
like the latter. And moving triggers to a DSL instead of a UDF further
reduces the applicability of that use case.

I think a good solution for custom notions of completeness should probably
have all of these pieces: (a) the measure of completeness, (b) triggering
based on it in whatever way makes sense, (c) a way of noting special levels
of completeness, like the ON_TIME pane does for the watermark, and (d) a
policy for eventually considering the output complete enough that we can
drop further input. So that is a lot of different things to design
carefully.

I also want to point out that this is not against phase transitions, such
as moving from early firings to late firings when the watermark reaches the
end of the window. That is like a "OnceTrigger", but IMO it is helpful to
separate the event that we are interested in from a trigger that controls
output based on that event.

> and instead always be "Repeatedly" (if I understood correctly).

I don't necessarily mean to have this automatically wrapped, but also to
design the trigger DSL so that triggers are all, or mostly, well-behaved.
For example, EOW + withEarlyFirings + withLateFirings is well crafted to
make only sensible things easy.

> Maybe you (Kenn) could go a bit more in depth about what you meant by this
> and if we should actually change this in Beam. Would this mean that we then
> have the opposite of Repeatedly, i.e. Once, or Only.once(T)?

This is exactly a thought I have had sometimes - make it only possible to
use a OnceTrigger as the top level expression if it is explicitly
requested. This would at least quickly prevent the common pitfall.

But the OnceTrigger / Trigger split is not quite right to enforce this
restriction. Instead of distinguishing "at most once" from "any number of
times", we need to distinguish "finishes" and "never finishes". Or a
vocabulary I have started to favor in my head is "Predicate" and "Trigger"
where you have OnceTrigger via something like Once.at(Predicate) and
otherwise every other trigger you can construct will never lose data. I
actually have a branch sitting around with an experiment along these lines,
I think...

> I also noticed some inconsistencies in when triggers behave as repeated
> triggers and once triggers. For example, AfterPane.elementCountAtLeast(5)
> only fires once if used alone but it fires repeatedly if used as the
> speculative trigger in
> AfterWatermark.pastEndOfWindow().withEarlyFirings(...). (This is true for
> all "once" triggers.)

This is actually by design. The early & late firings are automatically
repeated. But it is a good example to think about: if a user writes
.trigger(AfterCount(n)) they probably don't mean only once, but are
expecting it to fire whenever the predicate is satisfied. So, using the
vocabulary I mentioned, this example seems to encourage making an overload
.triggering(Predicate) the same as
.triggering(Repeatedly.forever(Predicate)). We can separate the Java
overloads/API question from the model, of course.
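
As a rough illustration of that overload, here is a minimal Java sketch of
the "Predicate" and "Trigger" vocabulary. Everything in it is hypothetical:
Predicate, Trigger, AfterCount, Once, Repeatedly and Windowing are
stand-ins I made up for this mail, not the Beam API and not the contents of
my experimental branch.

```java
/** Hypothetical: a condition on the window contents, with no firing policy. */
interface Predicate {
    boolean isSatisfied(int bufferedElements);
}

/** Hypothetical: a firing policy that also says whether it ever finishes. */
interface Trigger {
    boolean shouldFire(int bufferedElements);
    boolean finishesAfterFiring();
}

final class AfterCount {
    private AfterCount() {}

    static Predicate atLeast(int n) {
        return buffered -> buffered >= n;
    }
}

final class Repeatedly {
    private Repeatedly() {}

    /** Fires whenever the predicate holds and never finishes, so no data is lost. */
    static Trigger forever(Predicate p) {
        return new Trigger() {
            @Override public boolean shouldFire(int buffered) { return p.isSatisfied(buffered); }
            @Override public boolean finishesAfterFiring() { return false; }
        };
    }
}

final class Once {
    private Once() {}

    /** The only way in this sketch to get a finishing trigger: ask for it by name. */
    static Trigger at(Predicate p) {
        return new Trigger() {
            @Override public boolean shouldFire(int buffered) { return p.isSatisfied(buffered); }
            @Override public boolean finishesAfterFiring() { return true; }
        };
    }
}

final class Windowing {
    private final Trigger trigger;

    private Windowing(Trigger trigger) { this.trigger = trigger; }

    static Windowing triggering(Trigger trigger) {
        return new Windowing(trigger);
    }

    /** The overload: a bare predicate means "fire every time it is satisfied". */
    static Windowing triggering(Predicate p) {
        return triggering(Repeatedly.forever(p));
    }

    Trigger trigger() { return trigger; }
}
```

In this sketch Windowing.triggering(AfterCount.atLeast(5)) never finishes,
while Windowing.triggering(Once.at(AfterCount.atLeast(5))) does, and the
user has to spell out the latter to get it.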

Kenn
