Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

Aljoscha Krettek Thu, 10 Sep 2020 02:43:24 -0700

On 10.09.20 11:30, Dawid Wysakowicz wrote:

I am not sure about the option for ignoring the Triggers. Do you mean to
ignore all the Triggers including e.g. Flink's such as CountTrigger,
EventTimeTrigger etc.? Won't it effectively disable the WindowOperator
altogether? Or even worse, make it unusable with ever-growing state? I
might be wrong here but aren't Triggers required for emitting results
from WindowOperator? If I am correct we emit results only if a Trigger
returns FIRE from one of onElement, onEventTime, onProcessingTime. Why do
you think it does not work well with FAILing hard without this option?
We could fail hard e.g. if the WindowAssigner#isEventTime returns false.

The problem I'm trying to solve is mixed Triggers. Say you have a Trigger that does "fire when watermark passes maxTimestamp() but also fire every 5 minutes in processing time and when the watermark passes maxTimestamp() fire for every 5 new records". This is something that the Beam API for example allows users to specify and is something that I think is potentially valuable in the real world.
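
As a rough illustration of such a mixed Trigger, here is a self-contained sketch in plain Java. It deliberately does not use Flink's Trigger interface: the class MixedTriggerSketch, its Result enum and its method names are hypothetical stand-ins. It also assumes one reading of the example, namely that the processing-time firing repeats on a fixed interval and that the per-record count only starts once the watermark has passed maxTimestamp().

```java
/** Hypothetical stand-in for a mixed Trigger; NOT Flink's Trigger API. */
class MixedTriggerSketch {
    enum Result { CONTINUE, FIRE }

    private final long windowMaxTimestamp;   // event-time end of the window
    private final long processingIntervalMs; // e.g. 5 minutes
    private final int countAfterWatermark;   // e.g. 5 records
    private long nextProcessingTimeFire;
    private boolean watermarkPassed = false;
    private int elementsSinceWatermark = 0;

    MixedTriggerSketch(long windowMaxTimestamp, long processingIntervalMs,
                       int countAfterWatermark, long nowMs) {
        this.windowMaxTimestamp = windowMaxTimestamp;
        this.processingIntervalMs = processingIntervalMs;
        this.countAfterWatermark = countAfterWatermark;
        this.nextProcessingTimeFire = nowMs + processingIntervalMs;
    }

    /** Once the watermark has passed, fire for every N new records. */
    Result onElement() {
        if (!watermarkPassed) {
            return Result.CONTINUE;
        }
        elementsSinceWatermark++;
        if (elementsSinceWatermark >= countAfterWatermark) {
            elementsSinceWatermark = 0;
            return Result.FIRE;
        }
        return Result.CONTINUE;
    }

    /** Fire once when the watermark first reaches maxTimestamp(). */
    Result onWatermark(long watermark) {
        if (!watermarkPassed && watermark >= windowMaxTimestamp) {
            watermarkPassed = true;
            return Result.FIRE;
        }
        return Result.CONTINUE;
    }

    /** Fire whenever the processing-time interval has elapsed. */
    Result onProcessingTime(long nowMs) {
        if (nowMs >= nextProcessingTimeFire) {
            nextProcessingTimeFire = nowMs + processingIntervalMs;
            return Result.FIRE;
        }
        return Result.CONTINUE;
    }
}
```

The onProcessingTime path is the part that has no well-defined meaning on bounded input, which is why this kind of Trigger is the problematic case.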

Ignoring Triggers would mean that we always fire on the maxTimestamp() by hardcoding this in a WindowOperator that we use for BATCH execution. With this, the WindowAssigner becomes the only thing that changes. This is similar to how Beam treats windows, where the WindowAssigner carries semantic content but the Trigger is only for optimizing streaming emission, which you don't need for BATCH where you always have a "perfect watermark".
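
To make the "always fire on maxTimestamp()" idea concrete, here is a minimal sketch in plain Java. It is a toy model, not the proposed WindowOperator: the class BatchWindowSketch and its methods are hypothetical, it assumes tumbling windows as the assigner, and it follows the convention that a window's maxTimestamp() is its exclusive end minus one. No Trigger appears anywhere; every window is emitted exactly once after all input has been seen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy model of BATCH windowing with Triggers ignored. */
class BatchWindowSketch {

    /** Start of the tumbling window (of the given size) containing ts. */
    static long windowStart(long ts, long size) {
        return ts - Math.floorMod(ts, size);
    }

    /**
     * Groups element timestamps by the maxTimestamp() of their window.
     * Each map entry corresponds to exactly one firing, which happens
     * once the whole bounded input has been consumed ("perfect watermark").
     */
    static Map<Long, List<Long>> fireOncePerWindow(long[] timestamps, long size) {
        Map<Long, List<Long>> firings = new TreeMap<>();
        for (long ts : timestamps) {
            long maxTimestamp = windowStart(ts, size) + size - 1;
            firings.computeIfAbsent(maxTimestamp, k -> new ArrayList<>()).add(ts);
        }
        return firings;
    }
}
```

Swapping the assigner (here, the windowStart computation) changes the result; nothing else does, which is the sense in which the WindowAssigner carries the semantics.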

Coming back to the initial example, such a Trigger would not work if we FAIL hard for processing-time on BATCH, which I'm suggesting because we otherwise have potentially surprising results if business logic depends on processing-time timers. For Windows, on the other hand, we could get around it by agreeing that Triggers are ignored for BATCH.

As for the question with getProcessingTime(). From my point of view, it
would be safe to simply return the current system time. I cannot think
of any dangers if we do so. Moreover, frankly speaking I am not entirely
sure what is the purpose of the method, other than injecting a clock in
tests of built-in operators. Maybe it was a mistake to expose it in the
user's API?

I agree, it was a mistake to expose getProcessingTime(). And I also think the same about getCurrentWatermark(), but that's neither here nor there. 😅 I then also agree to just return the current time, as you said. I will change the FLIP for this.


Aljoscha
