I see. Is there a missing watermark hold for timers less then T2? On Mon, Jun 10, 2019 at 9:08 AM Jan Lukavský <[email protected]> wrote:
> Yes, there is no difference between GC and user timers in this case. I > think the problem is simply that when watermark moves from time T1 to T2, > DirectRunner fires all timers that fire until T2, but that can create new > timers for time between T1 and T2, and these will be fired later, although > should have been fired before T2. > > Jan > On 6/10/19 5:48 PM, Kenneth Knowles wrote: > > Reading your Jira, I believe this problem will manifest without the > interaction of user timers and GC. Interesting case. It surrounds whether a > runner makes a timer available or fires it prior to the bundle being > committed. > > I have commented elsewhere about this part, quoting the Jira: > > > have experimented with this a little and have not yet figured out what > the correct solution should be. What I tried: > > 1) hold input watermark for min(setup timers) > > 2) fire timers based not on input watermark, but on output watermark > (output watermark is held by min timer stamp) > > Neither of these quite works. What we need is a separate "element input > watermark" and "timer input watermark". The overall input watermark that > drives GC is the min of these. The output watermark is also held to this > overall input watermark. User timers fire according to the element input > watermark. > > Kenn > > On Mon, Jun 10, 2019 at 8:44 AM Lukasz Cwik <[email protected]> wrote: > >> Jan are you editing the implementation of how timers work within the >> DirectRunner or are trying to build support for time sorted input on top of >> the Beam model for timers? >> Because I think you will need to do the former. >> >> On Mon, Jun 10, 2019 at 8:41 AM Jan Lukavský <[email protected]> wrote: >> >>> Hm, that would probably work, thanks! >>> >>> But, should the timers behave like that? I'm trying to fix tris by >>> introducing a sequence of watermarks >>> >>> inputs watermark -> timer watermark -> output watermark >>> >>> as suggested in the JIRA, and it actually seems to be working as >>> expected. It even cleans some code paths, but I'm debugging some strange >>> behavior this exposed - >>> `WatermarkHold.watermarkHoldTagForTimestampCombiner` seems to have stopped >>> clearing itself after this change and some Pipelines therefore stopped >>> working. I'm little lost why this happened. I can push code I have if >>> anyone interested. >>> >>> Jan >>> On 6/10/19 5:32 PM, Lukasz Cwik wrote: >>> >>> We hit an instance of this problem before and solved it rescheduling the >>> GC timer again if there was a conflicting timer that was also meant to fire. >>> >>> On Mon, Jun 10, 2019 at 8:17 AM Jan Lukavský <[email protected]> wrote: >>> >>>> For a single key. I'm getting into collision of timerId >>>> `__StatefulParDoGcTimerId` (StatefulDoFnRunner) and my timerId for flushing >>>> sorted elements in implementation of @RequiresTimeSortedInput. The timers >>>> are being swapped at the end of input (but it can happen anywhere near end >>>> of window), which results in state being cleared before it gets flushed, >>>> which means data loss. >>>> >>>> Jan >>>> On 6/10/19 5:08 PM, Reuven Lax wrote: >>>> >>>> Do you mean for a single key or across keys? >>>> >>>> On Mon, Jun 10, 2019, 5:11 AM Jan Lukavský <[email protected]> wrote: >>>> >>>>> Hi, >>>>> >>>>> I have come across issue [1], where I'm not sure how to solve this in >>>>> most elegant way. >>>>> >>>>> Any suggestions? >>>>> >>>>> Thanks, >>>>> >>>>> Jan >>>>> >>>>> [1] https://issues.apache.org/jira/browse/BEAM-7520 >>>>> >>>>>
