There is no guarantee that every checkpoint will result in a committed callback. The app master periodically calculates the committed window from the most recently received checkpointed windows and communicates it back to the containers. If two successive checkpoints are received before the recovery checkpoint is updated, only the second checkpoint will be reported as committed.
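To make this concrete, here is a rough sketch of the behaviour using the numbers from Ashwin's example below. It is a simplified model, not the actual stram code: the class CommittedWindowSketch, its methods, and the rule "committed window = minimum of the latest checkpoint reported by each partition" are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of an app master deciding the committed window.
class CommittedWindowSketch {
    // Latest checkpointed window id reported by each partition (-1 = none yet).
    private final Map<String, Long> latestCheckpoint = new HashMap<>();
    private long lastCommitted = -1;

    CommittedWindowSketch(List<String> partitions) {
        for (String p : partitions) {
            latestCheckpoint.put(p, -1L);
        }
    }

    // A partition reports a checkpoint; only the newest one is remembered.
    void reportCheckpoint(String partition, long windowId) {
        latestCheckpoint.merge(partition, windowId, Long::max);
    }

    // One heartbeat: returns the newly committed window id, or -1 if unchanged.
    // Assumption: a window is committed once every partition has checkpointed it.
    long heartbeat() {
        long min = Long.MAX_VALUE;
        for (long w : latestCheckpoint.values()) {
            min = Math.min(min, w);
        }
        if (min > lastCommitted) {
            lastCommitted = min;
            return min;
        }
        return -1;
    }

    // Replays the scenario: p3 is blocked at window 25, then catches up.
    static List<Long> replayScenario() {
        CommittedWindowSketch master =
                new CommittedWindowSketch(Arrays.asList("p1", "p2", "p3"));
        List<Long> committed = new ArrayList<>();

        // p1 and p2 checkpoint window 30 while p3 is still blocked.
        master.reportCheckpoint("p1", 30);
        master.reportCheckpoint("p2", 30);
        long c = master.heartbeat(); // p3 has no checkpoint yet, nothing committed
        if (c >= 0) committed.add(c);

        // Before the next heartbeat, p3 catches up (checkpoints 30 and 60)
        // and the other partitions reach checkpoint 60 as well.
        master.reportCheckpoint("p3", 30);
        master.reportCheckpoint("p3", 60);
        master.reportCheckpoint("p1", 60);
        master.reportCheckpoint("p2", 60);
        c = master.heartbeat(); // latest checkpoint everywhere is 60
        if (c >= 0) committed.add(c);

        return committed;
    }

    public static void main(String[] args) {
        System.out.println(replayScenario());
    }
}
```

In this model every partition sees checkpointed callbacks for windows 30 and 60, but the only committed window id ever sent out is 60, because window 30 was never the minimum at the moment a heartbeat ran.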
On Thu, Dec 3, 2015 at 5:01 PM, Ashwin Chandra Putta <[email protected]> wrote:

> The stram app master determines the committed id in a heartbeat loop that
> executes once every 1000 ms.
>
> Regards,
> Ashwin.
>
> On Thu, Dec 3, 2015 at 4:52 PM, Ashwin Chandra Putta <[email protected]> wrote:
>
> > I think the committed callback is not guaranteed. For a given operator,
> > the window id we get in committed call back is a subset of window ids we
> > get in checkpointed callback.
> >
> > The committed callback is usually called for all checkpointed window ids
> > but when a given partition is blocked in between for some reason (say, disk
> > I/O or external system I/O) for sometime, it might be waiting in the window
> > (say window id 25) for some time while the other partitions have passed
> > through a checkpoint (say checkpoint id 30). When this partition gets
> > unblocked, it might quickly catch up to latest window (say 62, doing
> > checkpoints at 30 and 60). Meanwhile, other partitions have checkpointed
> > and sent id 60. The app master when deciding on the committed window id, it
> > looks through all the latest checkpoint window ids it got and at that time
> > it is possible that the latest checkpoint window for all the operators is
> > window id 60. So it sends the committed window id as 60 and not 30.
> >
> > Please correct me if my understanding is wrong.
> >
> > Regards,
> > Ashwin.
> >
> > On Thu, Dec 3, 2015 at 3:53 PM, Siyuan Hua <[email protected]> wrote:
> >
> >> As of my knowledge, stram only knows window k is fully complete until all
> >> operator(s) have checkpointed window k. So I can say I'm guaranteed to see
> >> same sequence of numbers in checkpointed and committed. Is this assumption
> >> true?
> >>
> >> Regards,
> >> Siyuan
> >
> > --
> > Regards,
> > Ashwin.
>
> --
> Regards,
> Ashwin.
