Pramod,

Doing an ad-hoc checkpoint may be a possibility.

Amol
On Fri, Nov 13, 2015 at 9:57 AM, Pramod Immaneni <[email protected]> wrote:

> If checkpoint is a multiple of windows and end window tuples are already
> flowing and triggering end windows on the operators, is there additional
> knowledge being gained by a checkpoint tuple? I can see one advantage:
> you can force a checkpoint throughout the system ad hoc on a window if
> the STRAM decides.
>
> Chetan, can you give me an example of where an operator checkpoint at a
> multiple greater than the application checkpoint would be used? I would
> think something like an operator wanting to set its own checkpoint
> interval as an absolute, unrelated to another checkpointing mechanism,
> would be more useful.
>
> On Fri, Nov 13, 2015 at 9:03 AM, Amol Kekre <[email protected]> wrote:
>
> > There is an additional impact of using a checkpoint tuple as opposed to
> > each StramChild simply checkpointing at pre-known windows. This is the
> > knowledge of checkpoint flow as per Chetan's #1. Stram will know that
> > the checkpoint tuple has passed through all upstream operators. In
> > non-blocking checkpoints (default) this may not be as critical, but for
> > blocking checkpoints it may be important. Plus the logic to
> > re-construct/re-partition does become a lot simpler with this knowledge.
> >
> > Getting my memory back, after Chetan's email :) the trigger thought to
> > move to the checkpoint tuple was the ease of aligning checkpoints, aka
> > getting a clear application-wide state as Chetan stated. Technically,
> > hard-coding these numbers in each StramChild (per operator) may work,
> > but the checkpoint tuple made it easy, and Stram could then leverage
> > this as knowledge. Another trip down memory lane - I was pushing for
> > heartbeat control tuple(s) wherever we can. These are tuples that flow
> > through the dataflow and report back some content from which
> > application condition/dataflow aspects can be derived. They are needed
> > for a non-blocking master to function, and were a very critical part
> > for operability in our past attempts at distributed data-in-motion
> > architectures. The control tuple solved that purpose from the
> > checkpointing-triggers point of view; windowId control tuples solved it
> > from the dataflow point of view.
> >
> > Thks,
> > Amol
> >
> >
> > On Thu, Nov 12, 2015 at 9:07 PM, Chetan Narsude (cnarsude) <
> > [email protected]> wrote:
> >
> > > Pramod, the previous design was to checkpoint at random window ids.
> > > The issue with that was that repartitioning/recovery could be
> > > impossible in certain cases if all the partitions did not checkpoint
> > > at the same window. This is the new design with the control tuple,
> > > although checkpoint_window_count was added later to let the operators
> > > delay their checkpoint to a later window than the time when they
> > > would normally checkpoint with the control tuple. We did not want
> > > them to be able to do the checkpoint earlier than the scheduled one,
> > > as that decision would be centrally controlled via the application.
> > > It is useful where the operator attributes are allowed to be
> > > configured independent of the application attributes. It's also
> > > documented with OperatorContext.CHECKPOINT_WINDOW_COUNT:
> > >
> > >   /**
> > >    * Attribute of the operator that hints at the optimal checkpoint
> > >    * boundary. By default checkpointing happens after a predetermined
> > >    * number of streaming windows. Application developers can override
> > >    * this behavior by defining the following attribute. When this
> > >    * attribute is defined, checkpointing will be done after completion
> > >    * of the later of the regular checkpointing window and the window
> > >    * whose serial number is divisible by the attribute value.
> > >    * Typically a user would define this value to be the same as that
> > >    * of APPLICATION_WINDOW_COUNT so checkpointing will be done at the
> > >    * application window boundary.
> > >    */
> > >   Attribute<Integer> CHECKPOINT_WINDOW_COUNT = new Attribute<Integer>(1);
> > >
> > > Besides this, the design is based on these requirements:
> > > 1. The checkpointing tuple staggers the checkpoints amongst multiple
> > >    stages. It does not trigger the checkpoint operation unless the
> > >    upstream operator is done checkpointing. This often results in
> > >    better resource utilization with different resources in different
> > >    configurations.
> > > 2. The checkpoint tuple helps with resetting the state of stateful
> > >    stream codecs.
> > >
> > > Tim,
> > >
> > > The reason for the double checkpoint appears to be a bug where
> > > lastCheckpointWindowId is not set after the checkpoint in endWindow.
> > > The condition in the 'CHECKPOINT:' case was added to avoid double
> > > checkpoints. Can you confirm?
> > >
> > > --
> > > Chetan
> > >
> > >
> > > On 11/12/15, 6:07 PM, "Amol Kekre" <[email protected]> wrote:
> > >
> > > > I am trying to recollect too. I do remember Chetan, Thomas, and I
> > > > going deep on this choice. One issue was the efficiency of the
> > > > current setup: only the input adapters had to insert the control
> > > > tuple; all other operators were as-is. I will try to recollect other
> > > > details, or maybe Chetan or Thomas can comment.
> > > >
> > > > Thks,
> > > > Amol
> > > >
> > > >
> > > > On Thu, Nov 12, 2015 at 5:53 PM, Pramod Immaneni <
> > > > [email protected]> wrote:
> > > >
> > > > > From what I am seeing so far (while implementing APEX-246), it is
> > > > > a leftover from an earlier implementation, but I am not completely
> > > > > sure yet.
> > > > >
> > > > > On Thu, Nov 12, 2015 at 5:43 PM, Timothy Farkas <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > After stumbling on https://malhar.atlassian.net/browse/APEX-263
> > > > > > I am wondering what the purpose of the CHECKPOINT control tuple
> > > > > > is. Why is it not sufficient to have each operator checkpoint
> > > > > > after its checkpoint window has passed?
> > > > > >
> > > > > > Thanks,
> > > > > > Tim
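For concreteness, here is a minimal sketch of what the CHECKPOINT_WINDOW_COUNT
hint looks like from the application side, assuming the Apex 3.x DAG API; the
two operators below are made-up placeholders for illustration, not anything
referenced in the thread:

import org.apache.hadoop.conf.Configuration;
import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.api.DAG;
import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.api.InputOperator;
import com.datatorrent.api.StreamingApplication;
import com.datatorrent.common.util.BaseOperator;

public class CheckpointAlignmentDemo implements StreamingApplication {

  /** Trivial input operator, just to have something to wire up. */
  public static class NumberGenerator extends BaseOperator implements InputOperator {
    public final transient DefaultOutputPort<Long> output = new DefaultOutputPort<>();
    private long n;

    @Override
    public void emitTuples() {
      output.emit(n++);
    }
  }

  /** Trivial stateful consumer whose state we want checkpointed on app-window boundaries. */
  public static class RunningSum extends BaseOperator {
    private long sum; // checkpointed state
    public final transient DefaultInputPort<Long> input = new DefaultInputPort<Long>() {
      @Override
      public void process(Long tuple) {
        sum += tuple;
      }
    };
  }

  @Override
  public void populateDAG(DAG dag, Configuration conf) {
    NumberGenerator gen = dag.addOperator("gen", new NumberGenerator());
    RunningSum sum = dag.addOperator("sum", new RunningSum());
    dag.addStream("numbers", gen.output, sum.input);

    // Aggregate across 10 streaming windows ...
    dag.setAttribute(sum, OperatorContext.APPLICATION_WINDOW_COUNT, 10);
    // ... and hint that checkpoints should fall on the same boundary: per the
    // javadoc above, the operator is checkpointed at the later of its regular
    // checkpoint window and the next window whose serial number is divisible
    // by this value.
    dag.setAttribute(sum, OperatorContext.CHECKPOINT_WINDOW_COUNT, 10);
  }
}

With both attributes set to 10, the checkpoint lands on an application window
boundary rather than in the middle of an aggregation.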

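And a schematic of the double-checkpoint guard Chetan describes above; this
illustrates the pattern only, with made-up names, and is not the actual Apex
engine code:

public class CheckpointGuardSketch {

  /** Stand-in for whatever actually persists operator state. */
  interface CheckpointService {
    void checkpoint(long windowId);
  }

  private final CheckpointService agent;
  private final long checkpointWindowCount;
  private long lastCheckpointWindowId = -1;

  CheckpointGuardSketch(CheckpointService agent, long checkpointWindowCount) {
    this.agent = agent;
    this.checkpointWindowCount = checkpointWindowCount;
  }

  /** Called at the end of each streaming window. */
  void endWindow(long windowId) {
    if (windowId % checkpointWindowCount == 0) {
      agent.checkpoint(windowId);
      lastCheckpointWindowId = windowId; // the assignment the bug report says is missing
    }
  }

  /** Called when a CHECKPOINT control tuple arrives for this window. */
  void onCheckpointTuple(long windowId) {
    if (lastCheckpointWindowId != windowId) { // guard against checkpointing twice
      agent.checkpoint(windowId);
      lastCheckpointWindowId = windowId;
    }
  }
}

If endWindow() checkpoints but never records lastCheckpointWindowId, the
guard in the CHECKPOINT case cannot fire and the same window is checkpointed
twice, which matches the behavior described in APEX-263.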