Hmm! Actually, it may be a good debugging tool too: keep the named checkpoints around. The core of the feature is retaining checkpoints, which could be done with just an option to not delete them, but naming them makes it more operational. Send a command from the CLI -> get a checkpoint -> know it is the one you need because the file name contains the string you sent with the command -> debug. This is different from querying state, as it gives you the entire application checkpoint to debug with.

Thks
Amol

On Thu, Aug 4, 2016 at 11:41 AM, Venkatesh Kottapalli <venkat...@datatorrent.com> wrote:

> + 1 for the idea.
>
> It might be helpful to developers as well when dealing with variety of
> data in large volumes if this can help them run from the checkpointed state
> rather than rerunning the application altogether in case of issues.
>
> I have seen cases where the application runs for more than 10 hours and
> some partitions fail because of the variety of data that it is dealing
> with. In such cases, the application has to be restarted and it will be
> helpful to developers with a feature of this kind.
>
> The ease of enabling/disabling this feature to run the app will also be
> important.
>
> -Venkatesh.
>
> > On Aug 4, 2016, at 10:29 AM, Amol Kekre <a...@datatorrent.com> wrote:
> >
> > We had an user who wanted roll-back and restart from audit purposes. That
> > time we did not have timed-window. Names checkpoint would have helped a
> > little bit..
> >
> > Problem statement: Auditors ask for rerun of yesterday's computations for
> > verification. Assume that these computations depend on previous state (i.e
> > data from day before yesterday).
> >
> > Solution
> > 1. Have named checkpoints at 12 in the night (an input adapter triggers it)
> >    every day
> > 2. The app spools raw logs into hdfs along with window ids and event times
> > 3. The re-run is a separate app that starts off on a named checkpoint (12
> >    night yesterday)
> >
> > Technically the solution will not as simple and "new audit app" will need a
> > lot of other checks (dedups, drop events not in yesterday's window, wait
> > for late arrivals, ...), but names checkpoint helps.
> >
> > I do agree with Pramod's that replay within the same running app is not
> > viable within a data-in-motion architecture. But it helps somewhat in a new
> > audit app. Named checkpoints help data-in-motion architectures handle batch
> > apps better. In the above case #2 spooling done with event time stamp+state
> > suffices. The state part comes from names checkpoint.
> >
> > Thks,
> > Amol
> >
> > On Thu, Aug 4, 2016 at 10:12 AM, Sanjay Pujare <san...@datatorrent.com>
> > wrote:
> >
> >> I agree. A specific use-case will be useful to support this feature. Also
> >> the ability to replay from the named checkpoint will be limited because of
> >> various factors, isn’t it?
> >>
> >> On 8/4/16, 9:00 AM, "Pramod Immaneni" <pra...@datatorrent.com> wrote:
> >>
> >> There is a problem here, keeping old checkpoints and recovering from them
> >> means preserving the old input data along with the state. This is more than
> >> the mechanism of actually creating named checkpoints, it means having the
> >> ability for operators to move forward (a.k.a committed and dropping
> >> committed states and buffer data) while still having the ability to replay
> >> from that point from the input source and providing a way for operators (at
> >> first look input operators) to distinguish that. Why would someone need
> >> this with idempotent processing? Is there a specific use case you are
> >> looking at? Suppose we go do this, for the mechanism, I would be in favor
> >> of reusing existing tuple.
> >>
> >> On Thu, Aug 4, 2016 at 8:44 AM, Vlad Rozov <v.ro...@datatorrent.com>
> >> wrote:
> >>
> >>> +1 for the feature. At first look I am more in favor of reusing existing
> >>> control tuple.
> >>>
> >>> Thank you,
> >>>
> >>> Vlad
> >>>
> >>> On 8/4/16 08:17, Sandesh Hegde wrote:
> >>>
> >>>> @Chinmay
> >>>> We can enhance the existing checkpoint tuple but that one is more
> >>>> frequently used than this feature, so why burden Checkpoint tuple with
> >>>> an extra field?
> >>>>
> >>>> @Aniruddha
> >>>> It is better to leave the scheduling to the users, they can use any tool
> >>>> that they are already familiar with.
> >>>>
> >>>> On Thu, Aug 4, 2016 at 7:40 AM Aniruddha Thombare <anirud...@datatorrent.com>
> >>>> wrote:
> >>>>
> >>>>> +1 On the idea, it would be awesome to have.
> >>>>>
> >>>>> Question: Can we further develop this brilliant idea into:-
> >>>>> Scheduled checkpoints ( To save as dynamically named checkpoint)?
> >>>>> This would be on the lines of logrotate / general backup strategies.
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> A
> >>>>>
> >>>>> _____________________________________
> >>>>> Sent with difficulty, I mean handheld ;)
> >>>>> On 4 Aug 2016 8:03 pm, "Munagala Ramanath" <r...@datatorrent.com> wrote:
> >>>>>
> >>>>>> +1
> >>>>>>
> >>>>>> Ram
> >>>>>>
> >>>>>> On Thu, Aug 4, 2016 at 12:10 AM, Sandesh Hegde <sand...@datatorrent.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hello Team,
> >>>>>>>
> >>>>>>> This thread is to discuss the Named Checkpoint feature for Apex.
> >>>>>>> (https://issues.apache.org/jira/browse/APEXCORE-498)
> >>>>>>>
> >>>>>>> Named checkpoints allow following workflow,
> >>>>>>>
> >>>>>>> 1. Users can trigger a checkpoint and give it a name
> >>>>>>> 2. Relaunch the application from the named checkpoint.
> >>>>>>> 3. These checkpoints survive the "purge of old checkpoints".
> >>>>>>>
> >>>>>>> Current idea is to add a new control tuple, NamedCheckPointTuple, which
> >>>>>>> contains the user specified name, it traverses the DAG and along the way
> >>>>>>> necessary actions are taken.
> >>>>>>>
> >>>>>>> Please let me know your thoughts on this.
> >>>>>>>
> >>>>>>> Thanks
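
For reference, here is a minimal sketch of the three-step workflow from the original proposal (name a checkpoint, have it survive the purge of old checkpoints, find it again for a relaunch). This is not Apex code: Checkpoint, purge, find_named and committed_window_id are hypothetical names made up for illustration, and the purge rule (drop everything older than the committed window unless it is named) is an assumption about how retention could work.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical checkpoint record; not an Apex class."""
    window_id: int               # window at which the state was saved
    name: Optional[str] = None   # set only for user-named checkpoints


def purge(checkpoints: List[Checkpoint], committed_window_id: int) -> List[Checkpoint]:
    """Drop unnamed checkpoints older than the committed window; keep named ones."""
    return [
        cp for cp in checkpoints
        if cp.name is not None or cp.window_id >= committed_window_id
    ]


def find_named(checkpoints: List[Checkpoint], name: str) -> Optional[Checkpoint]:
    """Return the checkpoint saved under `name`, or None if there is none."""
    for cp in checkpoints:
        if cp.name == name:
            return cp
    return None


# Four checkpoints, one of them named by the user (assumed example data).
store = [
    Checkpoint(10),
    Checkpoint(20, name="audit-2016-08-03"),
    Checkpoint(30),
    Checkpoint(40),
]

# The regular purge removes windows 10 and 30 but leaves the named one alone.
store = purge(store, committed_window_id=40)

# A relaunch (or a debugging session) would start from this checkpoint.
restart_from = find_named(store, "audit-2016-08-03")
```

The sketch only covers retention and lookup. It says nothing about replaying input from that point, which is the harder problem Pramod raises above.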