The idea here was to create a recovery/committed window on demand. But there is always one recovery window for the DAG (except before the first one). Instead of using or modifying the Checkpoint tuple, I am planning to reuse the existing recovery window state, which simplifies the implementation.
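The claim that a recovery window normally exists can be illustrated with a minimal Java sketch. It rests on simplifying assumptions that do not come from Apex. Each operator reports the set of window IDs at which it holds a saved checkpoint, and those checkpoints are aligned across operators. Under that model, the newest window ID that every operator has checkpointed is a consistent point from which a savepoint could copy the whole DAG's state. No such window exists before the first common checkpoint. `RecoveryWindow` and `latestCommon` are hypothetical names.

```java
import java.util.*;

/**
 * Hypothetical toy model, not Apex's actual recovery logic: finds the newest
 * window ID at which every operator has a saved checkpoint.
 */
final class RecoveryWindow {
  private RecoveryWindow() {}

  /**
   * Returns the newest window ID present in every operator's checkpoint set,
   * or empty if there is no common checkpoint yet (e.g. before the first one).
   */
  static OptionalLong latestCommon(Map<String, ? extends SortedSet<Long>> checkpointsByOperator) {
    if (checkpointsByOperator.isEmpty()) {
      return OptionalLong.empty();
    }
    Iterator<? extends SortedSet<Long>> it = checkpointsByOperator.values().iterator();
    // Start from one operator's checkpoints and intersect with every other operator's.
    SortedSet<Long> common = new TreeSet<>(it.next());
    while (it.hasNext()) {
      common.retainAll(it.next());
    }
    // The sorted set's last element is the newest shared window ID.
    return common.isEmpty() ? OptionalLong.empty() : OptionalLong.of(common.last());
  }
}
```

Apex operators need not all checkpoint at the same windows, so the platform's real recovery-window computation is likely more involved than this set intersection.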
Proposed API:

ApexCli> savepoint <appId> <folderToSaveTheState>
ApexCli> launch -savepoint <folderWithTheState>

First prototype: https://github.com/sandeshh/apex-core/commit/8ec7e837318c2b33289251cda78ece0024a3f895

Thanks

On Thu, Aug 4, 2016 at 11:54 AM Amol Kekre <a...@datatorrent.com> wrote:

> hmm! Actually, it may be a good debugging tool too. Keep the named
> checkpoints around. The feature is to keep checkpoints around, which could
> be done by providing an option to not delete checkpoints, but naming them
> makes it more operational. Send a command from the CLI -> get the
> checkpoint -> know it is the one you need, since the file name contains the
> string you sent with the command -> debug. This is different from querying
> a state, as it gives you the entire app checkpoint to debug with.
>
> Thks
> Amol
>
>
> On Thu, Aug 4, 2016 at 11:41 AM, Venkatesh Kottapalli <venkat...@datatorrent.com> wrote:
>
> > +1 for the idea.
> >
> > It might be helpful to developers as well when dealing with a variety of
> > data in large volumes, if it lets them run from the checkpointed state
> > rather than rerunning the whole application in case of issues.
> >
> > I have seen cases where the application runs for more than 10 hours and
> > some partitions fail because of the variety of data it is dealing with.
> > In such cases, the application has to be restarted, and a feature of this
> > kind will help developers.
> >
> > Being able to easily enable/disable this feature when running the app
> > will also be important.
> >
> > -Venkatesh.
> >
> >
> > > On Aug 4, 2016, at 10:29 AM, Amol Kekre <a...@datatorrent.com> wrote:
> > >
> > > We had a user who wanted roll-back and restart for audit purposes. At
> > > that time we did not have timed windows. Named checkpoints would have
> > > helped a little bit.
> > >
> > > Problem statement: Auditors ask for a rerun of yesterday's computations
> > > for verification.
> > > Assume that these computations depend on previous state (i.e., data
> > > from the day before yesterday).
> > >
> > > Solution
> > > 1. Take a named checkpoint at midnight every day (an input adapter
> > >    triggers it).
> > > 2. The app spools raw logs into HDFS along with window IDs and event
> > >    times.
> > > 3. The rerun is a separate app that starts from a named checkpoint
> > >    (midnight, yesterday).
> > >
> > > Technically the solution will not be as simple, and the "new audit app"
> > > will need a lot of other checks (dedups, dropping events not in
> > > yesterday's window, waiting for late arrivals, ...), but named
> > > checkpoints help.
> > >
> > > I do agree with Pramod that replay within the same running app is not
> > > viable in a data-in-motion architecture, but it helps somewhat in a new
> > > audit app. Named checkpoints help data-in-motion architectures handle
> > > batch apps better. In the case above, the spooling in #2, done with
> > > event timestamps plus state, suffices; the state part comes from the
> > > named checkpoint.
> > >
> > > Thks,
> > > Amol
> > >
> > >
> > > On Thu, Aug 4, 2016 at 10:12 AM, Sanjay Pujare <san...@datatorrent.com> wrote:
> > >
> > >> I agree. A specific use case would help justify this feature. Also,
> > >> the ability to replay from the named checkpoint will be limited by
> > >> various factors, won't it?
> > >>
> > >> On 8/4/16, 9:00 AM, "Pramod Immaneni" <pra...@datatorrent.com> wrote:
> > >>
> > >> There is a problem here: keeping old checkpoints and recovering from
> > >> them means preserving the old input data along with the state.
> > >> This is more than the mechanism of actually creating named
> > >> checkpoints. It means giving operators the ability to move forward
> > >> (a.k.a. committing, and dropping committed state and buffer data)
> > >> while still being able to replay from that point from the input
> > >> source, and providing a way for operators (at first look, input
> > >> operators) to distinguish that. Why would someone need this with
> > >> idempotent processing? Is there a specific use case you are looking
> > >> at? If we do go ahead with this, for the mechanism I would be in favor
> > >> of reusing the existing tuple.
> > >>
> > >> On Thu, Aug 4, 2016 at 8:44 AM, Vlad Rozov <v.ro...@datatorrent.com> wrote:
> > >>
> > >>> +1 for the feature. At first look, I am more in favor of reusing the
> > >>> existing control tuple.
> > >>>
> > >>> Thank you,
> > >>>
> > >>> Vlad
> > >>>
> > >>>
> > >>> On 8/4/16 08:17, Sandesh Hegde wrote:
> > >>>
> > >>>> @Chinmay
> > >>>> We could enhance the existing checkpoint tuple, but it is used much
> > >>>> more frequently than this feature would be, so why burden the
> > >>>> Checkpoint tuple with an extra field?
> > >>>>
> > >>>> @Aniruddha
> > >>>> It is better to leave the scheduling to the users; they can use any
> > >>>> tool they are already familiar with.
> > >>>>
> > >>>> On Thu, Aug 4, 2016 at 7:40 AM Aniruddha Thombare <anirud...@datatorrent.com> wrote:
> > >>>>
> > >>>>> +1 on the idea, it would be awesome to have.
> > >>>>>
> > >>>>> Question: Can we further develop this brilliant idea into scheduled
> > >>>>> checkpoints (saved as dynamically named checkpoints)? This would be
> > >>>>> along the lines of logrotate / general backup strategies.
> > >>>>>
> > >>>>> Thanks,
> > >>>>>
> > >>>>> A
> > >>>>>
> > >>>>> _____________________________________
> > >>>>> Sent with difficulty, I mean handheld ;)
> > >>>>> On 4 Aug 2016 8:03 pm, "Munagala Ramanath" <r...@datatorrent.com> wrote:
> > >>>>>
> > >>>>>> +1
> > >>>>>>
> > >>>>>> Ram
> > >>>>>>
> > >>>>>> On Thu, Aug 4, 2016 at 12:10 AM, Sandesh Hegde <sand...@datatorrent.com> wrote:
> > >>>>>>
> > >>>>>>> Hello Team,
> > >>>>>>>
> > >>>>>>> This thread is to discuss the Named Checkpoint feature for Apex
> > >>>>>>> (https://issues.apache.org/jira/browse/APEXCORE-498).
> > >>>>>>>
> > >>>>>>> Named checkpoints allow the following workflow:
> > >>>>>>>
> > >>>>>>> 1. Users can trigger a checkpoint and give it a name.
> > >>>>>>> 2. Users can relaunch the application from the named checkpoint.
> > >>>>>>> 3. These checkpoints survive the "purge of old checkpoints".
> > >>>>>>>
> > >>>>>>> The current idea is to add a new control tuple, NamedCheckPointTuple,
> > >>>>>>> which contains the user-specified name; it traverses the DAG, and
> > >>>>>>> along the way the necessary actions are taken.
> > >>>>>>>
> > >>>>>>> Please let me know your thoughts on this.
> > >>>>>>>
> > >>>>>>> Thanks
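The three-step workflow from the original proposal (name a checkpoint, relaunch from it, keep it through the purge of old checkpoints) can be sketched as a small in-memory index. This assumes checkpoints are keyed by window ID and that naming one simply pins it. `NamedCheckpointIndex` and its methods are hypothetical, not Apex classes. The sketch also does not model how a `NamedCheckPointTuple` would traverse the DAG or how operators serialize their state.

```java
import java.util.*;

/**
 * Hypothetical in-memory checkpoint index (not an Apex class). Window IDs map
 * to saved state; a name pins one window so purging skips it and a relaunch
 * can find it by name.
 */
final class NamedCheckpointIndex {
  private final TreeMap<Long, byte[]> stateByWindow = new TreeMap<>();
  private final Map<String, Long> windowByName = new HashMap<>();

  /** Records the saved state for a checkpointed window. */
  void save(long windowId, byte[] state) {
    stateByWindow.put(windowId, state);
  }

  /** Step 1: attaches a user-supplied name to an existing checkpoint. */
  void nameCheckpoint(String name, long windowId) {
    if (!stateByWindow.containsKey(windowId)) {
      throw new IllegalArgumentException("no checkpoint at window " + windowId);
    }
    windowByName.put(name, windowId);
  }

  /** Step 3: drops checkpoints strictly older than the committed window, unless named. */
  void purgeBefore(long committedWindowId) {
    Set<Long> pinned = new HashSet<>(windowByName.values());
    // headMap(..., false) is a live view, so removing its keys removes them from the index.
    stateByWindow.headMap(committedWindowId, false).keySet().removeIf(w -> !pinned.contains(w));
  }

  /** Step 2: finds the state to relaunch from, by name. */
  Optional<byte[]> lookup(String name) {
    Long windowId = windowByName.get(name);
    return windowId == null ? Optional.empty() : Optional.ofNullable(stateByWindow.get(windowId));
  }

  /** Window IDs still held, in ascending order. */
  Set<Long> windows() {
    return Collections.unmodifiableSet(stateByWindow.keySet());
  }
}
```

In the audit scenario above, an input adapter would pin a window at midnight under a date-based name. The regular purge would leave it alone, and the rerun app would look it up by name. As Pramod notes, pinning operator state is only part of the problem: replaying from it also needs the input data from that point to still be available.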