The idea here was to create a recovery/committed window on demand. But
there is always one (except before the first) recovery window for the DAG.
Instead of using or modifying the Checkpoint tuple, I am planning to reuse
the existing recovery window state, which simplifies the implementation.

Proposed API:

ApexCli> savepoint <appId> <folderToSaveTheState>
ApexCli> launch -savepoint <folderWithTheState>
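
To make the intent of the two commands concrete, here is a rough sketch of the idea, not the prototype's actual code. It assumes, purely for illustration, that each recovery window's state sits in its own sub-directory named by a hex window id; the function names and the directory layout are hypothetical and do not come from apex-core.

```python
import shutil
from pathlib import Path


def latest_recovery_window(checkpoint_root: Path) -> Path:
    """Return the directory holding the most recent recovery window state.

    Hypothetical layout: one sub-directory per window, named by hex window id.
    """
    windows = [p for p in checkpoint_root.iterdir() if p.is_dir()]
    if not windows:
        # Corresponds to "except before the first": nothing to save yet.
        raise FileNotFoundError("no recovery window available yet")
    return max(windows, key=lambda p: int(p.name, 16))


def savepoint(checkpoint_root: Path, folder_to_save_the_state: Path) -> Path:
    """Sketch of `savepoint`: copy the existing recovery window state aside."""
    source = latest_recovery_window(checkpoint_root)
    destination = folder_to_save_the_state / source.name
    shutil.copytree(source, destination)
    return destination


def launch_from_savepoint(folder_with_the_state: Path, new_checkpoint_root: Path) -> Path:
    """Sketch of `launch -savepoint`: seed a new app's state from the saved copy."""
    saved = latest_recovery_window(folder_with_the_state)
    destination = new_checkpoint_root / saved.name
    shutil.copytree(saved, destination)
    return destination
```

Because the saved copy lives outside the application's own checkpoint directory, the regular purge of old checkpoints would not touch it.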

First prototype:
https://github.com/sandeshh/apex-core/commit/8ec7e837318c2b33289251cda78ece0024a3f895

Thanks

On Thu, Aug 4, 2016 at 11:54 AM Amol Kekre <a...@datatorrent.com> wrote:

> Hmm! Actually it may be a good debugging tool too: keep the named
> checkpoints around. The core feature is keeping checkpoints around, which
> could be done with an option not to delete checkpoints, but naming them
> makes it more operational. Send a command from the CLI -> get a checkpoint
> -> know it is the one you need because the file name contains the string
> you sent with the command -> debug. This is different from querying state,
> as it gives you the entire app checkpoint to debug with.
>
> Thks
> Amol
>
>
> On Thu, Aug 4, 2016 at 11:41 AM, Venkatesh Kottapalli <
> venkat...@datatorrent.com> wrote:
>
> > + 1 for the idea.
> >
> > It might be helpful to developers as well when dealing with a variety of
> > data in large volumes, if this can help them run from the checkpointed
> > state rather than rerunning the application altogether in case of issues.
> >
> > I have seen cases where the application runs for more than 10 hours and
> > some partitions fail because of the variety of data that it is dealing
> > with. In such cases, the application has to be restarted and it will be
> > helpful to developers with a feature of this kind.
> >
> > The ease of enabling/disabling this feature to run the app will also be
> > important.
> >
> > -Venkatesh.
> >
> >
> > > On Aug 4, 2016, at 10:29 AM, Amol Kekre <a...@datatorrent.com> wrote:
> > >
> > > We had a user who wanted roll-back and restart for audit purposes. At
> > > that time we did not have timed windows. Named checkpoints would have
> > > helped a little bit.
> > >
> > > Problem statement: Auditors ask for a rerun of yesterday's computations
> > > for verification. Assume that these computations depend on previous
> > > state (i.e., data from the day before yesterday).
> > >
> > > Solution
> > > 1. Have named checkpoints at 12 at night every day (an input adapter
> > > triggers it)
> > > 2. The app spools raw logs into HDFS along with window ids and event
> > > times
> > > 3. The re-run is a separate app that starts off on a named checkpoint
> > > (12 at night yesterday)
> > >
> > > Technically the solution will not be as simple, and the "new audit app"
> > > will need a lot of other checks (dedups, dropping events not in
> > > yesterday's window, waiting for late arrivals, ...), but named
> > > checkpoints help.
> > >
> > > I do agree with Pramod that replay within the same running app is not
> > > viable within a data-in-motion architecture. But it helps somewhat in a
> > > new audit app. Named checkpoints help data-in-motion architectures
> > > handle batch apps better. In the above case, the spooling in #2, done
> > > with event timestamp + state, suffices. The state part comes from the
> > > named checkpoint.
> > >
> > > Thks,
> > > Amol
> > >
> > >
> > >
> > >
> > > On Thu, Aug 4, 2016 at 10:12 AM, Sanjay Pujare <san...@datatorrent.com>
> > > wrote:
> > >
> > >> I agree. A specific use-case will be useful to support this feature.
> > >> Also, the ability to replay from the named checkpoint will be limited
> > >> because of various factors, won't it?
> > >>
> > >> On 8/4/16, 9:00 AM, "Pramod Immaneni" <pra...@datatorrent.com> wrote:
> > >>
> > >>    There is a problem here: keeping old checkpoints and recovering from
> > >>    them means preserving the old input data along with the state. This
> > >>    is more than the mechanism of actually creating named checkpoints; it
> > >>    means having the ability for operators to move forward (a.k.a.
> > >>    committing and dropping committed states and buffer data) while still
> > >>    having the ability to replay from that point from the input source,
> > >>    and providing a way for operators (at first look, input operators) to
> > >>    distinguish that. Why would someone need this with idempotent
> > >>    processing? Is there a specific use case you are looking at? Suppose
> > >>    we do go ahead with this: for the mechanism, I would be in favor of
> > >>    reusing the existing tuple.
> > >>
> > >>    On Thu, Aug 4, 2016 at 8:44 AM, Vlad Rozov <v.ro...@datatorrent.com>
> > >>    wrote:
> > >>
> > >>> +1 for the feature. At first look I am more in favor of reusing the
> > >>> existing control tuple.
> > >>>
> > >>> Thank you,
> > >>>
> > >>> Vlad
> > >>>
> > >>>
> > >>> On 8/4/16 08:17, Sandesh Hegde wrote:
> > >>>
> > >>>> @Chinmay
> > >>>> We can enhance the existing checkpoint tuple, but it is used far more
> > >>>> frequently than this feature would be, so why burden the Checkpoint
> > >>>> tuple with an extra field?
> > >>>>
> > >>>> @Aniruddha
> > >>>> It is better to leave the scheduling to the users; they can use any
> > >>>> tool that they are already familiar with.
> > >>>>
> > >>>> On Thu, Aug 4, 2016 at 7:40 AM Aniruddha Thombare <
> > >>>> anirud...@datatorrent.com>
> > >>>> wrote:
> > >>>>
> > >>>> +1 On the idea, it would be awesome to have.
> > >>>>>
> > >>>>> Question: Can we further develop this brilliant idea into
> > >>>>> scheduled checkpoints (saved as dynamically named checkpoints)?
> > >>>>> This would be along the lines of logrotate / general backup
> > >>>>> strategies.
> > >>>>>
> > >>>>>
> > >>>>> Thanks,
> > >>>>>
> > >>>>> A
> > >>>>>
> > >>>>> _____________________________________
> > >>>>> Sent with difficulty, I mean handheld ;)
> > >>>>> On 4 Aug 2016 8:03 pm, "Munagala Ramanath" <r...@datatorrent.com>
> > >> wrote:
> > >>>>>
> > >>>>> +1
> > >>>>>>
> > >>>>>> Ram
> > >>>>>>
> > >>>>>> On Thu, Aug 4, 2016 at 12:10 AM, Sandesh Hegde <
> > >> sand...@datatorrent.com
> > >>>>>>>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>> Hello Team,
> > >>>>>>>
> > >>>>>>> This thread is to discuss the Named Checkpoint feature for Apex
> > >>>>>>> (https://issues.apache.org/jira/browse/APEXCORE-498).
> > >>>>>>>
> > >>>>>>> Named checkpoints allow the following workflow:
> > >>>>>>>
> > >>>>>>> 1. Users can trigger a checkpoint and give it a name.
> > >>>>>>> 2. Users can relaunch the application from the named checkpoint.
> > >>>>>>> 3. These checkpoints survive the "purge of old checkpoints".
> > >>>>>>>
> > >>>>>>> The current idea is to add a new control tuple,
> > >>>>>>> NamedCheckPointTuple, which contains the user-specified name. It
> > >>>>>>> traverses the DAG, and along the way the necessary actions are
> > >>>>>>> taken.
> > >>>>>>>
> > >>>>>>> Please let me know your thoughts on this.
> > >>>>>>>
> > >>>>>>> Thanks
> > >>>>>>>
> > >>>>>>>
> > >>>
> > >>
> > >>
> > >>
> > >>
> >
> >
>
