How about re-launching the app from the same location?

If at all they want to store the state we can provide savepoint feature.

On Tue, Sep 20, 2016 at 4:39 AM Tushar Gosavi <tus...@datatorrent.com>
wrote:

> We have observed that application relaunch takes long time.
> The one major reason for delay in application startup during relaunch
> is time taken to copy state of exisitng application to new application.
> This state could grow in GBs and copy is performed in single thread before
> new application is submitted to Yarn.
>
> The state of previous application constists
> - jars
> - stram checkpoint/recovery file.
> - events
> - container file
> - stats recording if enabled.
> - operator checkpoints
> - operator data.
>
> We could avoid copying debugging data like stat recording which could
> run in TB for long
> running application and is not required for functioning of new application.
>
> Similarly operator checkpoints could be read in parallel when they are
> launched for first time,
> This will also help in copying only required checkpoints and will be
> done in parallel
> by multiple containers/threads.
>
> For operator data stored in application directory, we could copy it
> completely for now, but
> in future we could provide an callback which will allow operator
> partition to read only
> required state from previous location.
>
> let me know your though on this.
>
> Regards,
> - Tushar.
>

Reply via email to