[jira] [Commented] (FLINK-3397) Failed streaming jobs should fall back to the most recent checkpoint/savepoint

ASF GitHub Bot (JIRA) Mon, 04 Jul 2016 04:38:55 -0700

    [ https://issues.apache.org/jira/browse/FLINK-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15361196#comment-15361196 ]


ASF GitHub Bot commented on FLINK-3397:
---------------------------------------

GitHub user ramkrish86 opened a pull request:

    https://github.com/apache/flink/pull/2195

    FLINK-3397 Failed streaming jobs should fall back to the most recent

    Initial patch to check whether this is what the JIRA intends. I thought a 
PR would help me get better feedback. I tried to add a test case but could not 
get one working. I followed what is done in SavePointITCase, particularly 
testRestoreFailure(), but I am not able to produce a flow that has both a 
checkpoint and a savepoint, because this test case allows the notification to 
happen when the job is removed, and that clears all the existing savepoints. 
So when the test case restores, it always goes with the savepoint. 
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ramkrish86/flink FLINK-3397

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2195.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2195
    
----
commit 70e881fba6ab1964600b4fc932a8f7b683e2ff1e
Author: Ramkrishna <[email protected]>
Date:   2016-07-04T11:32:11Z

    FLINK-3397 Failed streaming jobs should fall back to the most recent
    checkpoint/savepoint (Ram)

----


> Failed streaming jobs should fall back to the most recent checkpoint/savepoint
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-3397
>                 URL: https://issues.apache.org/jira/browse/FLINK-3397
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.0.0
>            Reporter: Gyula Fora
>            Priority: Minor
>
> The current fallback behaviour in case of a streaming job failure is slightly 
> counterintuitive:
> If a job fails, it falls back to the most recent checkpoint (if any) even 
> if a more recent savepoint was taken. This means that the system does not 
> regard savepoints as checkpoints, only as points from which a job can be 
> manually restarted.
> I suggest changing this so that savepoints are also regarded as checkpoints 
> in case of a failure and are also used to automatically restore the 
> streaming job.
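
To make the difference between the two behaviours concrete, here is a minimal sketch in plain Java. It is not Flink code and does not reflect the patch in the pull request: the names RestoreFallbackSketch, RestorePoint, Kind, latestCheckpointOnly and latestOfEither are all hypothetical, and "most recent" is assumed to mean the largest completion timestamp.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustrative only: none of these types exist in Flink.
public class RestoreFallbackSketch {

    /** Hypothetical kinds of restore point. */
    enum Kind { CHECKPOINT, SAVEPOINT }

    /** Hypothetical descriptor: what the snapshot is and when it completed. */
    static final class RestorePoint {
        final Kind kind;
        final long completedAtMillis;

        RestorePoint(Kind kind, long completedAtMillis) {
            this.kind = kind;
            this.completedAtMillis = completedAtMillis;
        }
    }

    /** Behaviour described in the issue: savepoints are ignored on failure. */
    static Optional<RestorePoint> latestCheckpointOnly(List<RestorePoint> points) {
        return points.stream()
                .filter(p -> p.kind == Kind.CHECKPOINT)
                .max(Comparator.comparingLong((RestorePoint p) -> p.completedAtMillis));
    }

    /** Suggested behaviour: checkpoints and savepoints compete on recency. */
    static Optional<RestorePoint> latestOfEither(List<RestorePoint> points) {
        return points.stream()
                .max(Comparator.comparingLong((RestorePoint p) -> p.completedAtMillis));
    }

    public static void main(String[] args) {
        // Two checkpoints followed by a later, manually triggered savepoint.
        List<RestorePoint> points = Arrays.asList(
                new RestorePoint(Kind.CHECKPOINT, 1000L),
                new RestorePoint(Kind.CHECKPOINT, 2000L),
                new RestorePoint(Kind.SAVEPOINT, 3000L));

        RestorePoint current = latestCheckpointOnly(points).get();
        RestorePoint proposed = latestOfEither(points).get();

        System.out.println("current:  " + current.kind + " @ " + current.completedAtMillis);
        System.out.println("proposed: " + proposed.kind + " @ " + proposed.completedAtMillis);
    }
}
```

With a savepoint taken after the last checkpoint, the first method still selects the older checkpoint, while the second selects the savepoint, which is the change the issue asks for.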



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
