[jira] [Commented] (FLINK-3050) Add custom Exception type to suppress job restarts

ASF GitHub Bot (JIRA) Wed, 16 Dec 2015 05:17:43 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059962#comment-15059962
 ]


ASF GitHub Bot commented on FLINK-3050:
---------------------------------------

GitHub user uce opened a pull request:

    https://github.com/apache/flink/pull/1461

    [FLINK-3050] [runtime] Add UnrecoverableException to suppress job restarts

    I need this to address a comment in #1434.
    
    Adds `UnrecoverableException`, which suppresses job restarts if it is the 
failure cause. It is only a wrapper around the real cause and can only be 
instantiated with a cause.
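
As a rough illustration of the "wrapper that requires a cause" idea, a minimal sketch could look like the following. The class name `UnrecoverableExceptionSketch`, the choice to extend `Exception`, and the null check are assumptions for illustration and not the code from the pull request. The message text is taken from the stack trace in this description.

```java
import java.util.Objects;

// Hypothetical sketch, not the class added by the pull request.
final class UnrecoverableExceptionSketch extends Exception {

    private static final long serialVersionUID = 1L;

    // The only constructor: a cause is mandatory, so the wrapper can never
    // hide the real failure. Rejecting null is an assumption of this sketch.
    UnrecoverableExceptionSketch(Throwable cause) {
        super("Unrecoverable failure. This suppresses job restarts. "
                + "Please check the stack trace for the root cause.",
                Objects.requireNonNull(cause, "cause"));
    }
}

class UnrecoverableSketchDemo {
    public static void main(String[] args) {
        Throwable root = new IllegalArgumentException("Invalid path 'unknown path'.");
        UnrecoverableExceptionSketch wrapped = new UnrecoverableExceptionSketch(root);
        // The real failure stays reachable through the standard cause chain.
        System.out.println(wrapped.getCause());
    }
}
```

Because no constructor without a cause exists, every instance shows up in a stack trace followed by a `Caused by:` entry for the real failure.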
    
    A stack trace looks like this:
    
    ```java
    org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:649)
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:595)
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:595)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
        at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    Caused by: org.apache.flink.runtime.execution.UnrecoverableException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1067)
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1052)
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1052)
        ... 9 more
    Caused by: java.lang.IllegalArgumentException: Invalid path 'unknown path'.
        at org.apache.flink.runtime.checkpoint.HeapStateStore.getState(HeapStateStore.java:57)
        at org.apache.flink.runtime.checkpoint.SavepointStore.getState(SavepointStore.java:54)
        at org.apache.flink.runtime.checkpoint.SavepointStore.getState(SavepointStore.java:24)
        at org.apache.flink.runtime.checkpoint.SavepointCoordinator.restoreSavepoint(SavepointCoordinator.java:189)
        at org.apache.flink.runtime.executiongraph.ExecutionGraph.restoreSavepoint(ExecutionGraph.java:874)
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1064)
        ... 11 more
    ```


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uce/flink 3050-suppress_restarts

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1461.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1461
    
----
commit 952793e532cc20f33e97e7692e35ea508e715f1e
Author: Ufuk Celebi <[email protected]>
Date:   2015-12-16T13:09:22Z

    [FLINK-3050] [runtime] Add UnrecoverableException to suppress job restarts

----


> Add custom Exception type to suppress job restarts
> --------------------------------------------------
>
>                 Key: FLINK-3050
>                 URL: https://issues.apache.org/jira/browse/FLINK-3050
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Runtime
>    Affects Versions: 0.10.0
>            Reporter: Ufuk Celebi
>            Assignee: Ufuk Celebi
>            Priority: Minor
>             Fix For: 1.0.0
>
>
> In case of failures and configured execution retries, the job will be 
> restarted even when the failure is not recoverable.
> We can add a custom Exception type like UnrecoverableFailure to 
> suppress restarts in certain cases. The execution graph restart logic can 
> check the failure type on recovery and skip the restart.
> This Exception can be used both by the system and the user.
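
The restart check described in the issue could be sketched roughly as follows. This is a hedged illustration only: `RestartDecisionSketch`, `shouldRestart`, and the retry counters are hypothetical names and do not reflect the actual execution graph code. The nested `UnrecoverableFailure` class stands in for the exception type proposed above.

```java
// Hypothetical sketch of restart logic that honors a marker exception type.
final class RestartDecisionSketch {

    // Stand-in for the proposed exception type; wraps the real cause.
    static final class UnrecoverableFailure extends Exception {
        private static final long serialVersionUID = 1L;

        UnrecoverableFailure(Throwable cause) {
            super(cause);
        }
    }

    // Returns true if another execution attempt should be made.
    // Only the direct failure cause is inspected, not the whole cause chain.
    static boolean shouldRestart(Throwable failureCause, int attemptsSoFar, int maxRetries) {
        if (failureCause instanceof UnrecoverableFailure) {
            // The failure is marked as unrecoverable: skip the restart.
            return false;
        }
        // Otherwise fall back to the configured number of execution retries.
        return attemptsSoFar < maxRetries;
    }

    public static void main(String[] args) {
        Throwable transientFailure = new java.io.IOException("connection lost");
        Throwable fatalFailure =
                new UnrecoverableFailure(new IllegalArgumentException("Invalid path"));

        System.out.println(shouldRestart(transientFailure, 0, 3));
        System.out.println(shouldRestart(fatalFailure, 0, 3));
    }
}
```

Since the marker is an ordinary exception type, both system code and user functions could throw it to opt out of restarts.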



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
