[
https://issues.apache.org/jira/browse/FLINK-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059962#comment-15059962
]
ASF GitHub Bot commented on FLINK-3050:
---------------------------------------
GitHub user uce opened a pull request:
https://github.com/apache/flink/pull/1461
[FLINK-3050] [runtime] Add UnrecoverableException to suppress job restarts
I need this to address a comment in #1434.
Adds `UnrecoverableException`, which suppresses job restarts if it is the
failure cause. It is just a wrapper around the real cause and can only be
instantiated with a cause.
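To illustrate the shape described above, here is a minimal sketch of such a wrapper. It is not the class from the pull request: the superclass (`Exception`), the null check, and the single-constructor layout are assumptions; only the class name and the message text are taken from this description and the stack trace.

```java
import java.util.Objects;

// Illustrative sketch only; the actual class in the pull request may differ.
// Assumption: a checked exception whose only constructor requires a cause.
class UnrecoverableException extends Exception {

    private static final long serialVersionUID = 1L;

    /**
     * Wraps the real failure cause. There is deliberately no constructor
     * without a cause, so the root cause is always available to the reader.
     */
    public UnrecoverableException(Throwable cause) {
        super("Unrecoverable failure. This suppresses job restarts. "
                + "Please check the stack trace for the root cause.",
                Objects.requireNonNull(cause, "cause"));
    }
}
```

With only this constructor available, a caller has to write something like `throw new UnrecoverableException(e);` and the original exception shows up as the innermost `Caused by:` entry.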
A stack trace looks like this:
```java
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:649)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:595)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:595)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.execution.UnrecoverableException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1067)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1052)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1052)
    ... 9 more
Caused by: java.lang.IllegalArgumentException: Invalid path 'unknown path'.
    at org.apache.flink.runtime.checkpoint.HeapStateStore.getState(HeapStateStore.java:57)
    at org.apache.flink.runtime.checkpoint.SavepointStore.getState(SavepointStore.java:54)
    at org.apache.flink.runtime.checkpoint.SavepointStore.getState(SavepointStore.java:24)
    at org.apache.flink.runtime.checkpoint.SavepointCoordinator.restoreSavepoint(SavepointCoordinator.java:189)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.restoreSavepoint(ExecutionGraph.java:874)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1064)
    ... 11 more
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/uce/flink 3050-suppress_restarts
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1461.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1461
----
commit 952793e532cc20f33e97e7692e35ea508e715f1e
Author: Ufuk Celebi <[email protected]>
Date: 2015-12-16T13:09:22Z
[FLINK-3050] [runtime] Add UnrecoverableException to suppress job restarts
----
> Add custom Exception type to suppress job restarts
> --------------------------------------------------
>
> Key: FLINK-3050
> URL: https://issues.apache.org/jira/browse/FLINK-3050
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Runtime
> Affects Versions: 0.10.0
> Reporter: Ufuk Celebi
> Assignee: Ufuk Celebi
> Priority: Minor
> Fix For: 1.0.0
>
>
> In case of failures and configured execution retries, the job will be
> restarted even in cases when the failure is not recoverable.
> We can add a custom Exception type like UnrecoverableFailure in order to
> suppress restarts in certain cases. The execution graph restart logic can
> check the failure type on recovery and skip the restarting.
> This Exception can be used both by the system and the user.
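The check proposed in the issue description could look roughly like the sketch below. Everything here is hypothetical: `RestartDecision`, `shouldRestart`, and the `retriesLeft` parameter are illustrative names, not Flink's API, and walking the whole cause chain (rather than inspecting only the top-level failure) is an assumption.

```java
// Hypothetical sketch of a restart decision; not Flink's execution graph code.
class RestartDecision {

    /** Stand-in for the exception type proposed in the issue. */
    static class UnrecoverableFailure extends Exception {
        private static final long serialVersionUID = 1L;

        UnrecoverableFailure(Throwable cause) {
            super(cause);
        }
    }

    /** Returns true if the failure or any of its causes is an UnrecoverableFailure. */
    static boolean isUnrecoverable(Throwable failure) {
        // Bound the walk so a cyclic cause chain cannot loop forever.
        int depth = 0;
        for (Throwable t = failure; t != null && depth < 64; t = t.getCause(), depth++) {
            if (t instanceof UnrecoverableFailure) {
                return true;
            }
        }
        return false;
    }

    /** Restart only if retries remain and the failure is not marked unrecoverable. */
    static boolean shouldRestart(Throwable failure, int retriesLeft) {
        return retriesLeft > 0 && !isUnrecoverable(failure);
    }
}
```

Under these assumptions, a failure wrapped in the marker type skips the restart even when retries are still configured, while any other failure is retried as before.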