[ 
https://issues.apache.org/jira/browse/FLINK-17765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler reassigned FLINK-17765:
----------------------------------------

    Assignee: Chesnay Schepler

> Verbose client error messages
> -----------------------------
>
>                 Key: FLINK-17765
>                 URL: https://issues.apache.org/jira/browse/FLINK-17765
>             Project: Flink
>          Issue Type: Bug
>          Components: Client / Job Submission, Runtime / Coordination, Runtime 
> / REST
>    Affects Versions: 1.10.1, 1.11.0
>            Reporter: Till Rohrmann
>            Assignee: Chesnay Schepler
>            Priority: Critical
>             Fix For: 1.11.0
>
>
> Some client operations if they fail produce very verbose error messages which 
> are hard to decipher for the user. For example, if the job submission fails 
> because the savepoint path does not exist, then the user sees the following:
> {code}
> org.apache.flink.client.program.ProgramInvocationException: The main method 
> caused an error: java.util.concurrent.ExecutionException: 
> org.apache.flink.runtime.client.JobSubmissionException: Failed to submit 
> JobGraph.
>         at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:302)
>         at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:198)
>         at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:148)
>         at 
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:689)
>         at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:227)
>         at 
> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:906)
>         at 
> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:982)
>         at 
> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
>         at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:982)
> Caused by: java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: 
> org.apache.flink.runtime.client.JobSubmissionException: Failed to submit 
> JobGraph.
>         at 
> org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:290)
>         at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1766)
>         at 
> org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:104)
>         at 
> org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:71)
>         at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1645)
>         at 
> org.apache.flink.streaming.examples.statemachine.StateMachineExample.main(StateMachineExample.java:142)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:288)
>         ... 8 more
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.flink.runtime.client.JobSubmissionException: Failed to submit 
> JobGraph.
>         at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>         at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>         at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1761)
>         ... 17 more
> Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to 
> submit JobGraph.
>         at 
> org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$7(RestClusterClient.java:366)
>         at 
> java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
>         at 
> java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
>         at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>         at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>         at 
> org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$8(FutureUtils.java:290)
>         at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>         at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>         at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>         at 
> java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
>         at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
>         at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Internal 
> server error., <Exception on server side:
> org.apache.flink.runtime.client.JobSubmissionException: Failed to submit job.
>         at 
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$internalSubmitJob$3(Dispatcher.java:343)
>         at 
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
>         at 
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
>         at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>         at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>         at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
>         at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at 
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at 
> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at 
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobExecutionException: Could not set up 
> JobManager
>         at 
> org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
>         at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         ... 6 more
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not 
> set up JobManager
>         at 
> org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:149)
>         at 
> org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:84)
>         at 
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:386)
>         at 
> org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34)
>         ... 7 more
> Caused by: java.io.FileNotFoundException: Cannot find checkpoint or savepoint 
> file/directory 'file:///does/not/exist' on file system 'file'.
>         at 
> org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpointPointer(AbstractFsCheckpointStorage.java:243)
>         at 
> org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpoint(AbstractFsCheckpointStorage.java:110)
>         at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1300)
>         at 
> org.apache.flink.runtime.scheduler.SchedulerBase.tryRestoreExecutionGraphFromSavepoint(SchedulerBase.java:299)
>         at 
> org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:252)
>         at 
> org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:228)
>         at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:119)
>         at 
> org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:103)
>         at 
> org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:284)
>         at 
> org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:272)
>         at 
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98)
>         at 
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40)
>         at 
> org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:143)
>         ... 10 more
> End of exception on server side>]
>         at 
> org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:390)
>         at 
> org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:374)
>         at 
> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
>         at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
>         ... 4 more
> {code}
> We might be able to remove a lot of the clutter by introducing a special 
> exception which represents an exception which should be reported back to the 
> user and which is then not rewrapped as it bubbles up the call stack.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to