[ https://issues.apache.org/jira/browse/FLINK-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623261#comment-16623261 ]
ASF GitHub Bot commented on FLINK-10312:
----------------------------------------

azagrebin opened a new pull request #6731: [FLINK-10312] Propagate exception from server to client in REST API
URL: https://github.com/apache/flink/pull/6731

## What is the purpose of the change

Currently, if an exception occurs on the server side in a REST API handler, the client receives an error response. The error list of that response contains only an abstract message from the server-side exception and no details. This PR also adds a stringified version of the server-side exception's stack trace. This way, the stack trace logged on the client side also contains the server-side stack trace, with more details about the actual failure.

## Brief change log

- pack the stringified version of the exception stack trace into the error list of ErrorResponseBody in AbstractRestHandler.processRestHandlerException
- strip CompletionException in the completeExceptionally case of FutureUtils.retryOperationWithDelay to avoid double logging of the underlying exception

## Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
- The S3 file system connector: (no)

## Documentation

- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Wrong / missing exception when submitting job
> ---------------------------------------------
>
>                 Key: FLINK-10312
>                 URL: https://issues.apache.org/jira/browse/FLINK-10312
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>   Affects Versions: 1.5.2, 1.6.0
>            Reporter: Stephan Ewen
>            Assignee: Andrey Zagrebin
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.7.0, 1.6.2, 1.5.5
>
>         Attachments: lmerge-TR.pdf
>
>
> h3. Problem
> When submitting a job that cannot be created / initialized on the JobManager, there is no proper error message. The exception says *"Could not retrieve the execution result. (JobID: 5a7165e1260c6316fa11d2760bd3d49f)"*
> h3. Steps to Reproduce
> Create a streaming job, set a state backend with a non existing file system scheme.
> h3. Full Stack Trace
> {code}
> Submitting a job where instantiation on the JM fails yields this, which seems like a major regression from seeing the actual exception:
> org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result. (JobID: 5a7165e1260c6316fa11d2760bd3d49f)
>       at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:260)
>       at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:486)
>       at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
>       at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1511)
>       at com.dataartisans.streamledger.examples.simpletrade.SimpleTradeExample.main(SimpleTradeExample.java:98)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
>       at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
>       at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:426)
>       at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:804)
>       at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:280)
>       at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:215)
>       at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1044)
>       at org.apache.flink.client.cli.CliFrontend.lambda$main$16(CliFrontend.java:1120)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>       at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>       at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1120)
> Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
>       at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$25(RestClusterClient.java:379)
>       at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
>       at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
>       at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>       at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>       at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$32(FutureUtils.java:213)
>       at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>       at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>       at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>       at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
>       at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
>       at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
>       at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
>       at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
>       at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
>       at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
>       ... 12 more
> Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
>       ... 10 more
> Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.rest.util.RestClientException: [Job submission failed.]
>       at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
>       at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
>       at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
>       at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:953)
>       at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
>       ... 4 more
> Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Job submission failed.]
>       at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:310)
>       at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$364(RestClient.java:294)
>       at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
>       ... 5 more
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
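
For readers following the pull request's change log, here is a rough, JDK-only sketch of the two ideas it names: rendering a server-side exception's stack trace as a string that is appended to the error list, and unwrapping `CompletionException` so the underlying failure is reported only once. This is not Flink's actual code. The class `ErrorPropagationSketch`, its helper methods, and the sample messages are hypothetical and chosen only for illustration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletionException;

// Hypothetical illustration; none of these names come from the Flink code base.
public class ErrorPropagationSketch {

    /** Renders a throwable, including its cause chain, as a single string. */
    static String stringify(Throwable t) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter writer = new PrintWriter(buffer)) {
            t.printStackTrace(writer);
        }
        return buffer.toString();
    }

    /** Unwraps CompletionException layers until a different exception type is reached. */
    static Throwable stripCompletionException(Throwable t) {
        while (t instanceof CompletionException && t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    /** Builds an error list: the short message first, then the stringified stack trace. */
    static List<String> errorList(String message, Throwable serverError) {
        return Arrays.asList(message, stringify(stripCompletionException(serverError)));
    }

    public static void main(String[] args) {
        // Simulated server-side failure, wrapped the way CompletableFuture wraps it.
        Throwable failure = new CompletionException(
                new IllegalStateException("simulated server-side failure"));
        errorList("Job submission failed.", failure).forEach(System.out::println);
    }
}
```

With a list like this in the response body, a client that logs every entry would show the short message followed by the server-side trace, instead of only `[Job submission failed.]`.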