[ 
https://issues.apache.org/jira/browse/FLINK-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623261#comment-16623261
 ] 

ASF GitHub Bot commented on FLINK-10312:
----------------------------------------

azagrebin opened a new pull request #6731: [FLINK-10312] Propagate exception 
from server to client in REST API
URL: https://github.com/apache/flink/pull/6731
 
 
   ## What is the purpose of the change
   
   Currently, when an exception occurs on the server side in a REST API handler, 
the client receives an error response whose error list contains only a generic 
message from the server-side exception, with no details. This PR also adds a 
stringified stack trace of the server-side exception to the error list, so the 
stack trace logged on the client side includes the server-side stack trace and 
gives more detail about the actual failure.
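   
   As a rough illustration of the idea only, not the code in this PR: a stack 
trace can be rendered to a string with the plain JDK and placed next to the 
short message in an error list. The class `ErrorListSketch` and the method 
`buildErrorList` are hypothetical names, and a plain `List<String>` stands in 
for the error list of the response body.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from the Flink code base.
final class ErrorListSketch {

    // Render the full stack trace, including "Caused by" sections, as one string.
    static String stringifyException(Throwable t) {
        StringWriter writer = new StringWriter();
        try (PrintWriter printer = new PrintWriter(writer)) {
            t.printStackTrace(printer);
        }
        return writer.toString();
    }

    // Assumption: the error list is a list of strings, with the short
    // message first and the stringified server-side stack trace second.
    static List<String> buildErrorList(String summary, Throwable cause) {
        return Arrays.asList(summary, stringifyException(cause));
    }

    public static void main(String[] args) {
        Throwable serverSide = new IllegalStateException(
                "Could not instantiate state backend",
                new IOException("unknown file system scheme"));
        List<String> errors = buildErrorList("Job submission failed.", serverSide);
        errors.forEach(System.out::println);
    }
}
```

   With a list like this, a client that only logs the error entries still 
shows the server-side cause chain.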
   
   ## Brief change log
   
     - pack the stringified version of the exception stack trace into the error 
list of ErrorResponseBody in AbstractRestHandler.processRestHandlerException
     - strip CompletionException in the completeExceptionally case of 
FutureUtils.retryOperationWithDelay to avoid double logging of the underlying 
exception
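   
   The second item can be pictured with a small JDK-only sketch, which is an 
assumption about the general technique and not the code changed in 
FutureUtils. A dependent `CompletableFuture` stage reports failures wrapped in 
`CompletionException`; unwrapping before calling `completeExceptionally` on 
the result future keeps that wrapper out of the resulting cause chain. The 
class and helper names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical sketch; not the FutureUtils implementation.
final class StripCompletionExceptionSketch {

    // Unwrap CompletionException layers as long as they carry a cause.
    static Throwable stripCompletionException(Throwable t) {
        Throwable current = t;
        while (current instanceof CompletionException && current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        CompletableFuture<String> operation = new CompletableFuture<>();
        CompletableFuture<String> result = new CompletableFuture<>();

        // The stage created by thenApply sees the failure wrapped in a
        // CompletionException, so it is stripped before being forwarded.
        operation.thenApply(String::trim).whenComplete((value, failure) -> {
            if (failure != null) {
                result.completeExceptionally(stripCompletionException(failure));
            } else {
                result.complete(value);
            }
        });

        operation.completeExceptionally(new IllegalArgumentException("not retryable"));

        // Prints the class name of the exception the result future holds.
        result.whenComplete((value, failure) ->
                System.out.println(failure.getClass().getName()));
    }
}
```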
   
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)
     - If yes, how is the feature documented? (not applicable)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Wrong / missing exception when submitting job
> ---------------------------------------------
>
>                 Key: FLINK-10312
>                 URL: https://issues.apache.org/jira/browse/FLINK-10312
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>    Affects Versions: 1.5.2, 1.6.0
>            Reporter: Stephan Ewen
>            Assignee: Andrey Zagrebin
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.7.0, 1.6.2, 1.5.5
>
>         Attachments: lmerge-TR.pdf
>
>
> h3. Problem
> When submitting a job that cannot be created / initialized on the JobManager, 
> there is no proper error message. The exception says *"Could not retrieve the 
> execution result. (JobID: 5a7165e1260c6316fa11d2760bd3d49f)"*
> h3. Steps to Reproduce
> Create a streaming job, set a state backend with a non existing file system 
> scheme.
> h3. Full Stack Trace
> {code}
> Submitting a job where instantiation on the JM fails yields this, which seems 
> like a major regression from seeing the actual exception:
> org.apache.flink.client.program.ProgramInvocationException: Could not 
> retrieve the execution result. (JobID: 5a7165e1260c6316fa11d2760bd3d49f)
>       at 
> org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:260)
>       at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:486)
>       at 
> org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
>       at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1511)
>       at 
> com.dataartisans.streamledger.examples.simpletrade.SimpleTradeExample.main(SimpleTradeExample.java:98)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
>       at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
>       at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:426)
>       at 
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:804)
>       at 
> org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:280)
>       at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:215)
>       at 
> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1044)
>       at 
> org.apache.flink.client.cli.CliFrontend.lambda$main$16(CliFrontend.java:1120)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>       at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>       at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1120)
> Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to 
> submit JobGraph.
>       at 
> org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$25(RestClusterClient.java:379)
>       at 
> java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
>       at 
> java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
>       at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>       at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>       at 
> org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$32(FutureUtils.java:213)
>       at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>       at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>       at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>       at 
> java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
>       at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
>       at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.CompletionException: 
> org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not 
> complete the operation. Exception is not retryable.
>       at 
> java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
>       at 
> java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
>       at 
> java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
>       at 
> java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
>       ... 12 more
> Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: 
> Could not complete the operation. Exception is not retryable.
>       ... 10 more
> Caused by: java.util.concurrent.CompletionException: 
> org.apache.flink.runtime.rest.util.RestClientException: [Job submission 
> failed.]
>       at 
> java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
>       at 
> java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
>       at 
> java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
>       at 
> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:953)
>       at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
>       ... 4 more
> Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Job 
> submission failed.]
>       at 
> org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:310)
>       at 
> org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$364(RestClient.java:294)
>       at 
> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
>       ... 5 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
