[
https://issues.apache.org/jira/browse/FLINK-22599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541744#comment-17541744
]
Mason Chen commented on FLINK-22599:
------------------------------------
+1 for fixing this issue, and for raising the priority of the ticket if it's
trivial to solve. The current error message is pretty confusing and led me in
the wrong direction while debugging.
> Point to `client.timeout` in error messages where it is used for rpc timeout
> ----------------------------------------------------------------------------
>
> Key: FLINK-22599
> URL: https://issues.apache.org/jira/browse/FLINK-22599
> Project: Flink
> Issue Type: Improvement
> Components: Client / Job Submission
> Affects Versions: 1.13.0
> Reporter: Xintong Song
> Priority: Not a Priority
> Labels: auto-deprioritized-major, auto-deprioritized-minor,
> starter
>
> `client.timeout`, not `akka.ask.timeout`, is used as the RPC timeout on
> the client side. However, this is not obvious to users.
> E.g., the following shows the error stack of an RPC timeout during job
> submission in application mode. A user seeing "Caused by:
> akka.pattern.AskTimeoutException" could be misled into tuning
> `akka.ask.timeout`.
> {code:java}
> 2021-05-07 21:02:37,513 WARN
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap
> [] - Application failed unexpectedly:
> java.util.concurrent.CompletionException:
> org.apache.flink.client.deployment.application.ApplicationExecutionException:
> Could not execute application.
> at
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> ~[?:1.8.0_181]
> at
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> ~[?:1.8.0_181]
> at
> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
> ~[?:1.8.0_181]
> at
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
> ~[?:1.8.0_181]
> at
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> ~[?:1.8.0_181]
> at
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> ~[?:1.8.0_181]
> at
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:257)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.lambda$runApplicationAsync$1(ApplicationDispatcherBootstrap.java:212)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [?:1.8.0_181]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [?:1.8.0_181]
> at
> org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:159)
> [flink-dist_2.11-1.12.1.jar:1.12.1]
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> [flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
> [flink-dist_2.11-1.12.1.jar:1.12.1]
> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> [flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> [flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> [flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> [flink-dist_2.11-1.12.1.jar:1.12.1]
> Caused by:
> org.apache.flink.client.deployment.application.ApplicationExecutionException:
> Could not execute application.
> ... 11 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: The
> main method caused an error: Failed to execute sql
> at
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:360)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:213)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:242)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> ... 10 more
> Caused by: org.apache.flink.table.api.TableException: Failed to execute sql
> at
> org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:699)
> ~[flink-table-blink_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.table.api.internal.StatementSetImpl.execute(StatementSetImpl.java:98)
> ~[flink-table-blink_2.11-1.12.1.jar:1.12.1]
> at
> com.kad.cube.dwd.retail.initial.DwdOrderRetailOrderPayInitial.sinkDwdRetailPay(DwdOrderRetailOrderPayInitial.java:91)
> ~[cube-data-dwd-order-offline-flink.jar:?]
> at
> com.kad.cube.dwd.retail.initial.DwdOrderRetailOrderPayInitial.main(DwdOrderRetailOrderPayInitial.java:59)
> ~[cube-data-dwd-order-offline-flink.jar:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ~[?:1.8.0_181]
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[?:1.8.0_181]
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_181]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
> at
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:343)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:213)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:242)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> ... 10 more
> Caused by: org.apache.flink.util.FlinkException: Failed to execute job
> 'insert-into_kudu.default_database.impala::cube_kudu.dwd_order_retail_order_pay'.
> at
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1918)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:135)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.table.planner.delegation.ExecutorBase.executeAsync(ExecutorBase.java:55)
> ~[flink-table-blink_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:681)
> ~[flink-table-blink_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.table.api.internal.StatementSetImpl.execute(StatementSetImpl.java:98)
> ~[flink-table-blink_2.11-1.12.1.jar:1.12.1]
> at
> com.kad.cube.dwd.retail.initial.DwdOrderRetailOrderPayInitial.sinkDwdRetailPay(DwdOrderRetailOrderPayInitial.java:91)
> ~[cube-data-dwd-order-offline-flink.jar:?]
> at
> com.kad.cube.dwd.retail.initial.DwdOrderRetailOrderPayInitial.main(DwdOrderRetailOrderPayInitial.java:59)
> ~[cube-data-dwd-order-offline-flink.jar:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ~[?:1.8.0_181]
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[?:1.8.0_181]
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_181]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
> at
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:343)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:213)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:242)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> ... 10 more
> Caused by: java.lang.RuntimeException: Error while waiting for job to be
> initialized
> at
> org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished(ClientUtils.java:160)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor.lambda$submitAndGetJobClientFuture$2(EmbeddedExecutor.java:140)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:73)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
> ~[?:1.8.0_181]
> at
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> ~[?:1.8.0_181]
> at
> java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:443)
> ~[?:1.8.0_181]
> at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> ~[?:1.8.0_181]
> at
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> ~[?:1.8.0_181]
> at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> ~[?:1.8.0_181]
> at
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
> ~[?:1.8.0_181]
> Caused by: java.util.concurrent.ExecutionException:
> java.util.concurrent.TimeoutException: Invocation of public default
> java.util.concurrent.CompletableFuture
> org.apache.flink.runtime.webmonitor.RestfulGateway.requestJobStatus(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time)
> timed out.
> at
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> ~[?:1.8.0_181]
> at
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> ~[?:1.8.0_181]
> at
> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor.lambda$null$0(EmbeddedExecutor.java:145)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished(ClientUtils.java:144)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor.lambda$submitAndGetJobClientFuture$2(EmbeddedExecutor.java:140)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:73)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
> ~[?:1.8.0_181]
> at
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> ~[?:1.8.0_181]
> at
> java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:443)
> ~[?:1.8.0_181]
> at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> ~[?:1.8.0_181]
> at
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> ~[?:1.8.0_181]
> at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> ~[?:1.8.0_181]
> at
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
> ~[?:1.8.0_181]
> Caused by: java.util.concurrent.TimeoutException: Invocation of public
> default java.util.concurrent.CompletableFuture
> org.apache.flink.runtime.webmonitor.RestfulGateway.requestJobStatus(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time)
> timed out.
> at org.apache.flink.runtime.rpc.akka.$Proxy27.requestJobStatus(Unknown
> Source) ~[?:1.12.1]
> at
> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor.lambda$null$0(EmbeddedExecutor.java:143)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished(ClientUtils.java:144)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor.lambda$submitAndGetJobClientFuture$2(EmbeddedExecutor.java:140)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:73)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
> ~[?:1.8.0_181]
> at
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> ~[?:1.8.0_181]
> at
> java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:443)
> ~[?:1.8.0_181]
> at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> ~[?:1.8.0_181]
> at
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> ~[?:1.8.0_181]
> at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> ~[?:1.8.0_181]
> at
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
> ~[?:1.8.0_181]
> Caused by: akka.pattern.AskTimeoutException: Ask timed out on
> [Actor[akka://flink/user/rpc/dispatcher_1#1316257960]] after [60000 ms].
> Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A
> typical reason for `AskTimeoutException` is that the recipient actor didn't
> send a reply.
> at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:648)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at
> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)
> ~[flink-dist_2.11-1.12.1.jar:1.12.1]
> at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_181]{code}
> It would be helpful to catch such timeout exceptions, add a message that
> points to the right configuration option, and re-throw them. In addition to
> the above stack, we should also check other places where `client.timeout` is
> used.
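
For illustration only, the "catch, add a hint, re-throw" idea from the description could look roughly like the sketch below. It is not taken from the Flink code base: the class `ClientTimeoutHint`, its methods, and the wording of the hint are all hypothetical, and it uses only the JDK so it runs on its own. The only name taken from the ticket is the `client.timeout` option.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper, not part of Flink; names and message text are assumptions. */
final class ClientTimeoutHint {

    // Option names come from the ticket; the wording of the hint is made up here.
    static final String HINT =
            "The client-side RPC timeout is configured via 'client.timeout', "
                    + "not 'akka.ask.timeout'.";

    private ClientTimeoutHint() {}

    /** Returns true if the throwable or any of its causes is a TimeoutException. */
    static boolean causedByTimeout(Throwable t) {
        int depth = 0; // bound the walk in case of a cyclic cause chain
        for (Throwable cur = t; cur != null && depth < 64; cur = cur.getCause(), depth++) {
            if (cur instanceof TimeoutException) {
                return true;
            }
        }
        return false;
    }

    /** Wraps timeout failures with the hint; returns anything else unchanged. */
    static Throwable withHint(Throwable t) {
        if (!causedByTimeout(t)) {
            return t;
        }
        // Keep the original throwable as the cause so the full stack is preserved.
        return new RuntimeException(t.getMessage() + " " + HINT, t);
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate a status request whose future fails with a timeout.
        CompletableFuture<String> status = new CompletableFuture<>();
        status.completeExceptionally(
                new TimeoutException("Invocation of requestJobStatus timed out."));
        try {
            status.get();
        } catch (ExecutionException e) {
            // A real call site would re-throw; the sketch just prints the message.
            System.out.println(withHint(e).getMessage());
        }
    }
}
```

Wrapping rather than replacing the exception keeps the original `Caused by:` chain intact, so the hint is added on top of the stack shown above instead of hiding it. Where exactly such a wrapper would sit in the client code is left open here.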