[
https://issues.apache.org/jira/browse/FLINK-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036904#comment-17036904
]
Andrey Zagrebin commented on FLINK-16018:
-----------------------------------------
In this particular case, the cause was simply a misalignment of timeouts;
framework timeouts and custom operation timeouts are hard to keep aligned.
So basically, this can always happen unless the submission operation
asynchronously monitors a general response timeout for each operation invoked
during the submission, so that it can report where it got stuck.
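To make the idea concrete, here is a minimal sketch of such per-operation
monitoring. It is not Flink code: the class SubmissionStepMonitor, the method
monitored and the step names are hypothetical, and it only uses plain
java.util.concurrent. Each step of the submission gets its own timeout, so a
stalled step fails with a message naming that step instead of a generic
AskTimeoutException.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper, not part of Flink: names the submission step that stalled. */
public class SubmissionStepMonitor {

    // Single daemon timer thread, so it never keeps the JVM alive.
    private static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "submission-step-timer");
                t.setDaemon(true);
                return t;
            });

    /**
     * Returns a future that mirrors {@code step}, but fails with a
     * TimeoutException mentioning {@code stepName} if the step has not
     * completed within {@code timeout}.
     */
    public static <T> CompletableFuture<T> monitored(
            String stepName, Duration timeout, CompletableFuture<T> step) {
        CompletableFuture<T> result = new CompletableFuture<>();

        // Asynchronous watchdog: fires only if the step is still pending.
        ScheduledFuture<?> watchdog = TIMER.schedule(() -> {
            result.completeExceptionally(new TimeoutException(
                    "Job submission got stuck in step '" + stepName
                            + "' (no response after " + timeout.toMillis() + " ms)"));
        }, timeout.toMillis(), TimeUnit.MILLISECONDS);

        // Forward the real outcome and cancel the watchdog once the step is done.
        step.whenComplete((value, error) -> {
            watchdog.cancel(false);
            if (error != null) {
                result.completeExceptionally(error);
            } else {
                result.complete(value);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        // Simulates an operation that never answers, e.g. an unreachable filesystem.
        CompletableFuture<String> stuck = new CompletableFuture<>();
        try {
            monitored("access S3 filesystem", Duration.ofMillis(200), stuck).join();
        } catch (CompletionException e) {
            System.out.println(e.getCause().getMessage());
        }
    }
}
```

The per-step timeouts would have to be shorter than the overall RPC ask
timeout (10000 ms in the report below), otherwise the generic timeout still
fires first.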
> Improve error reporting when submitting batch job (instead of
> AskTimeoutException)
> ----------------------------------------------------------------------------------
>
> Key: FLINK-16018
> URL: https://issues.apache.org/jira/browse/FLINK-16018
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 1.11.0
> Reporter: Robert Metzger
> Priority: Major
>
> While debugging the {{Shaded Hadoop S3A end-to-end test (minio)}} pre-commit
> test, I noticed that the JobSubmission is not producing very helpful error
> messages.
> Environment:
> - A simple batch wordcount job
> - an unavailable minio s3 filesystem service
> What happens from a user's perspective:
> - The job submission fails after 10 seconds with an AskTimeoutException:
> {code}
> 2020-02-07T11:38:27.1189393Z akka.pattern.AskTimeoutException: Ask timed out
> on [Actor[akka://flink/user/dispatcher#-939201095]] after [10000 ms]. Message
> of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical
> reason for `AskTimeoutException` is that the recipient actor didn't send a
> reply.
> 2020-02-07T11:38:27.1189538Z at
> akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
> 2020-02-07T11:38:27.1189616Z at
> akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
> 2020-02-07T11:38:27.1189713Z at
> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:648)
> 2020-02-07T11:38:27.1189789Z at
> akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205)
> 2020-02-07T11:38:27.1189883Z at
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> 2020-02-07T11:38:27.1189973Z at
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
> 2020-02-07T11:38:27.1190067Z at
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> 2020-02-07T11:38:27.1190159Z at
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
> 2020-02-07T11:38:27.1190267Z at
> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)
> 2020-02-07T11:38:27.1190358Z at
> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)
> 2020-02-07T11:38:27.1190465Z at
> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)
> 2020-02-07T11:38:27.1190540Z at java.lang.Thread.run(Thread.java:748)
> {code}
> What a user would expect:
> - An error message indicating why the job submission failed.