GitHub user GEOFBOT commented on the issue:
https://github.com/apache/flink/pull/3232
It may have worked with a smaller file, but there seem to be issues with
heavier jobs. When I ran a more computationally intensive and time-consuming
job, the first job in the Python file ran successfully. The second job in the
file was then submitted:
```
<snip>
02/09/2017 16:39:43 DataSink (CsvSink)(4/5) switched to FINISHED
02/09/2017 16:39:43 Job execution switched to status FINISHED.
2017-02-09 16:40:26,470 INFO org.apache.flink.yarn.YarnClusterClient - Waiting until all TaskManagers have connected
Waiting until all TaskManagers have connected
2017-02-09 16:40:26,476 INFO org.apache.flink.yarn.YarnClusterClient - TaskManager status (5/5)
TaskManager status (5/5)
2017-02-09 16:40:26,476 INFO org.apache.flink.yarn.YarnClusterClient - All TaskManagers are connected
All TaskManagers are connected
2017-02-09 16:40:26,480 INFO org.apache.flink.yarn.YarnClusterClient - Submitting job with JobID: b226f5f18a78bc386bd1b1b6d30515ea. Waiting for job completion.
Submitting job with JobID: b226f5f18a78bc386bd1b1b6d30515ea. Waiting for job completion.
Connected to JobManager at Actor[akka.tcp://flink@<snip>.ec2.internal:35598/user/jobmanager#68430682]
```
However, Flink does not receive or respond to this new job. Instead, the
client terminates with a timeout error:
```
Caused by: org.apache.flink.runtime.client.JobClientActorSubmissionTimeoutException: Job submission to the JobManager timed out. You may increase 'akka.client.timeout' in case the JobManager needs more time to configure and confirm the job submission.
    at org.apache.flink.runtime.client.JobSubmissionClientActor.handleCustomMessage(JobSubmissionClientActor.java:119)
    at org.apache.flink.runtime.client.JobClientActor.handleMessage(JobClientActor.java:239)
    at org.apache.flink.runtime.akka.FlinkUntypedActor.handleLeaderSessionID(FlinkUntypedActor.java:88)
    at org.apache.flink.runtime.akka.FlinkUntypedActor.onReceive(FlinkUntypedActor.java:68)
    at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)
```
I tried setting `akka.client.timeout` to 20 minutes, but Flink is still not
receiving the second job. I suspect this may be an issue with this patch.
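
For anyone trying to reproduce this, one way to raise the timeout is sketched
below. It assumes the option is set in `flink-conf.yaml` rather than passed as
a dynamic property, and the value string is illustrative, not copied from my
actual configuration:

```
# flink-conf.yaml (assumed location of the setting)
# Client-side timeout for communication with the JobManager, raised to 20 minutes.
akka.client.timeout: 20 min
```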