[
https://issues.apache.org/jira/browse/FLINK-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449539#comment-16449539
]
ASF GitHub Bot commented on FLINK-9211:
---------------------------------------
GitHub user zentol opened a pull request:
https://github.com/apache/flink/pull/5903
[FLINK-9211][REST] JarRunHandler submits job to Dispatcher via RPC
## What is the purpose of the change
This PR reworks the `JarRunHandler` to submit the job to the dispatcher via
RPC, instead of setting up a `RestClusterClient` and going through the client's
job-submission routine.
The existing behavior caused issues on Kubernetes. This change also removes
a special case, as this was the only handler that actively sent out REST
requests.
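As a rough illustration of the idea described above (not the actual code in this PR), the handler-side submission could be sketched as follows. `Job`, `Gateway` and `handleRun` are hypothetical stand-ins invented for this example; they are not Flink's `JobGraph`, `DispatcherGateway` or `JarRunHandler` and do not mirror their real signatures.

```java
import java.util.concurrent.CompletableFuture;

public class RpcSubmissionSketch {

    /** Hypothetical stand-in for a compiled job; not Flink's JobGraph. */
    public static final class Job {
        public final String name;

        public Job(String name) {
            this.name = name;
        }
    }

    /** Hypothetical stand-in for an RPC gateway to the dispatcher. */
    public interface Gateway {
        CompletableFuture<String> submitJob(Job job);
    }

    /**
     * Handler-side submission: hand the job to the gateway directly,
     * without opening a REST connection back to the cluster.
     * A failure of the gateway call propagates through the returned future.
     */
    public static CompletableFuture<String> handleRun(Gateway gateway, Job job) {
        return gateway.submitJob(job).thenApply(ack -> "submitted:" + job.name);
    }

    public static void main(String[] args) {
        // In-process fake gateway that acknowledges every job.
        Gateway gateway = job -> CompletableFuture.completedFuture("ack");
        System.out.println(handleRun(gateway, new Job("wordcount")).join());
    }
}
```

The point of the sketch is only that the handler depends on a gateway object it is given, so no address such as `flink-jobmanager:8081` needs to be reachable from inside the handler itself.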
## Brief change log
* `JarRunHandler` now has access to `DispatcherGateway`
* `JarRunHandler` now uploads jar and submits job via RPC
## Verifying this change
* run `JarRunHandlerTest`
* submit job through the web UI
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
- The S3 file system connector: (no)
## Documentation
- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zentol/flink 9211
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/5903.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5903
----
commit e662220a50ed2f430cfc15082af11ab24f233bf9
Author: zentol <chesnay@...>
Date: 2018-04-23T10:35:51Z
[FLINK-9211][REST] JarRunHandler submits job to Dispatcher via RPC
commit 51772685541adb185ffebcc5800d4eb6e60d35d3
Author: zentol <chesnay@...>
Date: 2018-04-24T09:34:11Z
add test
----
> Job submission via REST/dashboard does not work on Kubernetes
> -------------------------------------------------------------
>
> Key: FLINK-9211
> URL: https://issues.apache.org/jira/browse/FLINK-9211
> Project: Flink
> Issue Type: Bug
> Components: Client, Web Client
> Affects Versions: 1.5.0
> Reporter: Aljoscha Krettek
> Assignee: Aljoscha Krettek
> Priority: Blocker
> Fix For: 1.5.0
>
>
> When setting up a cluster on Kubernetes according to the documentation, it is
> possible to upload jar files, but when trying to execute them you get an
> exception like this:
> {code}
> org.apache.flink.runtime.rest.handler.RestHandlerException:
> org.apache.flink.runtime.client.JobSubmissionException: Failed to submit
> JobGraph.
> at
> org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleRequest$2(JarRunHandler.java:113)
> at
> java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
> at
> java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
> at
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at
> org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:196)
> at
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
> at
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
> at
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
> at
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:214)
> at
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
> at
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
> at
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
> at
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.CompletionException:
> org.apache.flink.runtime.client.JobSubmissionException: Failed to submit
> JobGraph.
> at
> org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$5(RestClusterClient.java:356)
> ... 17 more
> Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to
> submit JobGraph.
> ... 18 more
> Caused by: java.util.concurrent.CompletionException:
> org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException:
> connection timed out: flink-jobmanager/10.105.154.28:8081
> at
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> at
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> at
> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
> at
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
> ... 15 more
> Caused by:
> org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException:
> connection timed out: flink-jobmanager/10.105.154.28:8081
> at
> org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:212)
> ... 7 more
> {code}