[
https://issues.apache.org/jira/browse/FLINK-13660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chesnay Schepler updated FLINK-13660:
-------------------------------------
Summary: Forward jar uploads to leading JobManager (was: Cannot submit job
on Flink session cluster on kubernetes with multiple JM pods (zk HA) through
web frontend)
> Forward jar uploads to leading JobManager
> -----------------------------------------
>
> Key: FLINK-13660
> URL: https://issues.apache.org/jira/browse/FLINK-13660
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination, Runtime / REST, Runtime / Web
> Frontend
> Affects Versions: 1.9.0
> Reporter: MalcolmSanders
> Priority: Minor
>
> Hi all,
> I have been testing the HighAvailabilityService of Flink 1.9 on k8s. When
> testing a Flink session cluster with 3 JM pods deployed on k8s, I found that
> a jar I had uploaded through the web frontend keeps disappearing from the
> "Uploaded Jars" web page. As a result, it is hard to submit the job.
> After investigating, I found that this is caused by the combination of (1)
> the implementation of the "handleRequest" method of the "JarListHandler" and
> "JarUploadHandler" RestHandlers and (2) the routing mechanism of the k8s
> service.
> (1) It seems to me that the "handleRequest" method should dispatch the
> message through "DispatcherGateway gateway" to the leader JM. However, the
> two RestHandlers don't use the gateway and just do things locally. That is
> to say, if an "upload jar" or "list uploaded jars" request is sent to any of
> the 3 JMs, the web frontend will only store or fetch jars in that JM's local
> directory.
> (2) I use a k8s service to open the Flink web page. The URL pattern is (PS:
> start "kubectl proxy" locally first):
> http://127.0.0.1:8001/api/v1/namespaces/${my_ns}/services/${my_session_cluster_service}:ui/proxy/#/submit
> Since there are 3 endpoints (3 JMs) behind this k8s service, the k8s routing
> mechanism will randomly choose which endpoint (JM) a REST message is sent to.
> As a result of these two factors, a Flink session cluster currently cannot
> reliably accept jar uploads and job submissions through the web frontend
> when deployed with multiple JMs using HighAvailabilityService on k8s.
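The interaction of factors (1) and (2) described above can be reproduced
with a small toy model. This is only a sketch under stated assumptions: the
class JobManager, the function route and the jar name "job.jar" are
hypothetical stand-ins and not Flink or Kubernetes APIs. Each simulated JM
keeps uploaded jars in its own local set, and every request is sent to a
randomly chosen JM.

```python
import random


class JobManager:
    """Hypothetical stand-in for one JM pod with local-only jar handlers."""

    def __init__(self, name):
        self.name = name
        self.local_jars = set()  # models the JM's local upload directory

    def upload_jar(self, jar):
        # Local-only behaviour: nothing is forwarded to the leader.
        self.local_jars.add(jar)

    def list_jars(self):
        return sorted(self.local_jars)


def route(jms, rng):
    """Assumed model of the k8s service: pick an arbitrary endpoint."""
    return rng.choice(jms)


rng = random.Random(0)
jms = [JobManager(f"jm-{i}") for i in range(3)]

# One upload lands on whichever JM the service happened to pick.
route(jms, rng).upload_jar("job.jar")

# Later "list jars" requests may hit any JM, so results are inconsistent.
listings = [route(jms, rng).list_jars() for _ in range(12)]

# Asking every JM directly shows that only one of them holds the jar.
per_pod = {jm.name: jm.list_jars() for jm in jms}
```

In this model exactly one entry of per_pod contains the jar, so any listing
request routed to one of the other two JMs comes back empty, which matches
the jar appearing and disappearing in the "Uploaded Jars" page.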
> Proposals:
> (1) Redirect jar-related REST messages to the leader JM.
> (2) (Along with proposal (1)) synchronize jar files to the standby JMs in
> case a standby JM takes over leadership.
> (3) Support uploading jars to a global filesystem (e.g. a DFS).
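Proposal (1), and the reason it is paired with (2) or (3), can be
illustrated with a toy model. This is a sketch under stated assumptions,
not Flink code: Cluster, JobManager and the leader attribute are
hypothetical names, and leader election is reduced to assigning a field.
Every JM forwards jar requests to the current leader instead of handling
them locally.

```python
class JobManager:
    """Hypothetical JM that forwards jar requests to the current leader."""

    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster
        self.local_jars = set()  # models the JM's local upload directory

    def upload_jar(self, jar):
        leader = self.cluster.leader
        if leader is self:
            self.local_jars.add(jar)
        else:
            leader.upload_jar(jar)  # forward instead of storing locally

    def list_jars(self):
        leader = self.cluster.leader
        if leader is self:
            return sorted(self.local_jars)
        return leader.list_jars()  # forward instead of reading locally


class Cluster:
    """Hypothetical session cluster; leader election is just a field here."""

    def __init__(self, size):
        self.jms = [JobManager(f"jm-{i}", self) for i in range(size)]
        self.leader = self.jms[0]


cluster = Cluster(3)

# Upload through a standby JM; the jar ends up on the leader.
cluster.jms[2].upload_jar("job.jar")

# Listing through any JM now gives the same answer.
before_failover = [jm.list_jars() for jm in cluster.jms]

# Simulated failover: the new leader never received the jar.
cluster.leader = cluster.jms[1]
after_failover = [jm.list_jars() for jm in cluster.jms]
```

Before the simulated failover every JM reports the uploaded jar. After it,
all listings are empty, because the jar only exists on the old leader. That
gap is what proposal (2) (syncing jars to standbys) or proposal (3) (a
shared filesystem) would need to close.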
--
This message was sent by Atlassian Jira
(v8.3.4#803005)