[ https://issues.apache.org/jira/browse/FLINK-13660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chesnay Schepler updated FLINK-13660:
-------------------------------------
    Summary: Forward jar uploads to leading JobManager  (was: Cannot submit job on Flink session cluster on kubernetes with multiple JM pods (zk HA) through web frontend)

> Forward jar uploads to leading JobManager
> -----------------------------------------
>
>                 Key: FLINK-13660
>                 URL: https://issues.apache.org/jira/browse/FLINK-13660
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination, Runtime / REST, Runtime / Web Frontend
>    Affects Versions: 1.9.0
>            Reporter: MalcolmSanders
>            Priority: Minor
>
> Hi, all,
> I was testing the HighAvailabilityService of Flink 1.9 on k8s. While
> testing a Flink session cluster with 3 JM pods deployed on k8s, I found that
> the jar I had uploaded through the web frontend kept disappearing from the
> "Uploaded Jars" web page. As a result, it is hard to submit the job.
> After investigating, I found that the problem is caused by the combination
> of (1) the implementation of the "handleRequest" method of the
> "JarListHandler" and "JarUploadHandler" RestHandlers and (2) the routing
> mechanism of k8s services.
> (1) It seems to me that the "handleRequest" method should dispatch the
> message through "DispatcherGateway gateway" to the leader JM. However, the
> two RestHandlers don't use the gateway and handle requests locally. That is,
> if an "upload jar" request or a "list uploaded jars" request is sent to any
> of the 3 JMs, the web frontend only stores jars in, or fetches jars from,
> that JM's local directory.
> (2) I use a k8s service to open the Flink web page; the URL pattern is (PS:
> start "kubectl proxy" locally):
> http://127.0.0.1:8001/api/v1/namespaces/${my_ns}/services/${my_session_cluster_service}:ui/proxy/#/submit
> Since there are 3 endpoints (3 JMs) behind this k8s service, the k8s routing
> mechanism randomly chooses which endpoint (JM) each REST message is sent to.
> As a result of these two factors, a Flink session cluster cannot currently
> be deployed with multiple JMs using HighAvailabilityService on k8s.
> Proposals:
> (1) Redirect jar-related REST messages to the leader JM.
> (2) (Along with proposal (1)) synchronize jar files with the standby JMs in
> case a standby JM takes over leadership.
> (3) Support uploading jars to a global filesystem (e.g. DFS).
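The failure mode described above can be illustrated with a small toy model. This is a sketch under stated assumptions, written in Python rather than Java, and it does not use any Flink or Kubernetes API: `JobManager`, `SessionCluster`, `upload_jar`, `list_jars` and `visible_fraction` are hypothetical names. It assumes the k8s Service picks one of the 3 JM endpoints uniformly at random for each request and that each JM keeps uploaded jars only in its local directory, then compares that behaviour with proposal (1), where every jar request is handled by the leader.

```python
import random
from dataclasses import dataclass, field


@dataclass
class JobManager:
    """Hypothetical JM pod whose uploaded jars live only in its local directory."""
    name: str
    local_jars: set = field(default_factory=set)


class SessionCluster:
    """Toy session cluster behind a k8s Service with several JM endpoints."""

    def __init__(self, num_jms, leader_index=0, forward_to_leader=False, seed=0):
        self.jms = [JobManager(f"jm-{i}") for i in range(num_jms)]
        self.leader = self.jms[leader_index]
        self.forward_to_leader = forward_to_leader
        self.rng = random.Random(seed)

    def _route(self):
        # Assumption: the k8s Service picks a backend endpoint uniformly at random.
        return self.rng.choice(self.jms)

    def _handling_jm(self):
        receiver = self._route()
        # Proposal (1): forward jar requests to the leader instead of handling them locally.
        return self.leader if self.forward_to_leader else receiver

    def upload_jar(self, jar_name):
        self._handling_jm().local_jars.add(jar_name)

    def list_jars(self):
        return sorted(self._handling_jm().local_jars)


def visible_fraction(forward_to_leader, trials=10_000, seed=42):
    """Fraction of trials in which a just-uploaded jar appears in the jar list."""
    hits = 0
    for t in range(trials):
        cluster = SessionCluster(3, forward_to_leader=forward_to_leader, seed=seed + t)
        cluster.upload_jar("job.jar")
        if "job.jar" in cluster.list_jars():
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"handled locally:     {visible_fraction(False):.2f}")
    print(f"forwarded to leader: {visible_fraction(True):.2f}")
```

With local handling, a freshly uploaded jar shows up in the list only when the upload and the list request happen to reach the same JM, which in this model is roughly one time in three; with forwarding it always shows up. The model deliberately ignores leader failover, which is what proposals (2) and (3) address.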



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
