[
https://issues.apache.org/jira/browse/FLINK-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880502#comment-16880502
]
Zhenqiu Huang commented on FLINK-13132:
---------------------------------------
[~maguowei] [~till.rohrmann]
The reason I want to propose this change is that we are managing 1000+
production jobs for our customers. As a platform, our SLA requires us to
restart all of the jobs within 5 minutes, whether the cause is cluster
maintenance or a transient infrastructure failure.
Besides this, we are moving to a hybrid cloud architecture, so in the future we
need to run jobs on YARN and on another internal cluster management scheduler
on top of Mesos and Kubernetes. Currently, our customers upload jars to a
storage management layer that spans multiple storage systems (internal and
cloud), so that the jars can be accessed securely and efficiently in both
environments. The cost of downloading the jar into the service and generating
the JobGraph on the client side is that we need to start a process for each new
job within the service. Our profiling shows that this does not scale well: a
large number of service instances are needed for the worst case, even though
the regular load is just 10 QPS. Thus, we want to further optimize job
submission by pushing the job graph generation onto the ClusterEntrypoint. This
way, the average job submission time will be reduced while using far fewer
resources.
> Allow ClusterEntrypoints use user main method to generate job graph
> -------------------------------------------------------------------
>
> Key: FLINK-13132
> URL: https://issues.apache.org/jira/browse/FLINK-13132
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / YARN
> Affects Versions: 1.8.0, 1.8.1
> Reporter: Zhenqiu Huang
> Assignee: Zhenqiu Huang
> Priority: Minor
>
> We are building a service that can transparently deploy a job to different
> cluster management systems, such as YARN and another internal system. It is
> very costly to download the jar and generate the JobGraph on the client side.
> Thus, I want to propose an improvement that makes the YARN entrypoints
> configurable to use either FileJobGraphRetriever or
> ClassPathJobGraphRetriever. This is actually a long-standing TODO in
> AbstractYarnClusterDescriptor at line 834.
> https://github.com/apache/flink/blob/21468e0050dc5f97de5cfe39885e0d3fd648e399/flink-yarn/src/main/java/org/apache/flink/yarn/AbstractYarnClusterDescriptor.java#L834