[
https://issues.apache.org/jira/browse/FLINK-9953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955099#comment-16955099
]
Canbin Zheng commented on FLINK-9953:
-------------------------------------
[~fly_in_gis] Thanks a lot for your quick response; I'm very glad to have the
chance to work with you on this exciting feature.
> I think a per-job cluster in Flink means a dedicated cluster for only one
> job that will not accept any other jobs.
Yep, you are right; a per-job cluster is always a dedicated cluster for only
one job. One option mentioned in my design doc is to split the deployment step
into two parts:
* Part one, deploy a cluster without a `JobGraph` attached;
* Part two, submit a job to that cluster via `RestClusterClient` and shut down
the cluster when the job finishes (see the submission sketch below). For this
idea, some new Kubernetes-dedicated specializations, such as
`KubernetesMiniDispatcher` and `KubernetesSingleJobGraphStore`, will be
introduced to support the new per-job cluster workflow. Neither
`KubernetesMiniDispatcher` nor `KubernetesSingleJobGraphStore` needs a
`JobGraph` at construction time; each accepts exactly one `JobGraph` after it
is instantiated. So this solution does not actually change the current
definition of a per-job cluster.
This solution may seem slightly customised, but at a higher level, I think it
provides a possible way to unify the deployment process of session and
per-job clusters.
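To make part two more concrete, here is a rough sketch of the submission step
built on the existing `RestClusterClient`; the REST endpoint and the cluster
id are hypothetical placeholders, and the exact `submitJob` signature varies
between Flink versions.

{code:java}
import org.apache.flink.client.program.rest.RestClusterClient;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.RestOptions;
import org.apache.flink.runtime.jobgraph.JobGraph;

// Hypothetical sketch of "part two": submit a single JobGraph to a cluster
// that was deployed without any job attached, then release the client.
public final class PerJobSubmitter {

    public static void submit(JobGraph jobGraph) throws Exception {
        Configuration config = new Configuration();
        // Example endpoint of the Kubernetes-deployed JobManager REST service.
        config.setString(RestOptions.ADDRESS, "flink-jobmanager.default.svc");
        config.setInteger(RestOptions.PORT, 8081);

        // "my-per-job-cluster" is a hypothetical cluster id.
        RestClusterClient<String> client =
                new RestClusterClient<>(config, "my-per-job-cluster");
        try {
            // submitJob's signature and return type vary across Flink
            // versions; here we assume the CompletableFuture-returning variant.
            client.submitJob(jobGraph).get();
        } finally {
            client.close(); // shutdown() on older client versions
        }
    }
}
{code}

The important point is that the client, not the cluster entrypoint, carries
the `JobGraph`, so the cluster itself can be deployed before the job exists.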
In addition to the previous option, we have other options for solving dynamic
dependency management for a per-job cluster. As a prerequisite, we upload the
locally-hosted dependencies to a Hadoop Compatible File System, referred to as
HCFS, that is accessible to the Flink cluster; a sketch of this upload step
follows.
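As an illustration of this prerequisite, here is a minimal upload sketch based
on Flink's `FileUtils` utility; the HCFS directory and file names are only
examples.

{code:java}
import org.apache.flink.core.fs.Path;
import org.apache.flink.util.FileUtils;

import java.io.File;
import java.io.IOException;

// Hypothetical sketch: ship a locally-hosted dependency to an HCFS directory
// that the cluster pods can reach. FileUtils.copy resolves the concrete
// FileSystem implementation (HDFS, S3, ...) from the URI scheme.
public final class DependencyUploader {

    public static Path upload(File localJar, String hcfsDir) throws IOException {
        Path source = new Path(localJar.toURI());
        Path target = new Path(hcfsDir, localJar.getName());
        FileUtils.copy(source, target, false); // false: jars need no executable flag
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Example usage with hypothetical paths:
        upload(new File("/tmp/my-udf.jar"), "hdfs:///flink/jobs/my-job");
    }
}
{code}

Once the dependencies are available in HCFS, we fetch them for the job to run;
there are at least two solutions for this fetching step.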
* Solution One
Download dependencies from HCFS after starting the JM(s) or TMs.
1. The JM localizes the dependencies by downloading them when a
`JobManagerImpl` is instantiated.
2. TMs fetch the dependencies when `Task#run()` is invoked.
* Solution Two
Download dependencies before starting the JM(s) or TMs by utilizing a
Kubernetes feature known as init containers, as sketched below. Init
containers always run to completion before the main container starts, and
they are typically used to handle initialization work for the primary
containers.
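Here is a minimal sketch of solution two, assuming the fabric8 Kubernetes
client is used to describe the JobManager pod; the image names, the fetch
command, and the mount paths are hypothetical placeholders.

{code:java}
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.api.model.PodBuilder;

// Hypothetical sketch: an init container fetches job dependencies from HCFS
// into a shared emptyDir volume before the Flink main container starts.
public final class InitContainerExample {

    public static Pod jobManagerPod() {
        return new PodBuilder()
            .withNewMetadata()
                .withName("flink-jobmanager") // example pod name
            .endMetadata()
            .withNewSpec()
                .addNewVolume()
                    .withName("job-artifacts")
                    .withNewEmptyDir().endEmptyDir() // scratch space shared by both containers
                .endVolume()
                .addNewInitContainer() // runs to completion before the main container starts
                    .withName("artifact-fetcher")
                    .withImage("example/hadoop-client:latest") // hypothetical fetcher image
                    .withCommand("sh", "-c",
                        "hadoop fs -get hdfs:///flink/jobs/my-job/*.jar /opt/flink/usrlib/")
                    .addNewVolumeMount()
                        .withName("job-artifacts")
                        .withMountPath("/opt/flink/usrlib")
                    .endVolumeMount()
                .endInitContainer()
                .addNewContainer()
                    .withName("flink-main")
                    .withImage("flink:latest") // example Flink image
                    .addNewVolumeMount()
                        .withName("job-artifacts")
                        .withMountPath("/opt/flink/usrlib") // fetched jars visible to the JM
                    .endVolumeMount()
                .endContainer()
            .endSpec()
            .build();
    }
}
{code}

If the fetched jars land in a directory that is already on the Flink
classpath, neither the JM nor the TM code needs to change for this solution.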
> The users could put their jars in the image. And the
> `ClassPathJobGraphRetriever` will be used to generate and retrieve the job
> graph in the flink master pod.
This is a straightforward workflow: we build an image containing all the
necessary application resources, such as application code, input files, etc.,
and then run the application entirely from that image. Many applications in
the Kubernetes ecosystem already work this way, so we can add support for
this use case.
But some dependencies may not be known at image build time, may be too large
to be baked into a container image, or may need frequent changes as business
scenarios evolve. For these cases, I propose using a standard image with the
Flink distribution and supplying the dependencies at runtime; as discussed
above, we have several ways to support dynamic dependencies.
> So I suggest adding kubernetes-job.sh to start a per-job cluster. It will
> not need a user jar as a required argument. `flink run` could be used to
> submit a Flink job to an existing session.
We can make changes to the existing `flink` shell script to meet this
requirement; it's better not to introduce a separate, dedicated
kubernetes-job.sh just to start a per-job cluster on Kubernetes.
> Active Kubernetes integration
> -----------------------------
>
> Key: FLINK-9953
> URL: https://issues.apache.org/jira/browse/FLINK-9953
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Coordination
> Reporter: Till Rohrmann
> Assignee: Yang Wang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> This is the umbrella issue tracking Flink's active Kubernetes integration.
> Active means in this context that the {{ResourceManager}} can talk to
> Kubernetes to launch new pods similar to Flink's Yarn and Mesos integration.
> Phase 1 implementation will have complete functions to make Flink run on
> Kubernetes. Phase 2 is mainly focused on production optimization, including
> K8s-native high availability, storage, network, log collection, etc.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)