Hi dev and users,

I just want to revive this discussion because we have made some meaningful
progress on the Kubernetes native integration. I have made a draft
implementation to complete the PoC. The CLI and job submission are both
working as expected. The design doc [1] has been updated, including the
detailed submission process, the CLI and YAML user interfaces, and the
implementation plan. All comments and suggestions are welcome.

BTW, we gave a talk at the Alibaba Apsara Conference last Friday in the
"Big Data Ecosystem" session [2]. We heard that many companies and users
are planning to migrate their big data workloads to Kubernetes clusters.
By co-locating them with online services, they could get better resource
utilization and reduce costs. For Flink, as an important case, dynamic
resource allocation is a basic requirement. That's why we want to move
forward faster.


Best,
Yang

[1].
https://docs.google.com/document/d/1-jNzqGF6NfZuwVaFICoFQ5HFFXzF5NVIagUZByFMfBY/edit?usp=sharing
[2].
https://www.alibabacloud.com/zh/apsara-conference-2019?spm=a2c4e.11165380.1395221.13


Yang Wang <danrtsey...@gmail.com> wrote on Friday, Aug 30, 2019 at 2:23 PM:

> Hi Zhenghua,
>
> You are right. For a per-job cluster, the taskmanagers will be allocated
> dynamically by the KubernetesResourceManager. For a session cluster, we
> hope taskmanagers could be pre-allocated, even though that does not work
> yet. Please navigate to the doc [1] for more details.
>
> Hi Thomas,
>
> We have no doubt that Flink only needs to support #1 and #3. For #1,
> we need external deployment management tools to make it production-ready.
> I also think a Kubernetes operator is a good choice. It makes managing
> multiple Flink jobs and long-running streaming applications easier.
>
> Also, some companies have their own Flink job management platform.
> Platform users submit Flink jobs through a web UI, update the Flink
> configuration, and restart the job.
>
> For #3, we just want to make it possible to start Flink job clusters and
> session clusters through the CLI. Users who used to run Flink workloads
> on YARN will find it very convenient to migrate to a Kubernetes cluster.
> Compared to #1, dynamic resource allocation is an important advantage.
> Maybe it could also be introduced to #1 in some way in the future.
>
> [1].
> https://docs.google.com/document/d/1-jNzqGF6NfZuwVaFICoFQ5HFFXzF5NVIagUZByFMfBY/edit?usp=sharing
>
Thomas Weise <t...@apache.org> wrote on Thursday, Aug 29, 2019 at 10:24 PM:
>
>> Till had already summed it up, but I want to emphasize that Flink as a
>> project only needs to provide #1 (reactive mode) and #3 (active mode,
>> which is necessarily tied to the cluster manager of choice). The latter
>> would be needed for Flink jobs to be elastic (in the future), although
>> we may want to discuss how such capability can be made easier with #1
>> as well.
>>
>> For users #1 alone is of little value, since they need to solve their
>> deployment problem. So it will be good to list options such as the Lyft
>> Flink k8s operator on the ecosystem page and possibly point to that from
>> the Flink documentation as well.
>>
>> I also want to point out that #3, while it looks easy to start with,
>> has an important limitation when it comes to managing long-running
>> streaming applications. Such an application is essentially a sequence
>> of jobs that come and go across stateful upgrades or rollbacks. Any
>> solution designed to manage a single Flink job instance can't address
>> that need. That is why the k8s operator was created. It specifically
>> understands the concept of an application.
>>
>> Thomas
>>
>>
>> On Wed, Aug 28, 2019 at 7:56 PM Zhenghua Gao <doc...@gmail.com> wrote:
>>
>> > Thanks Yang for bringing this up. I think option 1 is very useful for
>> > early adopters. People who do not know much about k8s can easily set
>> > it up on minikube to have a taste.
>> >
>> > Between option 2 and option 3, I prefer option 3 because I am familiar
>> > with YARN and don't know much about k8s concepts.
>> > And I have some doubt about starting a session cluster in option 3:
>> >
>> > > ./bin/kubernetes-session.sh -d -n 2 -tm 512 -s 4 -nm
>> > > flink-session-example -i flink:latest -kD
>> > > kubernetes.service.exposed.type=NODE_PORT
>> >
>> > Does the -n option mean the number of TaskManagers?
>> > Do we pre-run taskmanager pods, or request and launch taskmanager
>> > pods dynamically?
>> >
>> > *Best Regards,*
>> > *Zhenghua Gao*
>> >
>> >
>> > On Fri, Aug 9, 2019 at 9:12 PM Yang Wang <danrtsey...@gmail.com> wrote:
>> >
>> > > Hi all,
>> > >
>> > > Currently, cloud native architectures have been adopted by many
>> > > companies in production. They use Kubernetes to run deep learning,
>> > > web servers, etc. If we could deploy per-job/session Flink clusters
>> > > on Kubernetes and mix-run them with other workloads, cluster
>> > > resource utilization would be better. It would also be easier for
>> > > many Kubernetes users to have a taste of Flink.
>> > >
>> > > By now we have three options to run Flink jobs on k8s.
>> > >
>> > > [1]. Create jm/tm/service YAML files and apply them; then you will
>> > > get a Flink standalone cluster on k8s. Use flink run to submit a
>> > > job to the existing Flink cluster. Some companies may have their
>> > > own deployment system to manage the Flink cluster.
>> > >
>> > > [2]. Use flink-k8s-operator to manage multiple Flink clusters,
>> > > including session and per-job. It could manage the complete
>> > > deployment lifecycle of the application. I think this option is
>> > > really easy to use for k8s users. They are familiar with the k8s
>> > > operator pattern, kubectl, and other k8s tools. They could debug
>> > > and run the Flink cluster just like other k8s applications.
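>> > >
>> > > With option [2], a job is described declaratively as a custom
>> > > resource. The shape below is purely hypothetical (the group,
>> > > version, and field names are illustrative, not the actual schema
>> > > of [2]):
>> > >
>> > > ```yaml
>> > > apiVersion: flink.example.com/v1alpha1  # hypothetical group/version
>> > > kind: FlinkApplication
>> > > metadata:
>> > >   name: window-join
>> > > spec:
>> > >   image: flink-user:latest   # user image with the job jar baked in
>> > >   jarName: WindowJoin.jar
>> > >   parallelism: 4
>> > > ```
>> > >
>> > > The operator watches such resources and reconciles pods, upgrades,
>> > > and savepoints on the user's behalf.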
>> > >
>> > > [3]. Native integration with k8s: use flink run or
>> > > kubernetes-session.sh to start a Flink cluster. It is very similar
>> > > to submitting a Flink cluster to YARN. The
>> > > KubernetesClusterDescriptor talks to the k8s API server to start a
>> > > Flink master deployment with a single replica. The
>> > > KubernetesResourceManager dynamically allocates resources from k8s
>> > > to start taskmanagers on demand. This option is very easy for Flink
>> > > users to get started with. In the simplest case, we just need to
>> > > change '-m yarn-cluster' to '-m kubernetes-cluster'.
>> > >
>> > > We have made an internal implementation of option [3] and use it in
>> > > production. After it is fully tested, we hope to contribute it to
>> > > the community. Now we want to get some feedback about the three
>> > > options. Any comments are welcome.
>> > >
>> > >
>> > > > What do we need to prepare when starting a Flink cluster on k8s
>> > > > using the native integration?
>> > >
>> > > Download the Flink release binary and create the ~/.kube/config
>> > > file corresponding to the k8s cluster. That is all you need.
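>> > >
>> > > For reference, a minimal ~/.kube/config might look like the sketch
>> > > below (server address, names, and credential paths are
>> > > placeholders):
>> > >
>> > > ```yaml
>> > > apiVersion: v1
>> > > kind: Config
>> > > clusters:
>> > > - name: my-cluster
>> > >   cluster:
>> > >     server: https://x.x.x.x:6443        # k8s API server
>> > >     certificate-authority: /path/to/ca.crt
>> > > users:
>> > > - name: my-user
>> > >   user:
>> > >     client-certificate: /path/to/client.crt
>> > >     client-key: /path/to/client.key
>> > > contexts:
>> > > - name: my-context
>> > >   context: {cluster: my-cluster, user: my-user}
>> > > current-context: my-context
>> > > ```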
>> > >
>> > >
>> > > > Flink Session cluster
>> > >
>> > > * Start a session cluster:
>> > >
>> > > ./bin/kubernetes-session.sh -d -n 2 -tm 512 -s 4 -nm
>> > > flink-session-example -i flink:latest -kD
>> > > kubernetes.service.exposed.type=NODE_PORT
>> > >
>> > > * You will get an address for job submission; specify it through
>> > > the '-ksa' option:
>> > >
>> > > ./bin/flink run -d -p 4 -m kubernetes-cluster -knm
>> > > flink-session-example -ksa {x.x.x.x:12345}
>> > > examples/streaming/WindowJoin.jar
>> > >
>> > >
>> > > > Flink Job Cluster
>> > >
>> > > * Running with the official Flink image:
>> > >
>> > > ./bin/flink run -d -p 4 -m kubernetes-cluster -knm
>> > > flink-perjob-example-1 -ki flink:latest
>> > > examples/streaming/WindowJoin.jar
>> > >
>> > > * Running with a user image:
>> > >
>> > > ./bin/flink run -d -p 4 -m kubernetes-cluster -knm
>> > > flink-perjob-example-1 -ki flink-user:latest
>> > > examples/streaming/WindowJoin.jar
>> > >
>> > >
>> > >
>> > > [1].
>> > > https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html
>> > >
>> > > [2]. https://github.com/lyft/flinkk8soperator
>> > >
>> > > [3].
>> > > https://docs.google.com/document/d/1Zmhui_29VASPcBOEqyMWnF3L6WEWZ4kedrCqya0WaAk/edit#
>> > >
>> >
>>
>
