tillrohrmann commented on a change in pull request #14222: URL: https://github.com/apache/flink/pull/14222#discussion_r531137401
########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion. + +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. +If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}). + +* This will be replaced by the TOC +{:toc} + + +## Overview and Reference Architecture + +The figure below shows the building blocks of every Flink cluster. There is always somewhere a client running. It takes the code of the Flink applications, transforms it into a job graph and submits it to the JobManager. Review comment: ```suggestion The figure below shows the building blocks of every Flink cluster. There is always somewhere a client running. It takes the code of the Flink applications, transforms it into a `JobGraph` and submits it to the `JobManager`. 
``` ########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion. + +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. +If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}). Review comment: ```suggestion If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link deployment/resource-providers/standalone/index.md %}). ``` ########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. 
The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion. + +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. +If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}). + +* This will be replaced by the TOC +{:toc} + + +## Overview and Reference Architecture + +The figure below shows the building blocks of every Flink cluster. There is always somewhere a client running. It takes the code of the Flink applications, transforms it into a job graph and submits it to the JobManager. + +The JobManager distributes the work onto the TaskManagers, where the actual operators (such as sources, transformations and sinks) are running. Review comment: ```suggestion The `JobManager` distributes the work onto the `TaskManagers`, where the actual operators (such as sources, transformations and sinks) are running. ``` ########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. 
The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion. + +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. +If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}). + +* This will be replaced by the TOC +{:toc} + + +## Overview and Reference Architecture + +The figure below shows the building blocks of every Flink cluster. There is always somewhere a client running. It takes the code of the Flink applications, transforms it into a job graph and submits it to the JobManager. + +The JobManager distributes the work onto the TaskManagers, where the actual operators (such as sources, transformations and sinks) are running. + +When deploying Flink, there are often multiple options available for each building block. We have listed them in the table below the figure. + +If you don't know where to start, we recommend using the Command Line Interface for submitting Flink applications to a Standalone Cluster. 
+ +<!-- Image source: https://docs.google.com/drawings/d/1s_ZlXXvADqxWfTMNRVwQeg7HZ3hN1Xb7goxDPjTEPrI/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_overview.svg %}" alt="Figure for Overview and Reference Architecture" /> + + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Component</th> + <th class="text-left" style="width: 50%">Purpose</th> + <th class="text-left">Implementations</th> + </tr> + </thead> + <tbody> + <tr> + <td>Flink Client</td> + <td> + Flink batch or streaming applications are compiled into a dataflow graph, which is submitted to the JobManager. + </td> + <td> + <ul> + <li><a href="">Command Line Interface</a></li> + <li><a href="">REST Endpoint</a></li> + <li><a href="">SQL Client</a></li> + <li><a href="">Python REPL</a></li> + <li><a href="">Scala REPL</a></li> Review comment: The links are missing: the `href` attributes in this list are empty. ########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion. 
+ +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. +If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}). + +* This will be replaced by the TOC +{:toc} + + +## Overview and Reference Architecture + +The figure below shows the building blocks of every Flink cluster. There is always somewhere a client running. It takes the code of the Flink applications, transforms it into a job graph and submits it to the JobManager. + +The JobManager distributes the work onto the TaskManagers, where the actual operators (such as sources, transformations and sinks) are running. + +When deploying Flink, there are often multiple options available for each building block. We have listed them in the table below the figure. + +If you don't know where to start, we recommend using the Command Line Interface for submitting Flink applications to a Standalone Cluster. + +<!-- Image source: https://docs.google.com/drawings/d/1s_ZlXXvADqxWfTMNRVwQeg7HZ3hN1Xb7goxDPjTEPrI/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_overview.svg %}" alt="Figure for Overview and Reference Architecture" /> + + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Component</th> + <th class="text-left" style="width: 50%">Purpose</th> + <th class="text-left">Implementations</th> + </tr> + </thead> + <tbody> + <tr> + <td>Flink Client</td> + <td> + Flink batch or streaming applications are compiled into a dataflow graph, which is submitted to the JobManager. + </td> + <td> + <ul> + <li><a href="">Command Line Interface</a></li> + <li><a href="">REST Endpoint</a></li> + <li><a href="">SQL Client</a></li> + <li><a href="">Python REPL</a></li> + <li><a href="">Scala REPL</a></li> + </ul> + </td> + </tr> + <tr> + <td>JobManager</td> + <td> + JobManager is the name of the central work coordination component of Flink. 
It has implementations for different resource providers, which differ on high-availability, resource allocation behavior and supported job submission modes. <br /> + JobManager <a href="">modes for job submissions</a>: + <ul> + <li><b>Session Mode</b>: one JobManager instance manages multiple jobs sharing the same cluster of TaskManagers</li> + <li><b>Application Mode</b>: runs the cluster exclusively for one job. The job main method (or client) gets executed on the JobManager.</li> + <li><b>Per-Job Mode</b>: runs the cluster exclusively for one job. The job main method (or client) runs only prior to the cluster creation.</li> Review comment: ```suggestion <li><b>Per-Job Mode</b>: runs the cluster exclusively for one job. The job's main method (or client) runs only prior to the cluster creation.</li> ``` ########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion. 
+ +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. +If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}). + +* This will be replaced by the TOC +{:toc} + + +## Overview and Reference Architecture + +The figure below shows the building blocks of every Flink cluster. There is always somewhere a client running. It takes the code of the Flink applications, transforms it into a job graph and submits it to the JobManager. + +The JobManager distributes the work onto the TaskManagers, where the actual operators (such as sources, transformations and sinks) are running. + +When deploying Flink, there are often multiple options available for each building block. We have listed them in the table below the figure. + +If you don't know where to start, we recommend using the Command Line Interface for submitting Flink applications to a Standalone Cluster. + +<!-- Image source: https://docs.google.com/drawings/d/1s_ZlXXvADqxWfTMNRVwQeg7HZ3hN1Xb7goxDPjTEPrI/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_overview.svg %}" alt="Figure for Overview and Reference Architecture" /> + + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Component</th> + <th class="text-left" style="width: 50%">Purpose</th> + <th class="text-left">Implementations</th> + </tr> + </thead> + <tbody> + <tr> + <td>Flink Client</td> + <td> + Flink batch or streaming applications are compiled into a dataflow graph, which is submitted to the JobManager. + </td> + <td> + <ul> + <li><a href="">Command Line Interface</a></li> + <li><a href="">REST Endpoint</a></li> + <li><a href="">SQL Client</a></li> + <li><a href="">Python REPL</a></li> + <li><a href="">Scala REPL</a></li> + </ul> + </td> + </tr> + <tr> + <td>JobManager</td> + <td> + JobManager is the name of the central work coordination component of Flink. 
It has implementations for different resource providers, which differ on high-availability, resource allocation behavior and supported job submission modes. <br /> + JobManager <a href="">modes for job submissions</a>: + <ul> + <li><b>Session Mode</b>: one JobManager instance manages multiple jobs sharing the same cluster of TaskManagers</li> + <li><b>Application Mode</b>: runs the cluster exclusively for one job. The job main method (or client) gets executed on the JobManager.</li> + <li><b>Per-Job Mode</b>: runs the cluster exclusively for one job. The job main method (or client) runs only prior to the cluster creation.</li> + </ul> + </td> + <td> + <ul> + <li><a href="">Kubernetes</a></li> + <li><a href="">YARN</a></li> + <li><a href="">Mesos</a></li> + <li><a href="">Standalone</a> (this is the barebone mode that requires just JVMs to be launched. Deployment with Docker, Docker Swarm / Compose, non-native Kubernetes and other models is possible through manual setup in this mode) + </li> + </ul> + </td> + </tr> + <tr> + <td colspan="3" class="text-center"> + <b>External Components</b> (all optional) + </td> + </tr> + <tr> + <td>High Availability Service Provider</td> + <td> + Flink’s JobManager supports a high availability mode, where multiple JobManagers participate in a leader election process, resulting in an active JobManager and potentially multiple standby JobManagers, allowing for fast failover in case the active JobManager is lost. + </td> + <td> + <ul> + <li><a href="">Zookeeper</a></li> + <li><a href="">Kubernetes HA</a></li> + </ul> + </td> + </tr> + <tr> + <td>File Storage and Persistency</td> + <td> + For checkpointing (recovery mechanism for streaming jobs) Flink relies on external file storage systems + </td> + <td>See <a href="">FileSystems</a> page.</td> + </tr> + <tr> + <td>Resource Provider</td> + <td> + Flink can be deployed through different Resource Provider Frameworks, such as Kubernetes, YARN or Mesos. 
+ </td> + <td>See "JobManager" implementations above.</td> + </tr> + <tr> + <td>Metrics Storage</td> + <td> + Flink components report internal metrics and Flink jobs can report additional, job specific metrics as well. + </td> + <td>See <a href="">Metrics Reporter</a> page.</td> + </tr> + <tr> + <td>Application-level data sources and sinks</td> + <td> + While application-level data sources and sinks are not technically part of the deployment of Flink cluster components, they should be considered when planning a new Flink production deployment. Colocating frequently used data with Flink can have significant performance benefits + </td> + <td> + For example: + <ul> + <li>Apache Kafka</li> + <li>Amazon S3</li> + <li>ElasticSearch</li> + <li>Apache Cassandra</li> + </ul> + See <a href="">Connectors</a> page. + </td> + </tr> + </tbody> +</table> + + + +## Deployment Modes + +Flink can execute applications in one of three ways: + - in Session Mode, + - in a Per-Job Mode, or + - in Application Mode. Review comment: Should we order them 1. application mode 2. per-job mode 3. session mode? ########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. 
See the License for the +specific language governing permissions and limitations +under the License. +--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion. + +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. +If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}). + +* This will be replaced by the TOC +{:toc} + + +## Overview and Reference Architecture + +The figure below shows the building blocks of every Flink cluster. There is always somewhere a client running. It takes the code of the Flink applications, transforms it into a job graph and submits it to the JobManager. + +The JobManager distributes the work onto the TaskManagers, where the actual operators (such as sources, transformations and sinks) are running. + +When deploying Flink, there are often multiple options available for each building block. We have listed them in the table below the figure. + +If you don't know where to start, we recommend using the Command Line Interface for submitting Flink applications to a Standalone Cluster. + +<!-- Image source: https://docs.google.com/drawings/d/1s_ZlXXvADqxWfTMNRVwQeg7HZ3hN1Xb7goxDPjTEPrI/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_overview.svg %}" alt="Figure for Overview and Reference Architecture" /> + + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Component</th> + <th class="text-left" style="width: 50%">Purpose</th> + <th class="text-left">Implementations</th> + </tr> + </thead> + <tbody> + <tr> + <td>Flink Client</td> + <td> + Flink batch or streaming applications are compiled into a dataflow graph, which is submitted to the JobManager. 
+ </td> + <td> + <ul> + <li><a href="">Command Line Interface</a></li> + <li><a href="">REST Endpoint</a></li> + <li><a href="">SQL Client</a></li> + <li><a href="">Python REPL</a></li> + <li><a href="">Scala REPL</a></li> + </ul> + </td> + </tr> + <tr> + <td>JobManager</td> + <td> + JobManager is the name of the central work coordination component of Flink. It has implementations for different resource providers, which differ on high-availability, resource allocation behavior and supported job submission modes. <br /> + JobManager <a href="">modes for job submissions</a>: + <ul> + <li><b>Session Mode</b>: one JobManager instance manages multiple jobs sharing the same cluster of TaskManagers</li> + <li><b>Application Mode</b>: runs the cluster exclusively for one job. The job main method (or client) gets executed on the JobManager.</li> + <li><b>Per-Job Mode</b>: runs the cluster exclusively for one job. The job main method (or client) runs only prior to the cluster creation.</li> + </ul> + </td> + <td> + <ul> + <li><a href="">Kubernetes</a></li> + <li><a href="">YARN</a></li> + <li><a href="">Mesos</a></li> + <li><a href="">Standalone</a> (this is the barebone mode that requires just JVMs to be launched. Deployment with Docker, Docker Swarm / Compose, non-native Kubernetes and other models is possible through manual setup in this mode) + </li> + </ul> + </td> + </tr> + <tr> + <td colspan="3" class="text-center"> + <b>External Components</b> (all optional) + </td> + </tr> + <tr> + <td>High Availability Service Provider</td> + <td> + Flink’s JobManager supports a high availability mode, where multiple JobManagers participate in a leader election process, resulting in an active JobManager and potentially multiple standby JobManagers, allowing for fast failover in case the active JobManager is lost. 
+ </td> + <td> + <ul> + <li><a href="">Zookeeper</a></li> + <li><a href="">Kubernetes HA</a></li> + </ul> + </td> + </tr> + <tr> + <td>File Storage and Persistency</td> + <td> + For checkpointing (recovery mechanism for streaming jobs) Flink relies on external file storage systems + </td> + <td>See <a href="">FileSystems</a> page.</td> + </tr> + <tr> + <td>Resource Provider</td> + <td> + Flink can be deployed through different Resource Provider Frameworks, such as Kubernetes, YARN or Mesos. + </td> + <td>See "JobManager" implementations above.</td> + </tr> + <tr> + <td>Metrics Storage</td> + <td> + Flink components report internal metrics and Flink jobs can report additional, job specific metrics as well. + </td> + <td>See <a href="">Metrics Reporter</a> page.</td> + </tr> + <tr> + <td>Application-level data sources and sinks</td> + <td> + While application-level data sources and sinks are not technically part of the deployment of Flink cluster components, they should be considered when planning a new Flink production deployment. Colocating frequently used data with Flink can have significant performance benefits + </td> + <td> + For example: + <ul> + <li>Apache Kafka</li> + <li>Amazon S3</li> + <li>ElasticSearch</li> + <li>Apache Cassandra</li> + </ul> + See <a href="">Connectors</a> page. + </td> + </tr> + </tbody> +</table> + + + +## Deployment Modes + +Flink can execute applications in one of three ways: + - in Session Mode, + - in a Per-Job Mode, or + - in Application Mode. + + The above modes differ in: + - the cluster lifecycle and resource isolation guarantees + - whether the application's `main()` method is executed on the client or on the cluster. 
+ + + +<!-- Image source: https://docs.google.com/drawings/d/1EfloufuOp1A7YDwZmBEsHKRLIrrbtRkoWRPcfZI5RYQ/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_modes.svg %}" alt="Figure for Deployment Modes" /> + +#### Session Mode + +*Session mode* assumes an already running cluster and uses the resources of that cluster to execute any +submitted application. Applications executed in the same (session) cluster use, and consequently compete +for, the same resources. This has the advantage that you do not pay the resource overhead of spinning up +a full cluster for every submitted job. But, if one of the jobs misbehaves or brings down a Task Manager, +then all jobs running on that Task Manager will be affected by the failure. This, apart from a negative Review comment: ```suggestion then all jobs running on that TaskManager (or task manager) will be affected by the failure. This, apart from a negative ``` ########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. 
+--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion. + +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. +If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}). + +* This will be replaced by the TOC +{:toc} + + +## Overview and Reference Architecture + +The figure below shows the building blocks of every Flink cluster. There is always somewhere a client running. It takes the code of the Flink applications, transforms it into a job graph and submits it to the JobManager. + +The JobManager distributes the work onto the TaskManagers, where the actual operators (such as sources, transformations and sinks) are running. + +When deploying Flink, there are often multiple options available for each building block. We have listed them in the table below the figure. + +If you don't know where to start, we recommend using the Command Line Interface for submitting Flink applications to a Standalone Cluster. + +<!-- Image source: https://docs.google.com/drawings/d/1s_ZlXXvADqxWfTMNRVwQeg7HZ3hN1Xb7goxDPjTEPrI/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_overview.svg %}" alt="Figure for Overview and Reference Architecture" /> + + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Component</th> + <th class="text-left" style="width: 50%">Purpose</th> + <th class="text-left">Implementations</th> + </tr> + </thead> + <tbody> + <tr> + <td>Flink Client</td> + <td> + Flink batch or streaming applications are compiled into a dataflow graph, which is submitted to the JobManager. 
+ </td> + <td> + <ul> + <li><a href="">Command Line Interface</a></li> + <li><a href="">REST Endpoint</a></li> + <li><a href="">SQL Client</a></li> + <li><a href="">Python REPL</a></li> + <li><a href="">Scala REPL</a></li> + </ul> + </td> + </tr> + <tr> + <td>JobManager</td> + <td> + JobManager is the name of the central work coordination component of Flink. It has implementations for different resource providers, which differ on high-availability, resource allocation behavior and supported job submission modes. <br /> + JobManager <a href="">modes for job submissions</a>: + <ul> + <li><b>Session Mode</b>: one JobManager instance manages multiple jobs sharing the same cluster of TaskManagers</li> + <li><b>Application Mode</b>: runs the cluster exclusively for one job. The job main method (or client) gets executed on the JobManager.</li> + <li><b>Per-Job Mode</b>: runs the cluster exclusively for one job. The job main method (or client) runs only prior to the cluster creation.</li> + </ul> + </td> + <td> + <ul> + <li><a href="">Kubernetes</a></li> + <li><a href="">YARN</a></li> + <li><a href="">Mesos</a></li> + <li><a href="">Standalone</a> (this is the barebone mode that requires just JVMs to be launched. Deployment with Docker, Docker Swarm / Compose, non-native Kubernetes and other models is possible through manual setup in this mode) + </li> + </ul> + </td> + </tr> + <tr> + <td colspan="3" class="text-center"> + <b>External Components</b> (all optional) + </td> + </tr> + <tr> + <td>High Availability Service Provider</td> + <td> + Flink’s JobManager supports a high availability mode, where multiple JobManagers participate in a leader election process, resulting in an active JobManager and potentially multiple standby JobManagers, allowing for fast failover in case the active JobManager is lost. 
+ </td> + <td> + <ul> + <li><a href="">Zookeeper</a></li> + <li><a href="">Kubernetes HA</a></li> + </ul> + </td> + </tr> + <tr> + <td>File Storage and Persistency</td> + <td> + For checkpointing (recovery mechanism for streaming jobs) Flink relies on external file storage systems + </td> + <td>See <a href="">FileSystems</a> page.</td> + </tr> + <tr> + <td>Resource Provider</td> + <td> + Flink can be deployed through different Resource Provider Frameworks, such as Kubernetes, YARN or Mesos. + </td> + <td>See "JobManager" implementations above.</td> + </tr> + <tr> + <td>Metrics Storage</td> + <td> + Flink components report internal metrics and Flink jobs can report additional, job specific metrics as well. + </td> + <td>See <a href="">Metrics Reporter</a> page.</td> + </tr> + <tr> + <td>Application-level data sources and sinks</td> + <td> + While application-level data sources and sinks are not technically part of the deployment of Flink cluster components, they should be considered when planning a new Flink production deployment. Colocating frequently used data with Flink can have significant performance benefits + </td> + <td> + For example: + <ul> + <li>Apache Kafka</li> + <li>Amazon S3</li> + <li>ElasticSearch</li> + <li>Apache Cassandra</li> + </ul> + See <a href="">Connectors</a> page. + </td> + </tr> + </tbody> +</table> + + + +## Deployment Modes + +Flink can execute applications in one of three ways: + - in Session Mode, + - in a Per-Job Mode, or + - in Application Mode. + + The above modes differ in: + - the cluster lifecycle and resource isolation guarantees + - whether the application's `main()` method is executed on the client or on the cluster. 
+ + + +<!-- Image source: https://docs.google.com/drawings/d/1EfloufuOp1A7YDwZmBEsHKRLIrrbtRkoWRPcfZI5RYQ/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_modes.svg %}" alt="Figure for Deployment Modes" /> + +#### Session Mode + +*Session mode* assumes an already running cluster and uses the resources of that cluster to execute any +submitted application. Applications executed in the same (session) cluster use, and consequently compete +for, the same resources. This has the advantage that you do not pay the resource overhead of spinning up +a full cluster for every submitted job. But, if one of the jobs misbehaves or brings down a Task Manager, Review comment: Same here with "Task Manager" ########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion. + +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. 
+If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}). + +* This will be replaced by the TOC +{:toc} + + +## Overview and Reference Architecture + +The figure below shows the building blocks of every Flink cluster. There is always somewhere a client running. It takes the code of the Flink applications, transforms it into a job graph and submits it to the JobManager. Review comment: Not sure about the backticks. If we decide to go this way, then it needs to be applied consistently throughout the whole document. ########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion. + +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. +If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}). 
+ +* This will be replaced by the TOC +{:toc} + + +## Overview and Reference Architecture + +The figure below shows the building blocks of every Flink cluster. There is always somewhere a client running. It takes the code of the Flink applications, transforms it into a job graph and submits it to the JobManager. + +The JobManager distributes the work onto the TaskManagers, where the actual operators (such as sources, transformations and sinks) are running. + +When deploying Flink, there are often multiple options available for each building block. We have listed them in the table below the figure. + +If you don't know where to start, we recommend using the Command Line Interface for submitting Flink applications to a Standalone Cluster. + +<!-- Image source: https://docs.google.com/drawings/d/1s_ZlXXvADqxWfTMNRVwQeg7HZ3hN1Xb7goxDPjTEPrI/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_overview.svg %}" alt="Figure for Overview and Reference Architecture" /> + + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Component</th> + <th class="text-left" style="width: 50%">Purpose</th> + <th class="text-left">Implementations</th> + </tr> + </thead> + <tbody> + <tr> + <td>Flink Client</td> + <td> + Flink batch or streaming applications are compiled into a dataflow graph, which is submitted to the JobManager. + </td> + <td> + <ul> + <li><a href="">Command Line Interface</a></li> + <li><a href="">REST Endpoint</a></li> + <li><a href="">SQL Client</a></li> + <li><a href="">Python REPL</a></li> + <li><a href="">Scala REPL</a></li> + </ul> + </td> + </tr> + <tr> + <td>JobManager</td> + <td> + JobManager is the name of the central work coordination component of Flink. It has implementations for different resource providers, which differ on high-availability, resource allocation behavior and supported job submission modes. 
<br /> + JobManager <a href="">modes for job submissions</a>: + <ul> + <li><b>Session Mode</b>: one JobManager instance manages multiple jobs sharing the same cluster of TaskManagers</li> + <li><b>Application Mode</b>: runs the cluster exclusively for one job. The job main method (or client) gets executed on the JobManager.</li> + <li><b>Per-Job Mode</b>: runs the cluster exclusively for one job. The job main method (or client) runs only prior to the cluster creation.</li> Review comment: I'd suggest putting Session Mode last. ########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion. + +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. +If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}).
+ +* This will be replaced by the TOC +{:toc} + + +## Overview and Reference Architecture + +The figure below shows the building blocks of every Flink cluster. There is always somewhere a client running. It takes the code of the Flink applications, transforms it into a job graph and submits it to the JobManager. + +The JobManager distributes the work onto the TaskManagers, where the actual operators (such as sources, transformations and sinks) are running. + +When deploying Flink, there are often multiple options available for each building block. We have listed them in the table below the figure. + +If you don't know where to start, we recommend using the Command Line Interface for submitting Flink applications to a Standalone Cluster. + +<!-- Image source: https://docs.google.com/drawings/d/1s_ZlXXvADqxWfTMNRVwQeg7HZ3hN1Xb7goxDPjTEPrI/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_overview.svg %}" alt="Figure for Overview and Reference Architecture" /> + + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Component</th> + <th class="text-left" style="width: 50%">Purpose</th> + <th class="text-left">Implementations</th> + </tr> + </thead> + <tbody> + <tr> + <td>Flink Client</td> + <td> + Flink batch or streaming applications are compiled into a dataflow graph, which is submitted to the JobManager. + </td> + <td> + <ul> + <li><a href="">Command Line Interface</a></li> + <li><a href="">REST Endpoint</a></li> + <li><a href="">SQL Client</a></li> + <li><a href="">Python REPL</a></li> + <li><a href="">Scala REPL</a></li> + </ul> + </td> + </tr> + <tr> + <td>JobManager</td> + <td> + JobManager is the name of the central work coordination component of Flink. It has implementations for different resource providers, which differ on high-availability, resource allocation behavior and supported job submission modes. 
<br /> + JobManager <a href="">modes for job submissions</a>: + <ul> + <li><b>Session Mode</b>: one JobManager instance manages multiple jobs sharing the same cluster of TaskManagers</li> + <li><b>Application Mode</b>: runs the cluster exclusively for one job. The job main method (or client) gets executed on the JobManager.</li> + <li><b>Per-Job Mode</b>: runs the cluster exclusively for one job. The job main method (or client) runs only prior to the cluster creation.</li> + </ul> + </td> + <td> + <ul> + <li><a href="">Kubernetes</a></li> + <li><a href="">YARN</a></li> + <li><a href="">Mesos</a></li> + <li><a href="">Standalone</a> (this is the barebone mode that requires just JVMs to be launched. Deployment with Docker, Docker Swarm / Compose, non-native Kubernetes and other models is possible through manual setup in this mode) + </li> + </ul> + </td> + </tr> + <tr> + <td colspan="3" class="text-center"> + <b>External Components</b> (all optional) + </td> + </tr> + <tr> + <td>High Availability Service Provider</td> + <td> + Flink’s JobManager supports a high availability mode, where multiple JobManagers participate in a leader election process, resulting in an active JobManager and potentially multiple standby JobManagers, allowing for fast failover in case the active JobManager is lost. + </td> + <td> + <ul> + <li><a href="">Zookeeper</a></li> + <li><a href="">Kubernetes HA</a></li> + </ul> + </td> + </tr> + <tr> + <td>File Storage and Persistency</td> + <td> + For checkpointing (recovery mechanism for streaming jobs) Flink relies on external file storage systems + </td> + <td>See <a href="">FileSystems</a> page.</td> + </tr> + <tr> + <td>Resource Provider</td> + <td> + Flink can be deployed through different Resource Provider Frameworks, such as Kubernetes, YARN or Mesos. + </td> + <td>See "JobManager" implementations above.</td> Review comment: Maybe add an anchor link here, if possible.
########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion. + +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. +If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}). + +* This will be replaced by the TOC +{:toc} + + +## Overview and Reference Architecture + +The figure below shows the building blocks of every Flink cluster. There is always somewhere a client running. It takes the code of the Flink applications, transforms it into a job graph and submits it to the JobManager. + +The JobManager distributes the work onto the TaskManagers, where the actual operators (such as sources, transformations and sinks) are running. + +When deploying Flink, there are often multiple options available for each building block. We have listed them in the table below the figure. 
+ +If you don't know where to start, we recommend using the Command Line Interface for submitting Flink applications to a Standalone Cluster. + +<!-- Image source: https://docs.google.com/drawings/d/1s_ZlXXvADqxWfTMNRVwQeg7HZ3hN1Xb7goxDPjTEPrI/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_overview.svg %}" alt="Figure for Overview and Reference Architecture" /> + + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Component</th> + <th class="text-left" style="width: 50%">Purpose</th> + <th class="text-left">Implementations</th> + </tr> + </thead> + <tbody> + <tr> + <td>Flink Client</td> + <td> + Flink batch or streaming applications are compiled into a dataflow graph, which is submitted to the JobManager. + </td> + <td> + <ul> + <li><a href="">Command Line Interface</a></li> + <li><a href="">REST Endpoint</a></li> + <li><a href="">SQL Client</a></li> + <li><a href="">Python REPL</a></li> + <li><a href="">Scala REPL</a></li> + </ul> + </td> + </tr> + <tr> + <td>JobManager</td> + <td> + JobManager is the name of the central work coordination component of Flink. It has implementations for different resource providers, which differ on high-availability, resource allocation behavior and supported job submission modes. <br /> + JobManager <a href="">modes for job submissions</a>: + <ul> + <li><b>Session Mode</b>: one JobManager instance manages multiple jobs sharing the same cluster of TaskManagers</li> + <li><b>Application Mode</b>: runs the cluster exclusively for one job. The job main method (or client) gets executed on the JobManager.</li> Review comment: ```suggestion <li><b>Application Mode</b>: runs the cluster exclusively for one job. 
The job's main method (or client) gets executed on the JobManager.</li> ``` ########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion. + +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. +If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}). + +* This will be replaced by the TOC +{:toc} + + +## Overview and Reference Architecture + +The figure below shows the building blocks of every Flink cluster. There is always somewhere a client running. It takes the code of the Flink applications, transforms it into a job graph and submits it to the JobManager. + +The JobManager distributes the work onto the TaskManagers, where the actual operators (such as sources, transformations and sinks) are running. + +When deploying Flink, there are often multiple options available for each building block. 
We have listed them in the table below the figure. + +If you don't know where to start, we recommend using the Command Line Interface for submitting Flink applications to a Standalone Cluster. Review comment: ```suggestion If you don't know where to start, we recommend using the [Command Line Interface]({% link deployment/cli.md %}) for submitting Flink applications to a Standalone Cluster. ``` ########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion. + +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. +If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}). + +* This will be replaced by the TOC +{:toc} + + +## Overview and Reference Architecture + +The figure below shows the building blocks of every Flink cluster. There is always somewhere a client running. 
It takes the code of the Flink applications, transforms it into a job graph and submits it to the JobManager. + +The JobManager distributes the work onto the TaskManagers, where the actual operators (such as sources, transformations and sinks) are running. + +When deploying Flink, there are often multiple options available for each building block. We have listed them in the table below the figure. + +If you don't know where to start, we recommend using the Command Line Interface for submitting Flink applications to a Standalone Cluster. + +<!-- Image source: https://docs.google.com/drawings/d/1s_ZlXXvADqxWfTMNRVwQeg7HZ3hN1Xb7goxDPjTEPrI/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_overview.svg %}" alt="Figure for Overview and Reference Architecture" /> + + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Component</th> + <th class="text-left" style="width: 50%">Purpose</th> + <th class="text-left">Implementations</th> + </tr> + </thead> + <tbody> + <tr> + <td>Flink Client</td> + <td> + Flink batch or streaming applications are compiled into a dataflow graph, which is submitted to the JobManager. Review comment: Maybe ```suggestion Compiles batch or streaming applications into a dataflow graph, which it then submits to the JobManager. ``` ########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. 
You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion. + +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. +If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}). + +* This will be replaced by the TOC +{:toc} + + +## Overview and Reference Architecture + +The figure below shows the building blocks of every Flink cluster. There is always somewhere a client running. It takes the code of the Flink applications, transforms it into a job graph and submits it to the JobManager. + +The JobManager distributes the work onto the TaskManagers, where the actual operators (such as sources, transformations and sinks) are running. + +When deploying Flink, there are often multiple options available for each building block. We have listed them in the table below the figure. + +If you don't know where to start, we recommend using the Command Line Interface for submitting Flink applications to a Standalone Cluster. 
+ +<!-- Image source: https://docs.google.com/drawings/d/1s_ZlXXvADqxWfTMNRVwQeg7HZ3hN1Xb7goxDPjTEPrI/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_overview.svg %}" alt="Figure for Overview and Reference Architecture" /> + + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Component</th> + <th class="text-left" style="width: 50%">Purpose</th> + <th class="text-left">Implementations</th> + </tr> + </thead> + <tbody> + <tr> + <td>Flink Client</td> + <td> + Flink batch or streaming applications are compiled into a dataflow graph, which is submitted to the JobManager. + </td> + <td> + <ul> + <li><a href="">Command Line Interface</a></li> + <li><a href="">REST Endpoint</a></li> + <li><a href="">SQL Client</a></li> + <li><a href="">Python REPL</a></li> + <li><a href="">Scala REPL</a></li> + </ul> + </td> + </tr> + <tr> + <td>JobManager</td> + <td> + JobManager is the name of the central work coordination component of Flink. It has implementations for different resource providers, which differ on high-availability, resource allocation behavior and supported job submission modes. <br /> + JobManager <a href="">modes for job submissions</a>: + <ul> + <li><b>Session Mode</b>: one JobManager instance manages multiple jobs sharing the same cluster of TaskManagers</li> + <li><b>Application Mode</b>: runs the cluster exclusively for one job. The job main method (or client) gets executed on the JobManager.</li> + <li><b>Per-Job Mode</b>: runs the cluster exclusively for one job. The job main method (or client) runs only prior to the cluster creation.</li> + </ul> + </td> + <td> + <ul> + <li><a href="">Kubernetes</a></li> + <li><a href="">YARN</a></li> + <li><a href="">Mesos</a></li> + <li><a href="">Standalone</a> (this is the barebone mode that requires just JVMs to be launched. 
Deployment with Docker, Docker Swarm / Compose, non-native Kubernetes and other models is possible through manual setup in this mode) + </li> + </ul> + </td> + </tr> + <tr> + <td colspan="3" class="text-center"> + <b>External Components</b> (all optional) + </td> + </tr> + <tr> + <td>High Availability Service Provider</td> + <td> + Flink’s JobManager supports a high availability mode, where multiple JobManagers participate in a leader election process, resulting in an active JobManager and potentially multiple standby JobManagers, allowing for fast failover in case the active JobManager is lost. + </td> + <td> + <ul> + <li><a href="">Zookeeper</a></li> + <li><a href="">Kubernetes HA</a></li> + </ul> + </td> + </tr> + <tr> + <td>File Storage and Persistency</td> + <td> + For checkpointing (recovery mechanism for streaming jobs) Flink relies on external file storage systems + </td> + <td>See <a href="">FileSystems</a> page.</td> + </tr> + <tr> + <td>Resource Provider</td> + <td> + Flink can be deployed through different Resource Provider Frameworks, such as Kubernetes, YARN or Mesos. + </td> + <td>See "JobManager" implementations above.</td> + </tr> + <tr> + <td>Metrics Storage</td> + <td> + Flink components report internal metrics and Flink jobs can report additional, job specific metrics as well. + </td> + <td>See <a href="">Metrics Reporter</a> page.</td> + </tr> + <tr> + <td>Application-level data sources and sinks</td> + <td> + While application-level data sources and sinks are not technically part of the deployment of Flink cluster components, they should be considered when planning a new Flink production deployment. Colocating frequently used data with Flink can have significant performance benefits + </td> + <td> + For example: + <ul> + <li>Apache Kafka</li> + <li>Amazon S3</li> + <li>ElasticSearch</li> + <li>Apache Cassandra</li> + </ul> + See <a href="">Connectors</a> page. 
+ </td> + </tr> + </tbody> +</table> + + + +## Deployment Modes + +Flink can execute applications in one of three ways: + - in Session Mode, + - in a Per-Job Mode, or + - in Application Mode. + + The above modes differ in: + - the cluster lifecycle and resource isolation guarantees + - whether the application's `main()` method is executed on the client or on the cluster. + + + +<!-- Image source: https://docs.google.com/drawings/d/1EfloufuOp1A7YDwZmBEsHKRLIrrbtRkoWRPcfZI5RYQ/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_modes.svg %}" alt="Figure for Deployment Modes" /> + +#### Session Mode + +*Session mode* assumes an already running cluster and uses the resources of that cluster to execute any +submitted application. Applications executed in the same (session) cluster use, and consequently compete +for, the same resources. This has the advantage that you do not pay the resource overhead of spinning up +a full cluster for every submitted job. But, if one of the jobs misbehaves or brings down a Task Manager, +then all jobs running on that Task Manager will be affected by the failure. This, apart from a negative +impact on the job that caused the failure, implies a potential massive recovery process with all the +restarting jobs accessing the filesystem concurrently and making it unavailable to other services. +Additionally, having a single cluster running multiple jobs implies more load for the JobManager, who Review comment: Here we write JobManager as one word, while the lines above use "Task Manager". I'd suggest making this consistent. ########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership.
The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion. + +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. +If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}). + +* This will be replaced by the TOC +{:toc} + + +## Overview and Reference Architecture + +The figure below shows the building blocks of every Flink cluster. There is always somewhere a client running. It takes the code of the Flink applications, transforms it into a job graph and submits it to the JobManager. + +The JobManager distributes the work onto the TaskManagers, where the actual operators (such as sources, transformations and sinks) are running. + +When deploying Flink, there are often multiple options available for each building block. We have listed them in the table below the figure. + +If you don't know where to start, we recommend using the Command Line Interface for submitting Flink applications to a Standalone Cluster. 
+ +<!-- Image source: https://docs.google.com/drawings/d/1s_ZlXXvADqxWfTMNRVwQeg7HZ3hN1Xb7goxDPjTEPrI/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_overview.svg %}" alt="Figure for Overview and Reference Architecture" /> + + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Component</th> + <th class="text-left" style="width: 50%">Purpose</th> + <th class="text-left">Implementations</th> + </tr> + </thead> + <tbody> + <tr> + <td>Flink Client</td> + <td> + Flink batch or streaming applications are compiled into a dataflow graph, which is submitted to the JobManager. + </td> + <td> + <ul> + <li><a href="">Command Line Interface</a></li> + <li><a href="">REST Endpoint</a></li> + <li><a href="">SQL Client</a></li> + <li><a href="">Python REPL</a></li> + <li><a href="">Scala REPL</a></li> + </ul> + </td> + </tr> + <tr> + <td>JobManager</td> + <td> + JobManager is the name of the central work coordination component of Flink. It has implementations for different resource providers, which differ on high-availability, resource allocation behavior and supported job submission modes. <br /> + JobManager <a href="">modes for job submissions</a>: + <ul> + <li><b>Session Mode</b>: one JobManager instance manages multiple jobs sharing the same cluster of TaskManagers</li> + <li><b>Application Mode</b>: runs the cluster exclusively for one job. The job main method (or client) gets executed on the JobManager.</li> + <li><b>Per-Job Mode</b>: runs the cluster exclusively for one job. The job main method (or client) runs only prior to the cluster creation.</li> + </ul> + </td> + <td> + <ul> + <li><a href="">Kubernetes</a></li> + <li><a href="">YARN</a></li> + <li><a href="">Mesos</a></li> + <li><a href="">Standalone</a> (this is the barebone mode that requires just JVMs to be launched. 
Deployment with Docker, Docker Swarm / Compose, non-native Kubernetes and other models is possible through manual setup in this mode) + </li> + </ul> + </td> + </tr> + <tr> + <td colspan="3" class="text-center"> + <b>External Components</b> (all optional) + </td> + </tr> + <tr> + <td>High Availability Service Provider</td> + <td> + Flink’s JobManager supports a high availability mode, where multiple JobManagers participate in a leader election process, resulting in an active JobManager and potentially multiple standby JobManagers, allowing for fast failover in case the active JobManager is lost. Review comment: Maybe: Flink's JobManager can be run in high availability mode, which allows Flink to recover from JobManager faults. In order to fail over faster, multiple standby JobManagers can be started to act as backups. ########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion.
+ +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. +If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}). + +* This will be replaced by the TOC +{:toc} + + +## Overview and Reference Architecture + +The figure below shows the building blocks of every Flink cluster. There is always somewhere a client running. It takes the code of the Flink applications, transforms it into a job graph and submits it to the JobManager. + +The JobManager distributes the work onto the TaskManagers, where the actual operators (such as sources, transformations and sinks) are running. + +When deploying Flink, there are often multiple options available for each building block. We have listed them in the table below the figure. + +If you don't know where to start, we recommend using the Command Line Interface for submitting Flink applications to a Standalone Cluster. + +<!-- Image source: https://docs.google.com/drawings/d/1s_ZlXXvADqxWfTMNRVwQeg7HZ3hN1Xb7goxDPjTEPrI/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_overview.svg %}" alt="Figure for Overview and Reference Architecture" /> + + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Component</th> + <th class="text-left" style="width: 50%">Purpose</th> + <th class="text-left">Implementations</th> + </tr> + </thead> + <tbody> + <tr> + <td>Flink Client</td> + <td> + Flink batch or streaming applications are compiled into a dataflow graph, which is submitted to the JobManager. + </td> + <td> + <ul> + <li><a href="">Command Line Interface</a></li> + <li><a href="">REST Endpoint</a></li> + <li><a href="">SQL Client</a></li> + <li><a href="">Python REPL</a></li> + <li><a href="">Scala REPL</a></li> + </ul> + </td> + </tr> + <tr> + <td>JobManager</td> + <td> + JobManager is the name of the central work coordination component of Flink. 
It has implementations for different resource providers, which differ on high-availability, resource allocation behavior and supported job submission modes. <br /> + JobManager <a href="">modes for job submissions</a>: + <ul> + <li><b>Session Mode</b>: one JobManager instance manages multiple jobs sharing the same cluster of TaskManagers</li> + <li><b>Application Mode</b>: runs the cluster exclusively for one job. The job main method (or client) gets executed on the JobManager.</li> + <li><b>Per-Job Mode</b>: runs the cluster exclusively for one job. The job main method (or client) runs only prior to the cluster creation.</li> + </ul> + </td> + <td> + <ul> + <li><a href="">Kubernetes</a></li> + <li><a href="">YARN</a></li> + <li><a href="">Mesos</a></li> + <li><a href="">Standalone</a> (this is the barebone mode that requires just JVMs to be launched. Deployment with Docker, Docker Swarm / Compose, non-native Kubernetes and other models is possible through manual setup in this mode) + </li> + </ul> + </td> + </tr> + <tr> + <td colspan="3" class="text-center"> + <b>External Components</b> (all optional) + </td> + </tr> + <tr> + <td>High Availability Service Provider</td> + <td> + Flink’s JobManager supports a high availability mode, where multiple JobManagers participate in a leader election process, resulting in an active JobManager and potentially multiple standby JobManagers, allowing for fast failover in case the active JobManager is lost. + </td> + <td> + <ul> + <li><a href="">Zookeeper</a></li> + <li><a href="">Kubernetes HA</a></li> + </ul> + </td> + </tr> + <tr> + <td>File Storage and Persistency</td> + <td> + For checkpointing (recovery mechanism for streaming jobs) Flink relies on external file storage systems + </td> + <td>See <a href="">FileSystems</a> page.</td> + </tr> + <tr> + <td>Resource Provider</td> + <td> + Flink can be deployed through different Resource Provider Frameworks, such as Kubernetes, YARN or Mesos. 
+ </td> + <td>See "JobManager" implementations above.</td> + </tr> + <tr> + <td>Metrics Storage</td> + <td> + Flink components report internal metrics and Flink jobs can report additional, job specific metrics as well. + </td> + <td>See <a href="">Metrics Reporter</a> page.</td> + </tr> + <tr> + <td>Application-level data sources and sinks</td> + <td> + While application-level data sources and sinks are not technically part of the deployment of Flink cluster components, they should be considered when planning a new Flink production deployment. Colocating frequently used data with Flink can have significant performance benefits + </td> + <td> + For example: + <ul> + <li>Apache Kafka</li> + <li>Amazon S3</li> + <li>ElasticSearch</li> + <li>Apache Cassandra</li> + </ul> + See <a href="">Connectors</a> page. + </td> + </tr> + </tbody> +</table> + + + +## Deployment Modes + +Flink can execute applications in one of three ways: + - in Session Mode, + - in a Per-Job Mode, or + - in Application Mode. + + The above modes differ in: + - the cluster lifecycle and resource isolation guarantees + - whether the application's `main()` method is executed on the client or on the cluster. + + + +<!-- Image source: https://docs.google.com/drawings/d/1EfloufuOp1A7YDwZmBEsHKRLIrrbtRkoWRPcfZI5RYQ/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_modes.svg %}" alt="Figure for Deployment Modes" /> + +#### Session Mode + +*Session mode* assumes an already running cluster and uses the resources of that cluster to execute any +submitted application. Applications executed in the same (session) cluster use, and consequently compete +for, the same resources. This has the advantage that you do not pay the resource overhead of spinning up +a full cluster for every submitted job. But, if one of the jobs misbehaves or brings down a Task Manager, +then all jobs running on that Task Manager will be affected by the failure. 
This, apart from a negative +impact on the job that caused the failure, implies a potential massive recovery process with all the +restarting jobs accessing the filesystem concurrently and making it unavailable to other services. +Additionally, having a single cluster running multiple jobs implies more load for the JobManager, who +is responsible for the book-keeping of all the jobs in the cluster. + +#### Per-Job Mode + +Aiming at providing better resource isolation guarantees, the *Per-Job* mode uses the available resource provider +framework (e.g. YARN, Kubernetes) to spin up a cluster for each submitted job. This cluster is available to +that job only. When the job finishes, the cluster is torn down and any lingering resources (files, etc) are +cleared up. This provides better resource isolation, as a misbehaving job can only bring down its own +Task Managers. In addition, it spreads the load of book-keeping across multiple JobManagers, as there is +one per job. For these reasons, the *Per-Job* resource allocation model is the preferred mode by many +production reasons. + +#### Application Mode + +In all the above modes, the application's `main()` method is executed on the client side. This process +includes downloading the application's dependencies locally, executing the `main()` to extract a representation +of the application that Flink's runtime can understand (i.e. the `JobGraph`) and ship the dependencies and +the `JobGraph(s)` to the cluster. This makes the Client a heavy resource consumer as it may need substantial +network bandwidth to download dependencies and ship binaries to the cluster, and CPU cycles to execute the +`main()`. This problem can be more pronounced when the Client is shared across users. + +Building on this observation, the *Application Mode* creates a cluster per submitted application, but this time, +the `main()` method of the application is executed on the JobManager. 
Creating a cluster per application can be +seen as creating a session cluster shared only among the jobs of a particular application, and torn down when +the application finishes. With this architecture, the *Application Mode* provides the same resource isolation +and load balancing guarantees as the *Per-Job* mode, but at the granularity of a whole application. Executing +the `main()` on the JobManager allows for saving the CPU cycles required, but also save the bandwidth required +for downloading the dependencies locally. Furthermore, it allows for more even spread of the network load of Review comment: ```suggestion for downloading the dependencies locally. Furthermore, it allows for more even spread of the network load for ``` ########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion. + +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. 
+If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}). + +* This will be replaced by the TOC +{:toc} + + +## Overview and Reference Architecture + +The figure below shows the building blocks of every Flink cluster. There is always somewhere a client running. It takes the code of the Flink applications, transforms it into a job graph and submits it to the JobManager. + +The JobManager distributes the work onto the TaskManagers, where the actual operators (such as sources, transformations and sinks) are running. + +When deploying Flink, there are often multiple options available for each building block. We have listed them in the table below the figure. + +If you don't know where to start, we recommend using the Command Line Interface for submitting Flink applications to a Standalone Cluster. + +<!-- Image source: https://docs.google.com/drawings/d/1s_ZlXXvADqxWfTMNRVwQeg7HZ3hN1Xb7goxDPjTEPrI/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_overview.svg %}" alt="Figure for Overview and Reference Architecture" /> + + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Component</th> + <th class="text-left" style="width: 50%">Purpose</th> + <th class="text-left">Implementations</th> + </tr> + </thead> + <tbody> + <tr> + <td>Flink Client</td> + <td> + Flink batch or streaming applications are compiled into a dataflow graph, which is submitted to the JobManager. + </td> + <td> + <ul> + <li><a href="">Command Line Interface</a></li> + <li><a href="">REST Endpoint</a></li> + <li><a href="">SQL Client</a></li> + <li><a href="">Python REPL</a></li> + <li><a href="">Scala REPL</a></li> + </ul> + </td> + </tr> + <tr> + <td>JobManager</td> + <td> + JobManager is the name of the central work coordination component of Flink. 
It has implementations for different resource providers, which differ on high-availability, resource allocation behavior and supported job submission modes. <br /> + JobManager <a href="">modes for job submissions</a>: + <ul> + <li><b>Session Mode</b>: one JobManager instance manages multiple jobs sharing the same cluster of TaskManagers</li> + <li><b>Application Mode</b>: runs the cluster exclusively for one job. The job main method (or client) gets executed on the JobManager.</li> + <li><b>Per-Job Mode</b>: runs the cluster exclusively for one job. The job main method (or client) runs only prior to the cluster creation.</li> + </ul> + </td> + <td> + <ul> + <li><a href="">Kubernetes</a></li> + <li><a href="">YARN</a></li> + <li><a href="">Mesos</a></li> + <li><a href="">Standalone</a> (this is the barebone mode that requires just JVMs to be launched. Deployment with Docker, Docker Swarm / Compose, non-native Kubernetes and other models is possible through manual setup in this mode) + </li> + </ul> + </td> + </tr> + <tr> + <td colspan="3" class="text-center"> + <b>External Components</b> (all optional) + </td> + </tr> + <tr> + <td>High Availability Service Provider</td> + <td> + Flink’s JobManager supports a high availability mode, where multiple JobManagers participate in a leader election process, resulting in an active JobManager and potentially multiple standby JobManagers, allowing for fast failover in case the active JobManager is lost. + </td> + <td> + <ul> + <li><a href="">Zookeeper</a></li> + <li><a href="">Kubernetes HA</a></li> + </ul> + </td> + </tr> + <tr> + <td>File Storage and Persistency</td> + <td> + For checkpointing (recovery mechanism for streaming jobs) Flink relies on external file storage systems + </td> + <td>See <a href="">FileSystems</a> page.</td> + </tr> + <tr> + <td>Resource Provider</td> + <td> + Flink can be deployed through different Resource Provider Frameworks, such as Kubernetes, YARN or Mesos. 
+ </td> + <td>See "JobManager" implementations above.</td> + </tr> + <tr> + <td>Metrics Storage</td> + <td> + Flink components report internal metrics and Flink jobs can report additional, job specific metrics as well. + </td> + <td>See <a href="">Metrics Reporter</a> page.</td> + </tr> + <tr> + <td>Application-level data sources and sinks</td> + <td> + While application-level data sources and sinks are not technically part of the deployment of Flink cluster components, they should be considered when planning a new Flink production deployment. Colocating frequently used data with Flink can have significant performance benefits + </td> + <td> + For example: + <ul> + <li>Apache Kafka</li> + <li>Amazon S3</li> + <li>ElasticSearch</li> + <li>Apache Cassandra</li> + </ul> + See <a href="">Connectors</a> page. + </td> + </tr> + </tbody> +</table> + + + +## Deployment Modes + +Flink can execute applications in one of three ways: + - in Session Mode, + - in a Per-Job Mode, or + - in Application Mode. + + The above modes differ in: + - the cluster lifecycle and resource isolation guarantees + - whether the application's `main()` method is executed on the client or on the cluster. + + + +<!-- Image source: https://docs.google.com/drawings/d/1EfloufuOp1A7YDwZmBEsHKRLIrrbtRkoWRPcfZI5RYQ/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_modes.svg %}" alt="Figure for Deployment Modes" /> + +#### Session Mode + +*Session mode* assumes an already running cluster and uses the resources of that cluster to execute any +submitted application. Applications executed in the same (session) cluster use, and consequently compete +for, the same resources. This has the advantage that you do not pay the resource overhead of spinning up +a full cluster for every submitted job. But, if one of the jobs misbehaves or brings down a Task Manager, +then all jobs running on that Task Manager will be affected by the failure. 
This, apart from a negative +impact on the job that caused the failure, implies a potential massive recovery process with all the +restarting jobs accessing the filesystem concurrently and making it unavailable to other services. +Additionally, having a single cluster running multiple jobs implies more load for the JobManager, who +is responsible for the book-keeping of all the jobs in the cluster. + +#### Per-Job Mode + +Aiming at providing better resource isolation guarantees, the *Per-Job* mode uses the available resource provider +framework (e.g. YARN, Kubernetes) to spin up a cluster for each submitted job. This cluster is available to +that job only. When the job finishes, the cluster is torn down and any lingering resources (files, etc) are +cleared up. This provides better resource isolation, as a misbehaving job can only bring down its own +Task Managers. In addition, it spreads the load of book-keeping across multiple JobManagers, as there is +one per job. For these reasons, the *Per-Job* resource allocation model is the preferred mode by many +production reasons. + +#### Application Mode + +In all the above modes, the application's `main()` method is executed on the client side. This process +includes downloading the application's dependencies locally, executing the `main()` to extract a representation +of the application that Flink's runtime can understand (i.e. the `JobGraph`) and ship the dependencies and +the `JobGraph(s)` to the cluster. This makes the Client a heavy resource consumer as it may need substantial +network bandwidth to download dependencies and ship binaries to the cluster, and CPU cycles to execute the +`main()`. This problem can be more pronounced when the Client is shared across users. + +Building on this observation, the *Application Mode* creates a cluster per submitted application, but this time, +the `main()` method of the application is executed on the JobManager. 
Creating a cluster per application can be +seen as creating a session cluster shared only among the jobs of a particular application, and torn down when +the application finishes. With this architecture, the *Application Mode* provides the same resource isolation +and load balancing guarantees as the *Per-Job* mode, but at the granularity of a whole application. Executing +the `main()` on the JobManager allows for saving the CPU cycles required, but also save the bandwidth required +for downloading the dependencies locally. Furthermore, it allows for more even spread of the network load of +downloading the dependencies of the applications in the cluster, as there is one JobManager per application. + +<div class="alert alert-info" markdown="span"> + <strong>Note:</strong> In the Application Mode, the `main()` is executed on the cluster and not on the client, + as in the other modes. This may have implications for your code as, for example, any paths you register in + your environment using the `registerCachedFile()` must be accessible by the JobManager of your application. +</div> + +Compared to the *Per-Job* mode, the *Application Mode* allows the submission of applications consisting of +multiple jobs. The order of job execution is not affected by the deployment mode but by the call used +to launch the job. Using `execute()`, which is blocking, establishes an order and it will lead to the +execution of the "next" job being postponed until "this" job finishes. Using `executeAsync()`, which is +non-blocking, will lead to the "next" job starting before "this" job finishes. + +<div class="alert alert-info" markdown="span"> + <strong>Attention:</strong> The Application Mode allows for multi-`execute()` applications but + High-Availability is not supported in these cases. High-Availability in Application Mode is only + supported for single-`execute()` applications. Review comment: @kl0u are we failing if we are running a multi-execute job with HA enabled? 
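For readers of this thread, the ordering described in the quoted paragraph (a blocking `execute()` delays the "next" job until "this" job finishes, while a non-blocking `executeAsync()` lets the "next" job start earlier) can be modelled with a small toy. This is only a sketch under stated assumptions: `ToyEnvironment`, its methods, and the `gate` event are hypothetical names invented for illustration. It does not call Flink, and it says nothing about how Flink handles HA for multi-`execute()` applications.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyEnvironment:
    """Hypothetical stand-in for an execution environment. Not Flink's API."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._lock = threading.Lock()
        self.events = []  # ordered log of job start/finish events

    def _log(self, event):
        with self._lock:
            self.events.append(event)

    def execute(self, name):
        """Blocking submission: returns only after the job has finished."""
        self._log(f"start {name}")
        self._log(f"finish {name}")

    def execute_async(self, name, gate):
        """Non-blocking submission: returns a handle while the job still runs.

        `gate` is an artificial device that keeps the toy job "running" until
        the caller opens it, so the event order is deterministic.
        """
        started = threading.Event()

        def task():
            self._log(f"start {name}")
            started.set()
            gate.wait()  # the job is considered running until the gate opens
            self._log(f"finish {name}")

        future = self._pool.submit(task)
        started.wait()  # make sure the job has started before returning
        return future

    def close(self):
        self._pool.shutdown(wait=True)


def blocking_order():
    """Two blocking submissions: job-2 starts only after job-1 has finished."""
    env = ToyEnvironment()
    env.execute("job-1")
    env.execute("job-2")
    env.close()
    return env.events


def non_blocking_order():
    """Async submission of job-1: job-2 starts before job-1 finishes."""
    env = ToyEnvironment()
    gate = threading.Event()
    handle = env.execute_async("job-1", gate)  # returns while job-1 is running
    env.execute("job-2")
    gate.set()       # let job-1 complete
    handle.result()  # wait for job-1 to finish
    env.close()
    return env.events


if __name__ == "__main__":
    print(blocking_order())
    print(non_blocking_order())
```

In the blocking case the log interleaves strictly job by job; in the non-blocking case `start job-2` appears before `finish job-1`, which is the behaviour the docs text attributes to the call used to launch the job rather than to the deployment mode.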
########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion. + +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. +If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}). + +* This will be replaced by the TOC +{:toc} + + +## Overview and Reference Architecture + +The figure below shows the building blocks of every Flink cluster. There is always somewhere a client running. It takes the code of the Flink applications, transforms it into a job graph and submits it to the JobManager. + +The JobManager distributes the work onto the TaskManagers, where the actual operators (such as sources, transformations and sinks) are running. + +When deploying Flink, there are often multiple options available for each building block. We have listed them in the table below the figure. 
+ +If you don't know where to start, we recommend using the Command Line Interface for submitting Flink applications to a Standalone Cluster. + +<!-- Image source: https://docs.google.com/drawings/d/1s_ZlXXvADqxWfTMNRVwQeg7HZ3hN1Xb7goxDPjTEPrI/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_overview.svg %}" alt="Figure for Overview and Reference Architecture" /> Review comment: Could we maybe ask marketing to help us with beautifying this picture? ########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion. + +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. +If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}). + +* This will be replaced by the TOC +{:toc} + + +## Overview and Reference Architecture + +The figure below shows the building blocks of every Flink cluster. 
There is always somewhere a client running. It takes the code of the Flink applications, transforms it into a job graph and submits it to the JobManager. + +The JobManager distributes the work onto the TaskManagers, where the actual operators (such as sources, transformations and sinks) are running. + +When deploying Flink, there are often multiple options available for each building block. We have listed them in the table below the figure. + +If you don't know where to start, we recommend using the Command Line Interface for submitting Flink applications to a Standalone Cluster. + +<!-- Image source: https://docs.google.com/drawings/d/1s_ZlXXvADqxWfTMNRVwQeg7HZ3hN1Xb7goxDPjTEPrI/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_overview.svg %}" alt="Figure for Overview and Reference Architecture" /> + + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Component</th> + <th class="text-left" style="width: 50%">Purpose</th> + <th class="text-left">Implementations</th> + </tr> + </thead> + <tbody> + <tr> + <td>Flink Client</td> + <td> + Flink batch or streaming applications are compiled into a dataflow graph, which is submitted to the JobManager. + </td> + <td> + <ul> + <li><a href="">Command Line Interface</a></li> + <li><a href="">REST Endpoint</a></li> + <li><a href="">SQL Client</a></li> + <li><a href="">Python REPL</a></li> + <li><a href="">Scala REPL</a></li> + </ul> + </td> + </tr> + <tr> + <td>JobManager</td> + <td> + JobManager is the name of the central work coordination component of Flink. It has implementations for different resource providers, which differ on high-availability, resource allocation behavior and supported job submission modes. 
<br /> + JobManager <a href="">modes for job submissions</a>: + <ul> + <li><b>Session Mode</b>: one JobManager instance manages multiple jobs sharing the same cluster of TaskManagers</li> + <li><b>Application Mode</b>: runs the cluster exclusively for one job. The job main method (or client) gets executed on the JobManager.</li> + <li><b>Per-Job Mode</b>: runs the cluster exclusively for one job. The job main method (or client) runs only prior to the cluster creation.</li> + </ul> + </td> + <td> + <ul> + <li><a href="">Kubernetes</a></li> + <li><a href="">YARN</a></li> + <li><a href="">Mesos</a></li> + <li><a href="">Standalone</a> (this is the barebone mode that requires just JVMs to be launched. Deployment with Docker, Docker Swarm / Compose, non-native Kubernetes and other models is possible through manual setup in this mode) + </li> + </ul> + </td> + </tr> + <tr> + <td colspan="3" class="text-center"> + <b>External Components</b> (all optional) + </td> + </tr> + <tr> + <td>High Availability Service Provider</td> + <td> + Flink’s JobManager supports a high availability mode, where multiple JobManagers participate in a leader election process, resulting in an active JobManager and potentially multiple standby JobManagers, allowing for fast failover in case the active JobManager is lost. + </td> + <td> + <ul> + <li><a href="">Zookeeper</a></li> + <li><a href="">Kubernetes HA</a></li> + </ul> + </td> + </tr> + <tr> + <td>File Storage and Persistency</td> + <td> + For checkpointing (recovery mechanism for streaming jobs) Flink relies on external file storage systems + </td> + <td>See <a href="">FileSystems</a> page.</td> + </tr> + <tr> + <td>Resource Provider</td> + <td> + Flink can be deployed through different Resource Provider Frameworks, such as Kubernetes, YARN or Mesos. 
+ </td> + <td>See "JobManager" implementations above.</td> + </tr> + <tr> + <td>Metrics Storage</td> + <td> + Flink components report internal metrics and Flink jobs can report additional, job specific metrics as well. + </td> + <td>See <a href="">Metrics Reporter</a> page.</td> + </tr> + <tr> + <td>Application-level data sources and sinks</td> + <td> + While application-level data sources and sinks are not technically part of the deployment of Flink cluster components, they should be considered when planning a new Flink production deployment. Colocating frequently used data with Flink can have significant performance benefits + </td> + <td> + For example: + <ul> + <li>Apache Kafka</li> + <li>Amazon S3</li> + <li>ElasticSearch</li> + <li>Apache Cassandra</li> + </ul> + See <a href="">Connectors</a> page. + </td> + </tr> + </tbody> +</table> + + + +## Deployment Modes + +Flink can execute applications in one of three ways: + - in Session Mode, + - in a Per-Job Mode, or + - in Application Mode. + + The above modes differ in: + - the cluster lifecycle and resource isolation guarantees + - whether the application's `main()` method is executed on the client or on the cluster. + + + +<!-- Image source: https://docs.google.com/drawings/d/1EfloufuOp1A7YDwZmBEsHKRLIrrbtRkoWRPcfZI5RYQ/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_modes.svg %}" alt="Figure for Deployment Modes" /> Review comment: Maybe the marketing team could help us with beautifying this picture. No offense for your artistic skills Robert! ########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +--- +title: "Clusters & Deployment" +nav-id: deployment +nav-parent_id: ops +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. 
The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion. + +Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations. +If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link ops/deployment/local.md %}). + +* This will be replaced by the TOC +{:toc} + + +## Overview and Reference Architecture + +The figure below shows the building blocks of every Flink cluster. There is always somewhere a client running. It takes the code of the Flink applications, transforms it into a job graph and submits it to the JobManager. + +The JobManager distributes the work onto the TaskManagers, where the actual operators (such as sources, transformations and sinks) are running. + +When deploying Flink, there are often multiple options available for each building block. We have listed them in the table below the figure. + +If you don't know where to start, we recommend using the Command Line Interface for submitting Flink applications to a Standalone Cluster. 
+ +<!-- Image source: https://docs.google.com/drawings/d/1s_ZlXXvADqxWfTMNRVwQeg7HZ3hN1Xb7goxDPjTEPrI/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_overview.svg %}" alt="Figure for Overview and Reference Architecture" /> + + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left" style="width: 25%">Component</th> + <th class="text-left" style="width: 50%">Purpose</th> + <th class="text-left">Implementations</th> + </tr> + </thead> + <tbody> + <tr> + <td>Flink Client</td> + <td> + Flink batch or streaming applications are compiled into a dataflow graph, which is submitted to the JobManager. + </td> + <td> + <ul> + <li><a href="">Command Line Interface</a></li> + <li><a href="">REST Endpoint</a></li> + <li><a href="">SQL Client</a></li> + <li><a href="">Python REPL</a></li> + <li><a href="">Scala REPL</a></li> + </ul> + </td> + </tr> + <tr> + <td>JobManager</td> + <td> + JobManager is the name of the central work coordination component of Flink. It has implementations for different resource providers, which differ on high-availability, resource allocation behavior and supported job submission modes. <br /> + JobManager <a href="">modes for job submissions</a>: + <ul> + <li><b>Session Mode</b>: one JobManager instance manages multiple jobs sharing the same cluster of TaskManagers</li> + <li><b>Application Mode</b>: runs the cluster exclusively for one job. The job main method (or client) gets executed on the JobManager.</li> + <li><b>Per-Job Mode</b>: runs the cluster exclusively for one job. The job main method (or client) runs only prior to the cluster creation.</li> + </ul> + </td> + <td> + <ul> + <li><a href="">Kubernetes</a></li> + <li><a href="">YARN</a></li> + <li><a href="">Mesos</a></li> + <li><a href="">Standalone</a> (this is the barebone mode that requires just JVMs to be launched. 
Deployment with Docker, Docker Swarm / Compose, non-native Kubernetes and other models is possible through manual setup in this mode) + </li> + </ul> + </td> + </tr> + <tr> + <td colspan="3" class="text-center"> + <b>External Components</b> (all optional) + </td> + </tr> + <tr> + <td>High Availability Service Provider</td> + <td> + Flink’s JobManager supports a high availability mode, where multiple JobManagers participate in a leader election process, resulting in an active JobManager and potentially multiple standby JobManagers, allowing for fast failover in case the active JobManager is lost. + </td> + <td> + <ul> + <li><a href="">Zookeeper</a></li> + <li><a href="">Kubernetes HA</a></li> + </ul> + </td> + </tr> + <tr> + <td>File Storage and Persistency</td> + <td> + For checkpointing (recovery mechanism for streaming jobs) Flink relies on external file storage systems + </td> + <td>See <a href="">FileSystems</a> page.</td> + </tr> + <tr> + <td>Resource Provider</td> + <td> + Flink can be deployed through different Resource Provider Frameworks, such as Kubernetes, YARN or Mesos. + </td> + <td>See "JobManager" implementations above.</td> + </tr> + <tr> + <td>Metrics Storage</td> + <td> + Flink components report internal metrics and Flink jobs can report additional, job specific metrics as well. + </td> + <td>See <a href="">Metrics Reporter</a> page.</td> + </tr> + <tr> + <td>Application-level data sources and sinks</td> + <td> + While application-level data sources and sinks are not technically part of the deployment of Flink cluster components, they should be considered when planning a new Flink production deployment. Colocating frequently used data with Flink can have significant performance benefits + </td> + <td> + For example: + <ul> + <li>Apache Kafka</li> + <li>Amazon S3</li> + <li>ElasticSearch</li> + <li>Apache Cassandra</li> + </ul> + See <a href="">Connectors</a> page. 
+ </td> + </tr> + </tbody> +</table> + + + +## Deployment Modes + +Flink can execute applications in one of three ways: + - in Session Mode, + - in Per-Job Mode, or + - in Application Mode. + + The above modes differ in: + - the cluster lifecycle and resource isolation guarantees + - whether the application's `main()` method is executed on the client or on the cluster. + + + +<!-- Image source: https://docs.google.com/drawings/d/1EfloufuOp1A7YDwZmBEsHKRLIrrbtRkoWRPcfZI5RYQ/edit?usp=sharing --> +<img width="100%" src="{% link fig/deployment_modes.svg %}" alt="Figure for Deployment Modes" /> + +#### Session Mode + +*Session mode* assumes an already running cluster and uses the resources of that cluster to execute any +submitted application. Applications executed in the same (session) cluster use, and consequently compete +for, the same resources. This has the advantage that you do not pay the resource overhead of spinning up +a full cluster for every submitted job. However, if one of the jobs misbehaves or brings down a TaskManager, +then all jobs running on that TaskManager will be affected by the failure. This, apart from a negative +impact on the job that caused the failure, implies a potentially massive recovery process with all the +restarting jobs accessing the filesystem concurrently and making it unavailable to other services. +Additionally, having a single cluster running multiple jobs implies more load for the JobManager, which +is responsible for the book-keeping of all the jobs in the cluster. + +#### Per-Job Mode + +Aiming at providing better resource isolation guarantees, the *Per-Job* mode uses the available resource provider +framework (e.g. YARN, Kubernetes) to spin up a cluster for each submitted job. This cluster is available to +that job only. When the job finishes, the cluster is torn down and any lingering resources (files, etc.) are +cleaned up. This provides better resource isolation, as a misbehaving job can only bring down its own +TaskManagers. 
In addition, it spreads the load of book-keeping across multiple JobManagers, as there is +one per job. For these reasons, the *Per-Job* resource allocation model is the preferred mode for many +production use cases. + +#### Application Mode + +In all the above modes, the application's `main()` method is executed on the client side. This process +includes downloading the application's dependencies locally, executing the `main()` to extract a representation +of the application that Flink's runtime can understand (i.e. the `JobGraph`) and shipping the dependencies and +the `JobGraph(s)` to the cluster. This makes the Client a heavy resource consumer as it may need substantial +network bandwidth to download dependencies and ship binaries to the cluster, and CPU cycles to execute the +`main()`. This problem can be more pronounced when the Client is shared across users. + +Building on this observation, the *Application Mode* creates a cluster per submitted application, but this time, +the `main()` method of the application is executed on the JobManager. Creating a cluster per application can be +seen as creating a session cluster shared only among the jobs of a particular application, and torn down when +the application finishes. With this architecture, the *Application Mode* provides the same resource isolation +and load balancing guarantees as the *Per-Job* mode, but at the granularity of a whole application. Executing +the `main()` on the JobManager saves the CPU cycles required on the client, as well as the bandwidth required +for downloading the dependencies locally. Furthermore, it allows for a more even spread of the network load of +downloading the dependencies of the applications in the cluster, as there is one JobManager per application. + +<div class="alert alert-info" markdown="span"> + <strong>Note:</strong> In the Application Mode, the `main()` is executed on the cluster and not on the client, + as in the other modes. 
This may have implications for your code as, for example, any paths you register in + your environment using the `registerCachedFile()` must be accessible by the JobManager of your application. +</div> + +Compared to the *Per-Job* mode, the *Application Mode* allows the submission of applications consisting of +multiple jobs. The order of job execution is not affected by the deployment mode but by the call used +to launch the job. Using `execute()`, which is blocking, establishes an order and it will lead to the +execution of the "next" job being postponed until "this" job finishes. Using `executeAsync()`, which is +non-blocking, will lead to the "next" job starting before "this" job finishes. + +<div class="alert alert-info" markdown="span"> + <strong>Attention:</strong> The Application Mode allows for multi-`execute()` applications but + High-Availability is not supported in these cases. High-Availability in Application Mode is only + supported for single-`execute()` applications. +</div> + +#### Summary + +In *Session Mode*, the cluster lifecycle is independent of that of any job running on the cluster +and the resources are shared across all jobs. The *Per-Job* mode pays the price of spinning up a cluster +for every submitted job, but this comes with better isolation guarantees as the resources are not shared +across jobs. In this case, the lifecycle of the cluster is bound to that of the job. Finally, the +*Application Mode* creates a session cluster per application and executes the application's `main()` +method on the cluster. + + + +## Vendor Solutions + +A number of vendors offer managed or fully hosted Flink solutions. +None of these vendors are officially supported or endorsed by the Apache Flink PMC. +Please refer to vendor maintained documentation on how to use these products. 
+ +<!-- +Please keep this list in alphabetical order +--> + +#### AliCloud Realtime Compute + +[Website](https://www.alibabacloud.com/products/realtime-compute) + +Supported Environments: +<span class="label label-primary">AliCloud</span> + +#### Amazon EMR + +[Website](https://aws.amazon.com/emr/) + +Supported Environments: +<span class="label label-primary">AWS</span> + +#### Amazon Kinesis Data Analytics for Apache Flink + +[Website](https://docs.aws.amazon.com/kinesisanalytics/latest/java/what-is.html) + +Supported Environments: +<span class="label label-primary">AWS</span> + +#### Cloudera DataFlow + +[Website](https://www.cloudera.com/products/cdf.html) + +Supported Environment: +<span class="label label-primary">AWS</span> +<span class="label label-primary">Azure</span> +<span class="label label-primary">Google Cloud</span> +<span class="label label-primary">On-Premise</span> + +#### Eventador + +[Website](https://eventador.io) + +Supported Environment: +<span class="label label-primary">AWS</span> + +#### Huawei Cloud Stream Service + +[Website](https://www.huaweicloud.com/en-us/product/cs.html) + +Supported Environment: +<span class="label label-primary">Huawei Cloud</span> + +#### Ververica Platform + +[Website](https://www.ververica.com/platform-overview) + +Supported Environments: +<span class="label label-primary">AliCloud</span> +<span class="label label-primary">AWS</span> +<span class="label label-primary">Azure</span> +<span class="label label-primary">Google Cloud</span> +<span class="label label-primary">On-Premise</span> + +## Deployment Best Practices + +### How to provide dependencies in the classpath + +Flink provides several approaches for providing dependencies (such as `*.jar` files or static data) to Flink or user-provided +applications. These approaches differ based on the deployment mode and target, but also have commonalities, which are described here. 
+ +To provide a dependency, there are the following options: +- files in the **`lib/` folder** are added to the classpath used to start Flink. It is suitable for libraries such as Hadoop or file systems not available as plugins. Beware that classes added here can potentially interfere with Flink, for example if you are adding a different version of a library already provided by Flink. + +- **`plugins/<name>/`** are loaded at runtime by Flink through separate classloaders to avoid conflicts with classes loaded and used by Flink. Only jar files which are prepared as [plugins]({% link ops/plugins.md %}) can be added here. + +### Download Maven dependencies locally Review comment: Should this be part of the overview page? Maybe it is better suited for tips & tricks. ########## File path: docs/ops/deployment/overview.md ########## @@ -0,0 +1,358 @@ +- **`plugins/<name>/`** are loaded at runtime by Flink through separate classloaders to avoid conflicts with classes loaded and used by Flink. Only jar files which are prepared as [plugins]({% link ops/plugins.md %}) can be added here. Review comment: ```suggestion - **`plugins/<name>/`** are loaded at runtime by Flink through separate classloaders to avoid conflicts with classes loaded and used by Flink. Only jar files which are prepared as [plugins]({% link deployment/filesystems/plugins.md %}) can be added here. ```
