[GitHub] [flink] aljoscha commented on a change in pull request #12549: [FLINK-18084][docs] Document the Application Mode

GitBox Tue, 09 Jun 2020 09:33:16 -0700


aljoscha commented on a change in pull request #12549:
URL: https://github.com/apache/flink/pull/12549#discussion_r437371477




##########
File path: docs/ops/deployment/index.md
##########
@@ -104,6 +104,72 @@ Apache Flink ships with first class support for a number 
of common deployment ta
   </div>
 </div>
 
+## Deployment Modes
+
+Flink can execute jobs in one of three ways:
+ - in Session Mode, 
+ - in a Per-Job Mode, or
+ - in Application Mode.
+
+ The above modes differ in:
+ - the cluster lifecycle and resource isolation guarantees
+ - whether the application's `main()` is executed on the client or on the 
cluster.
+
+#### Session Mode
+
+*Session mode* assumes an already running cluster and uses the resources of 
that cluster to execute any 
+submitted application. Applications executed in the same (session) cluster 
use, and consequently compete
+for, the same resources. This has the advantage that you do not pay the 
resource overhead of spinning up
+a full cluster for every submitted job. But, if one of the jobs misbehaves or 
brings down a Task Manager,
+then all jobs running on that Task Manager will be affected by the failure. 
This, apart from a negative
+impact on the job that caused the failure, implies a potential massive 
recovery process with all the 
+restarting jobs accessing the filesystem concurrently and making it 
unavailable to other services. 
+Additionally, having a single cluster running multiple jobs implies more load 
for the JobManager, who 
+is responsible for the book-keeping of all the jobs in the cluster.
+
+#### Per-Job Mode
+
+Aiming at providing better resource isolation guarantees, the *Per-Job* mode 
uses the available cluster manager
+framework (e.g. YARN, Kubernetes) to spin up a cluster for each submitted job. 
This cluster is available to 
+that job only. When the job finishes, the cluster is torn down and any 
lingering resources (files, etc) are
+cleared up. This provides better resource isolation, as a misbehaving job can 
only bring down its own 
+TaskManagers. In addition, it spreads the load of book-keeping across multiple 
Job Managers, as there is 

Review comment:
       ```suggestion
   `TaskManagers`. In addition, it spreads the load of book-keeping across 
multiple Job Managers, as there is 
   ```
   
   Also, I think the Job Manager is by now called master or Flink master; see the glossary:
   https://ci.apache.org/projects/flink/flink-docs-master/concepts/glossary.html

##########
File path: docs/ops/deployment/index.md
##########
@@ -104,6 +104,72 @@ Apache Flink ships with first class support for a number 
of common deployment ta
   </div>
 </div>
 
+## Deployment Modes
+
+Flink can execute jobs in one of three ways:
+ - in Session Mode, 
+ - in a Per-Job Mode, or
+ - in Application Mode.
+
+ The above modes differ in:
+ - the cluster lifecycle and resource isolation guarantees
+ - whether the application's `main()` is executed on the client or on the 
cluster.

Review comment:
       ```suggestion
    - whether the application's `main()` method is executed on the client or on 
the cluster.
   ```

##########
File path: docs/ops/deployment/index.md
##########
@@ -104,6 +104,72 @@ Apache Flink ships with first class support for a number 
of common deployment ta
   </div>
 </div>
 
+## Deployment Modes
+
+Flink can execute jobs in one of three ways:
+ - in Session Mode, 
+ - in a Per-Job Mode, or
+ - in Application Mode.
+
+ The above modes differ in:
+ - the cluster lifecycle and resource isolation guarantees
+ - whether the application's `main()` is executed on the client or on the 
cluster.
+
+#### Session Mode
+
+*Session mode* assumes an already running cluster and uses the resources of 
that cluster to execute any 
+submitted application. Applications executed in the same (session) cluster 
use, and consequently compete
+for, the same resources. This has the advantage that you do not pay the 
resource overhead of spinning up
+a full cluster for every submitted job. But, if one of the jobs misbehaves or 
brings down a Task Manager,
+then all jobs running on that Task Manager will be affected by the failure. 
This, apart from a negative
+impact on the job that caused the failure, implies a potential massive 
recovery process with all the 
+restarting jobs accessing the filesystem concurrently and making it 
unavailable to other services. 
+Additionally, having a single cluster running multiple jobs implies more load 
for the JobManager, who 
+is responsible for the book-keeping of all the jobs in the cluster.
+
+#### Per-Job Mode
+
+Aiming at providing better resource isolation guarantees, the *Per-Job* mode 
uses the available cluster manager
+framework (e.g. YARN, Kubernetes) to spin up a cluster for each submitted job. 
This cluster is available to 
+that job only. When the job finishes, the cluster is torn down and any 
lingering resources (files, etc) are
+cleared up. This provides better resource isolation, as a misbehaving job can 
only bring down its own 
+TaskManagers. In addition, it spreads the load of book-keeping across multiple 
Job Managers, as there is 
+one per job. For these reasons, the *Per-Job* resource allocation model is the 
preferred mode by many 
+production reasons.
+
+#### Application Mode
+    
+In all the above modes, the application's `main()` method is executed on the 
client side. This process 

Review comment:
       ```suggestion
   In all the above modes, the applications `main()` method is executed on the 
client side. This process 
   ```

##########
File path: docs/ops/deployment/yarn_setup.md
##########
@@ -251,6 +250,29 @@ The user-jars position in the class path can be controlled 
by setting the parame
 - `FIRST`: Adds the jar to the beginning of the system class path.
 - `LAST`: Adds the jar to the end of the system class path.
 
+## Run an application in Application Mode
+
+To launch an application in [Application Mode]({{ site.baseurl 
}}/ops/deployment/#deployment-modes), you can type:
+
+{% highlight bash %}
+./bin/flink run-application -t yarn-application ./examples/batch/WordCount.jar
+{% endhighlight %}
+
+The command above, goes against the recently introduced "Generic CLI". So, 
apart from the `-t`, all 

Review comment:
       I think we shouldn't mention the `Generic CLI` here, or that it was recently introduced. This will not age well. 😅
   
   But it's good to describe that basically the only custom parameter is `-t`; everything else is specified as in the config.
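   
   As a rough sketch of what the description could show (assuming configuration options are passed on the command line as `-D<key>=<value>`; the two memory options and their values are only illustrative):
   
   ```bash
   # -t selects the deployment target; it is the only mode-specific parameter.
   # The -D options below are assumed examples of regular Flink configuration keys.
   ./bin/flink run-application -t yarn-application \
       -Djobmanager.memory.process.size=2048m \
       -Dtaskmanager.memory.process.size=4096m \
       ./examples/batch/WordCount.jar
   ```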

##########
File path: docs/ops/deployment/index.md
##########
@@ -104,6 +104,72 @@ Apache Flink ships with first class support for a number 
of common deployment ta
   </div>
 </div>
 
+## Deployment Modes
+
+Flink can execute jobs in one of three ways:
+ - in Session Mode, 
+ - in a Per-Job Mode, or
+ - in Application Mode.
+
+ The above modes differ in:
+ - the cluster lifecycle and resource isolation guarantees
+ - whether the application's `main()` is executed on the client or on the 
cluster.
+
+#### Session Mode
+
+*Session mode* assumes an already running cluster and uses the resources of 
that cluster to execute any 
+submitted application. Applications executed in the same (session) cluster 
use, and consequently compete
+for, the same resources. This has the advantage that you do not pay the 
resource overhead of spinning up
+a full cluster for every submitted job. But, if one of the jobs misbehaves or 
brings down a Task Manager,
+then all jobs running on that Task Manager will be affected by the failure. 
This, apart from a negative
+impact on the job that caused the failure, implies a potential massive 
recovery process with all the 
+restarting jobs accessing the filesystem concurrently and making it 
unavailable to other services. 
+Additionally, having a single cluster running multiple jobs implies more load 
for the JobManager, who 
+is responsible for the book-keeping of all the jobs in the cluster.
+
+#### Per-Job Mode
+
+Aiming at providing better resource isolation guarantees, the *Per-Job* mode 
uses the available cluster manager
+framework (e.g. YARN, Kubernetes) to spin up a cluster for each submitted job. 
This cluster is available to 
+that job only. When the job finishes, the cluster is torn down and any 
lingering resources (files, etc) are
+cleared up. This provides better resource isolation, as a misbehaving job can 
only bring down its own 
+TaskManagers. In addition, it spreads the load of book-keeping across multiple 
Job Managers, as there is 
+one per job. For these reasons, the *Per-Job* resource allocation model is the 
preferred mode by many 
+production reasons.
+
+#### Application Mode
+    
+In all the above modes, the application's `main()` method is executed on the 
client side. This process 
+includes downloading the application's dependencies locally, executing the 
`main()` to extract a representation
+of the application that Flink's runtime can understand (i.e. the `JobGraph`) 
and ship the dependencies and
+the `JobGraph(s)` to the cluster. This makes the Client a heavy resource 
consumer as it may need substantial
+network bandwidth to download dependencies and ship binaries to the cluster, 
and CPU cycles to execute the
+`main()`. This problem can be more pronounced when the Client is shared across 
users.
+
+Building on this observation, the *Application Mode* creates a cluster per 
submitted application, but this time,

Review comment:
       ```suggestion
   Building on this observation, the *Application Mode* creates a cluster per 
submitted application, but contrary to per-job mode,
   ```

##########
File path: docs/ops/deployment/index.md
##########
@@ -104,6 +104,72 @@ Apache Flink ships with first class support for a number 
of common deployment ta
   </div>
 </div>
 
+## Deployment Modes
+
+Flink can execute jobs in one of three ways:
+ - in Session Mode, 
+ - in a Per-Job Mode, or
+ - in Application Mode.
+
+ The above modes differ in:
+ - the cluster lifecycle and resource isolation guarantees
+ - whether the application's `main()` is executed on the client or on the 
cluster.
+
+#### Session Mode
+
+*Session mode* assumes an already running cluster and uses the resources of 
that cluster to execute any 
+submitted application. Applications executed in the same (session) cluster 
use, and consequently compete
+for, the same resources. This has the advantage that you do not pay the 
resource overhead of spinning up
+a full cluster for every submitted job. But, if one of the jobs misbehaves or 
brings down a Task Manager,
+then all jobs running on that Task Manager will be affected by the failure. 
This, apart from a negative
+impact on the job that caused the failure, implies a potential massive 
recovery process with all the 
+restarting jobs accessing the filesystem concurrently and making it 
unavailable to other services. 
+Additionally, having a single cluster running multiple jobs implies more load 
for the JobManager, who 
+is responsible for the book-keeping of all the jobs in the cluster.
+
+#### Per-Job Mode
+
+Aiming at providing better resource isolation guarantees, the *Per-Job* mode 
uses the available cluster manager
+framework (e.g. YARN, Kubernetes) to spin up a cluster for each submitted job. 
This cluster is available to 
+that job only. When the job finishes, the cluster is torn down and any 
lingering resources (files, etc) are
+cleared up. This provides better resource isolation, as a misbehaving job can 
only bring down its own 
+TaskManagers. In addition, it spreads the load of book-keeping across multiple 
Job Managers, as there is 
+one per job. For these reasons, the *Per-Job* resource allocation model is the 
preferred mode by many 
+production reasons.
+
+#### Application Mode
+    
+In all the above modes, the application's `main()` method is executed on the 
client side. This process 
+includes downloading the application's dependencies locally, executing the 
`main()` to extract a representation
+of the application that Flink's runtime can understand (i.e. the `JobGraph`) 
and ship the dependencies and
+the `JobGraph(s)` to the cluster. This makes the Client a heavy resource 
consumer as it may need substantial
+network bandwidth to download dependencies and ship binaries to the cluster, 
and CPU cycles to execute the
+`main()`. This problem can be more pronounced when the Client is shared across 
users.
+
+Building on this observation, the *Application Mode* creates a cluster per 
submitted application, but this time,
+the `main()` method of the application is executed on the JobManager. Creating 
a cluster per application can be 
+seen as creating a session cluster shared only among the jobs of a particular 
application, and torn down when
+the application finishes. With this architecture, the *Application Mode* 
provides the same resource isolation
+and load balancing guarantees as the *Per-Job* mode, but at the granularity of 
a whole application. Executing 
+the `main()` on the JobManager allows for saving the CPU cycles required, but 
also save the bandwidth required

Review comment:
       ```suggestion
   the `main()` method on the JobManager allows for saving the CPU cycles 
required, but also save the bandwidth required
   ```

##########
File path: docs/ops/deployment/index.md
##########
@@ -104,6 +104,72 @@ Apache Flink ships with first class support for a number 
of common deployment ta
   </div>
 </div>
 
+## Deployment Modes
+
+Flink can execute jobs in one of three ways:
+ - in Session Mode, 
+ - in a Per-Job Mode, or
+ - in Application Mode.
+
+ The above modes differ in:
+ - the cluster lifecycle and resource isolation guarantees
+ - whether the application's `main()` is executed on the client or on the 
cluster.
+
+#### Session Mode
+
+*Session mode* assumes an already running cluster and uses the resources of 
that cluster to execute any 
+submitted application. Applications executed in the same (session) cluster 
use, and consequently compete
+for, the same resources. This has the advantage that you do not pay the 
resource overhead of spinning up
+a full cluster for every submitted job. But, if one of the jobs misbehaves or 
brings down a Task Manager,
+then all jobs running on that Task Manager will be affected by the failure. 
This, apart from a negative
+impact on the job that caused the failure, implies a potential massive 
recovery process with all the 
+restarting jobs accessing the filesystem concurrently and making it 
unavailable to other services. 
+Additionally, having a single cluster running multiple jobs implies more load 
for the JobManager, who 
+is responsible for the book-keeping of all the jobs in the cluster.
+
+#### Per-Job Mode
+
+Aiming at providing better resource isolation guarantees, the *Per-Job* mode 
uses the available cluster manager
+framework (e.g. YARN, Kubernetes) to spin up a cluster for each submitted job. 
This cluster is available to 
+that job only. When the job finishes, the cluster is torn down and any 
lingering resources (files, etc) are
+cleared up. This provides better resource isolation, as a misbehaving job can 
only bring down its own 
+TaskManagers. In addition, it spreads the load of book-keeping across multiple 
Job Managers, as there is 
+one per job. For these reasons, the *Per-Job* resource allocation model is the 
preferred mode by many 

Review comment:
       ```suggestion
   one per job. For these reasons, the *Per-Job* resource allocation model is 
the preferred mode by many 
   ```
   
   Should this read "for many production reasons"?

##########
File path: docs/ops/deployment/yarn_setup.md
##########
@@ -251,6 +250,29 @@ The user-jars position in the class path can be controlled 
by setting the parame
 - `FIRST`: Adds the jar to the beginning of the system class path.
 - `LAST`: Adds the jar to the end of the system class path.
 
+## Run an application in Application Mode
+
+To launch an application in [Application Mode]({{ site.baseurl 
}}/ops/deployment/#deployment-modes), you can type:

Review comment:
       Links should use the `{% link %}` syntax, so: `{% link ops/deployment/index.md %}#deployment-modes`
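   
   Applied to the sentence quoted above, the link would presumably read as follows (a sketch only; I have not checked that the path resolves in the docs build):
   
   ```
   To launch an application in [Application Mode]({% link ops/deployment/index.md %}#deployment-modes), you can type:
   ```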




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
