rmetzger commented on a change in pull request #14346:
URL: https://github.com/apache/flink/pull/14346#discussion_r542142908
##########
File path: docs/deployment/resource-providers/standalone/index.md
##########
@@ -24,153 +24,203 @@ specific language governing permissions and limitations
under the License.
-->
-This page provides instructions on how to run Flink in a *fully distributed
fashion* on a *static* (but possibly heterogeneous) cluster.
-
* This will be replaced by the TOC
{:toc}
-## Requirements
-### Software Requirements
+## Getting Started
-Flink runs on all *UNIX-like environments*, e.g. **Linux**, **Mac OS X**, and
**Cygwin** (for Windows) and expects the cluster to consist of **one master
node** and **one or more worker nodes**. Before you start to setup the system,
make sure you have the following software installed **on each node**:
+This *Getting Started* section guides you through the local setup (on one
machine, but in separate processes) of a Flink cluster. This can easily be
expanded to set up a distributed standalone cluster, which we describe in the
[reference section](#distributed-cluster-setup).
-- **Java 1.8.x** or higher,
-- **ssh** (sshd must be running to use the Flink scripts that manage
- remote components)
+### Introduction
-If your cluster does not fulfill these software requirements you will need to
install/upgrade it.
+The standalone mode is the most bare-bones way of deploying Flink: the Flink
services described in the [deployment overview]({% link deployment/index.md %})
are simply launched as processes on the operating system. Unlike deploying Flink
with a resource provider such as Kubernetes or YARN, you have to take care of
restarting failed processes and of allocating and de-allocating resources
during operation.
-Having __passwordless SSH__ and
-__the same directory structure__ on all your cluster nodes will allow you to
use our scripts to control
-everything.
+The subpages of this resource provider describe further deployment methods
based on the standalone mode:
[Deployment in Docker containers]({% link
deployment/resource-providers/standalone/docker.md %}), and on [Kubernetes]({%
link deployment/resource-providers/standalone/kubernetes.md %}).
-{% top %}
+### Preparation
-### `JAVA_HOME` Configuration
+Flink runs on all *UNIX-like environments*, e.g. **Linux**, **Mac OS X**, and
**Cygwin** (for Windows). Before you start to set up the system, make sure you
have fulfilled the following requirements:
-Flink requires the `JAVA_HOME` environment variable to be set on the master
and all worker nodes and point to the directory of your Java installation.
+- **Java 1.8.x** or higher installed,
+- a recent Flink distribution downloaded from the [download page]({{
site.download_url }}) and unpacked.
-You can set this variable in `conf/flink-conf.yaml` via the `env.java.home`
key.
+### Starting a Standalone Cluster (Session Mode)
-{% top %}
+These steps show how to launch a Flink standalone cluster, and submit an
example job:
+
+{% highlight bash %}
+# we assume to be in the root directory of the unzipped Flink distribution
+
+# (1) Start Cluster
+./bin/start-cluster.sh
+
+# (2) You can now access the Flink Web Interface on http://localhost:8081
+
+# (3) Submit example job
+./bin/flink run ./examples/streaming/TopSpeedWindowing.jar
+
+# (4) Stop the cluster again
+./bin/stop-cluster.sh
+{% endhighlight %}
+
+In step `(1)`, we've started two processes: a JVM for the JobManager, and a JVM
for the TaskManager. The JobManager serves the web interface, accessible at
[localhost:8081](http://localhost:8081).
+In step `(3)`, we are starting a Flink Client (a short-lived JVM process) that
submits an application to the JobManager.
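+While the example job is running, you can also inspect it from a second terminal. This is a sketch; `./bin/flink list` asks the JobManager for its jobs:
+
+{% highlight bash %}
+# list running and scheduled jobs known to the JobManager
+./bin/flink list
+{% endhighlight %}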
+
+## Deployment Modes Supported by the Standalone Cluster
+
+### Application Mode
-## Flink Setup
+To start a Flink JobManager with an embedded application, we use the
`bin/standalone-job.sh` script.
+We demonstrate this mode by locally starting the `TopSpeedWindowing.jar`
example, running on a TaskManager.
-Go to the [downloads page]({{ site.download_url }}) and get the ready-to-run
package.
-After downloading the latest release, copy the archive to your master node and
extract it:
+The application jar file needs to be available in the classpath. The easiest
approach to achieve that is putting the jar into the `lib/` folder:
{% highlight bash %}
-tar xzf flink-*.tgz
-cd flink-*
+cp ./examples/streaming/TopSpeedWindowing.jar lib/
{% endhighlight %}
-### Configuring Flink
+Then, we can launch the JobManager:
-After having extracted the system files, you need to configure Flink for the
cluster by editing *conf/flink-conf.yaml*.
+{% highlight bash %}
+./bin/standalone-job.sh start --job-classname
org.apache.flink.streaming.examples.windowing.TopSpeedWindowing
+{% endhighlight %}
-Set the `jobmanager.rpc.address` key to point to your master node. You should
also define the maximum amount of main memory Flink is allowed to allocate on
each node by setting the `jobmanager.memory.process.size` and
`taskmanager.memory.process.size` keys.
+The web interface is now available at [localhost:8081](http://localhost:8081).
However, the application won't be able to start yet, because there are no
TaskManagers running. Start one with:
-These values are given in MB. If some worker nodes have more main memory which
you want to allocate to the Flink system you can overwrite the default value by
setting `taskmanager.memory.process.size` or `taskmanager.memory.flink.size` in
*conf/flink-conf.yaml* on those specific nodes.
+{% highlight bash %}
+./bin/taskmanager.sh start
+{% endhighlight %}
-Finally, you must provide a list of all nodes in your cluster that shall be
used as worker nodes, i.e., nodes running a TaskManager. Edit the file
*conf/workers* and enter the IP/host name of each worker node.
+Note: You can start multiple TaskManagers if your application needs more
resources.
-The following example illustrates the setup with three nodes (with IP
addresses from _10.0.0.1_
-to _10.0.0.3_ and hostnames _master_, _worker1_, _worker2_) and shows the
contents of the
-configuration files (which need to be accessible at the same path on all
machines):
+Stopping the services is also supported via the scripts:
-<div class="row">
- <div class="col-md-6 text-center">
- <img src="{% link /page/img/quickstart_cluster.png %}" style="width: 60%">
- </div>
-<div class="col-md-6">
- <div class="row">
- <p class="lead text-center">
- /path/to/<strong>flink/conf/<br>flink-conf.yaml</strong>
- <pre>jobmanager.rpc.address: 10.0.0.1</pre>
- </p>
- </div>
-<div class="row" style="margin-top: 1em;">
- <p class="lead text-center">
- /path/to/<strong>flink/<br>conf/workers</strong>
- <pre>
-10.0.0.2
-10.0.0.3</pre>
- </p>
-</div>
-</div>
-</div>
+{% highlight bash %}
+./bin/taskmanager.sh stop
+./bin/standalone-job.sh stop
+{% endhighlight %}
-The Flink directory must be available on every worker under the same path. You
can use a shared NFS directory, or copy the entire Flink directory to every
worker node.
-Please see the [configuration page]({% link deployment/config.md %}) for
details and additional configuration options.
+### Session Mode
-In particular,
+Local deployment in the session mode has already been described in the
[introduction](#starting-a-standalone-cluster-session-mode) above.
- * the amount of available memory per JobManager
(`jobmanager.memory.process.size`),
- * the amount of available memory per TaskManager
(`taskmanager.memory.process.size` and check [memory setup guide]({% link
deployment/memory/mem_tuning.md %}#configure-memory-for-standalone-deployment)),
- * the number of available CPUs per machine (`taskmanager.numberOfTaskSlots`),
- * the total number of CPUs in the cluster (`parallelism.default`) and
- * the temporary directories (`io.tmp.dirs`)
+## Standalone Cluster Reference
-are very important configuration values.
+### Configuration
-{% top %}
+All available configuration options are listed on the [configuration page]({%
link deployment/config.md %}). In particular, the [Basic Setup]({% link
deployment/config.md %}#basic-setup) section contains good advice on
configuring ports, memory, parallelism, etc.
+
+### Debugging
+
+If Flink is behaving unexpectedly, we recommend looking at Flink's log files
as a starting point for further investigations.
+
+The log files are located in the `log/` directory. There's a `.log` file for
each Flink service running on this machine.
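+For example, to follow the JobManager log of a session cluster while it is running (the exact file name varies with host name and user, so the wildcard pattern below is a sketch):
+
+{% highlight bash %}
+# the standalone session JobManager writes to a file matching this pattern
+tail -f log/flink-*-standalonesession-*.log
+{% endhighlight %}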
+
+Alternatively, logs are available from the Flink web frontend (both for the
JobManager and each TaskManager).
+
+By default, Flink logs on the "INFO" log level, which provides basic
information for all obvious issues. For cases where Flink seems to behave
wrongly, lowering the log level to "DEBUG" is advised. The logging level is
controlled via the `conf/log4j.properties` file.
+Setting `rootLogger.level = DEBUG` will start Flink on the DEBUG log level.
Note that a restart of Flink is required for the changes to take effect.
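+The switch to DEBUG can be made with a one-line edit. This sketch assumes the default `rootLogger.level = INFO` line is present in `conf/log4j.properties`:
+
+{% highlight bash %}
+# flip the root logger from INFO to DEBUG, keeping a backup of the original file
+sed -i.bak 's/^rootLogger.level = INFO$/rootLogger.level = DEBUG/' conf/log4j.properties
+{% endhighlight %}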
-### Starting Flink
+There's a dedicated page on [logging]({% link
deployment/advanced/logging.md %}) in Flink.
-The following script starts a JobManager on the local node and connects via
SSH to all worker nodes listed in the *workers* file to start the TaskManager
on each node. Now your Flink system is up and running. The JobManager running
on the local node will now accept jobs at the configured RPC port.
+### The start and stop scripts
-Assuming that you are on the master node and inside the Flink directory:
+#### start-cluster.sh
+The scripts provided with the standalone mode (in the `bin/` directory) use
the `conf/masters` and `conf/workers` files to determine which cluster
instances to start and stop via `bin/start-cluster.sh` and
`bin/stop-cluster.sh`.
+
+If password-less ssh access to the listed machines is configured, and they
share the same directory structure, the scripts also support starting and
stopping instances remotely.
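+If password-less ssh is not yet configured, one common way to set it up is sketched below (`worker1` stands for any machine listed in `conf/masters` or `conf/workers`):
+
+{% highlight bash %}
+# generate a key pair without a passphrase, if none exists yet
+ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
+# install the public key on each listed machine
+ssh-copy-id worker1
+{% endhighlight %}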
+
+**Example 1: Start a cluster with 2 TaskManagers locally**
+
+`conf/masters` contents:
{% highlight bash %}
-bin/start-cluster.sh
+localhost
{% endhighlight %}
-To stop Flink, there is also a `stop-cluster.sh` script.
+`conf/workers` contents:
+{% highlight bash %}
+localhost
+localhost
+{% endhighlight %}
-{% top %}
+**Example 2: Start a distributed cluster**
+
+This assumes a cluster with 4 machines (`master1, worker1, worker2, worker3`),
which can all reach each other over the network.
+
+`conf/masters` contents:
+{% highlight bash %}
+master1
+{% endhighlight %}
+
+`conf/workers` contents:
+{% highlight bash %}
+worker1
+worker2
+worker3
+{% endhighlight %}
-### Adding JobManager/TaskManager Instances to a Cluster
+Note that the configuration key `jobmanager.rpc.address` needs to be set to
`master1` for this to work.
-You can add both JobManager and TaskManager instances to your running cluster
with the `bin/jobmanager.sh` and `bin/taskmanager.sh` scripts.
+We show a third example with a standby JobManager in the [high-availability
section](#setting-up-high-availability).
-#### Adding a JobManager
+#### (jobmanager|taskmanager).sh
+
+The `bin/jobmanager.sh` and `bin/taskmanager.sh` scripts support starting the
respective daemon in the background (using the `start` argument), or in the
foreground (using `start-foreground`). In the foreground mode, the logs are
printed to standard output. This mode is useful for deployment scenarios where
another process is controlling the Flink daemon (e.g. Docker).
+
+The scripts can be called multiple times, for example if multiple TaskManagers
are needed. The instances are tracked by the scripts, and can be stopped
one-by-one (using `stop`) or all together (using `stop-all`).
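+For example, to temporarily add TaskManagers to a running session cluster and tear them down again (a sketch):
+
+{% highlight bash %}
+# each call starts one additional TaskManager instance
+./bin/taskmanager.sh start
+./bin/taskmanager.sh start
+# stop a single instance again
+./bin/taskmanager.sh stop
+# or stop all TaskManager instances on this machine at once
+./bin/taskmanager.sh stop-all
+{% endhighlight %}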
+
+#### Windows Cygwin Users
+
+If you are installing Flink from the git repository and you are using the
Windows git shell, Cygwin can produce a failure similar to this one:
+
+{% highlight bash %}
+c:/flink/bin/start-cluster.sh: line 30: $'\r': command not found
+{% endhighlight %}
+
+This error occurs because, when running on Windows, Git automatically
transforms UNIX line endings into Windows-style line endings. The problem is
that Cygwin can only deal with UNIX-style line endings. The solution is to
adjust the Cygwin settings to deal with the correct line endings by following
these three steps:
Review comment:
I think you are right. Stuff runs on windows 😕